
Mastering Python for Bioinformatics

Contents
- Cover
- Copyright
- Table of Contents
- Preface
- Who Should Read This?
- Programming Style: Why I Avoid OOP and Exceptions
- Structure
- Test-Driven Development
- Using the Command Line and Installing Python
- Getting the Code and Tests
- Installing Modules
- Installing the new.py Program
- Why Did I Write This Book?
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Part I. The Rosalind.info Challenges
- Chapter 1. Tetranucleotide Frequency: Counting Things
- Getting Started
- Creating the Program Using new.py
- Using argparse
- Tools for Finding Errors in the Code
- Introducing Named Tuples
- Adding Types to Named Tuples
- Representing the Arguments with a NamedTuple
- Reading Input from the Command Line or a File
- Testing Your Program
- Running the Program to Test the Output
- Solution 1: Iterating and Counting the Characters in a String
- Counting the Nucleotides
- Writing and Verifying a Solution
- Additional Solutions
- Solution 2: Creating a count() Function and Adding a Unit Test
- Solution 3: Using str.count()
- Solution 4: Using a Dictionary to Count All the Characters
- Solution 5: Counting Only the Desired Bases
- Solution 6: Using collections.defaultdict()
- Solution 7: Using collections.Counter()
- Going Further
- Review
- Chapter 2. Transcribing DNA into mRNA: Mutating Strings, Reading and Writing Files
- Getting Started
- Defining the Program's Parameters
- Defining an Optional Parameter
- Defining One or More Required Positional Parameters
- Using nargs to Define the Number of Arguments
- Using argparse.FileType() to Validate File Arguments
- Defining the Args Class
- Outlining the Program Using Pseudocode
- Iterating the Input Files
- Creating the Output Filenames
- Opening the Output Files
- Writing the Output Sequences
- Printing the Status Report
- Using the Test Suite
- Solutions
- Solution 1: Using str.replace()
- Solution 2: Using re.sub()
- Benchmarking
- Going Further
- Review
- Chapter 3. Reverse Complement of DNA: String Manipulation
- Getting Started
- Iterating Over a Reversed String
- Creating a Decision Tree
- Refactoring
- Solutions
- Solution 1: Using a for Loop and Decision Tree
- Solution 2: Using a Dictionary Lookup
- Solution 3: Using a List Comprehension
- Solution 4: Using str.translate()
- Solution 5: Using Bio.Seq
- Review
- Chapter 4. Creating the Fibonacci Sequence: Writing, Testing, and Benchmarking Algorithms
- Getting Started
- An Imperative Approach
- Solutions
- Solution 1: An Imperative Solution Using a List as a Stack
- Solution 2: Creating a Generator Function
- Solution 3: Using Recursion and Memoization
- Benchmarking the Solutions
- Testing the Good, the Bad, and the Ugly
- Running the Test Suite on All the Solutions
- Going Further
- Review
- Chapter 5. Computing GC Content: Parsing FASTA and Analyzing Sequences
- Getting Started
- Parsing FASTA Using Biopython
- Iterating the Sequences Using a for Loop
- Solutions
- Solution 1: Using a List
- Solution 2: Type Annotations and Unit Tests
- Solution 3: Keeping a Running Max Variable
- Solution 4: Using a List Comprehension with a Guard
- Solution 5: Using the filter() Function
- Solution 6: Using the map() Function and Summing Booleans
- Solution 7: Using Regular Expressions to Find Patterns
- Solution 8: A More Complex find_gc() Function
- Benchmarking
- Going Further
- Review
- Chapter 6. Finding the Hamming Distance: Counting Point Mutations
- Getting Started
- Iterating the Characters of Two Strings
- Solutions
- Solution 1: Iterating and Counting
- Solution 2: Creating a Unit Test
- Solution 3: Using the zip() Function
- Solution 4: Using the zip_longest() Function
- Solution 5: Using a List Comprehension
- Solution 6: Using the filter() Function
- Solution 7: Using the map() Function with zip_longest()
- Solution 8: Using the starmap() and operator.ne() Functions
- Going Further
- Review
- Chapter 7. Translating mRNA into Protein: More Functional Programming
- Getting Started
- K-mers and Codons
- Translating Codons
- Solutions
- Solution 1: Using a for Loop
- Solution 2: Adding Unit Tests
- Solution 3: Another Function and a List Comprehension
- Solution 4: Functional Programming with the map(), partial(), and takewhile() Functions
- Solution 5: Using Bio.Seq.translate()
- Benchmarking
- Going Further
- Review
- Chapter 8. Find a Motif in DNA: Exploring Sequence Similarity
- Getting Started
- Finding Subsequences
- Solutions
- Solution 1: Using the str.find() Method
- Solution 2: Using the str.index() Method
- Solution 3: A Purely Functional Approach
- Solution 4: Using K-mers
- Solution 5: Finding Overlapping Patterns Using Regular Expressions
- Benchmarking
- Going Further
- Review
- Chapter 9. Overlap Graphs: Sequence Assembly Using Shared K-mers
- Getting Started
- Managing Runtime Messages with STDOUT, STDERR, and Logging
- Finding Overlaps
- Grouping Sequences by the Overlap
- Solutions
- Solution 1: Using Set Intersections to Find Overlaps
- Solution 2: Using a Graph to Find All Paths
- Going Further
- Review
- Chapter 10. Finding the Longest Shared Subsequence: Finding K-mers, Writing Functions, and Using Binary Search
- Getting Started
- Finding the Shortest Sequence in a FASTA File
- Extracting K-mers from a Sequence
- Solutions
- Solution 1: Counting Frequencies of K-mers
- Solution 2: Speeding Things Up with a Binary Search
- Going Further
- Review
- Chapter 11. Finding a Protein Motif: Fetching Data and Using Regular Expressions
- Getting Started
- Downloading Sequences Files on the Command Line
- Downloading Sequences Files with Python
- Writing a Regular Expression to Find the Motif
- Solutions
- Solution 1: Using a Regular Expression
- Solution 2: Writing a Manual Solution
- Going Further
- Review
- Chapter 12. Inferring mRNA from Protein: Products and Reductions of Lists
- Getting Started
- Creating the Product of Lists
- Avoiding Overflow with Modular Multiplication
- Solutions
- Solution 1: Using a Dictionary for the RNA Codon Table
- Solution 2: Turn the Beat Around
- Solution 3: Encoding the Minimal Information
- Going Further
- Review
- Chapter 13. Location Restriction Sites: Using, Testing, and Sharing Code
- Getting Started
- Finding All Subsequences Using K-mers
- Finding All Reverse Complements
- Putting It All Together
- Solutions
- Solution 1: Using the zip() and enumerate() Functions
- Solution 2: Using the operator.eq() Function
- Solution 3: Writing a revp() Function
- Testing the Program
- Going Further
- Review
- Chapter 14. Finding Open Reading Frames
- Getting Started
- Translating Proteins Inside Each Frame
- Finding the ORFs in a Protein Sequence
- Solutions
- Solution 1: Using the str.index() Function
- Solution 2: Using the str.partition() Function
- Solution 3: Using a Regular Expression
- Going Further
- Review
- Part II. Other Programs
- Chapter 15. Seqmagique: Creating and Formatting Reports
- Using Seqmagick to Analyze Sequence Files
- Checking Files Using MD5 Hashes
- Getting Started
- Formatting Text Tables Using tabulate()
- Solutions
- Solution 1: Formatting with tabulate()
- Solution 2: Formatting with rich
- Going Further
- Review
- Chapter 16. FASTX grep: Creating a Utility Program to Select Sequences
- Finding Lines in a File Using grep
- The Structure of a FASTQ Record
- Getting Started
- Guessing the File Format
- Solution
- Guessing the File Format from the File Extension
- I Love It When a Plan Comes Together
- Combining Regular Expression Search Flags
- Reducing Boolean Values
- Going Further
- Review
- Chapter 17. DNA Synthesizer: Creating Synthetic Data with Markov Chains
- Understanding Markov Chains
- Getting Started
- Understanding Random Seeds
- Reading the Training Files
- Generating the Sequences
- Structuring the Program
- Solution
- Going Further
- Review
- Chapter 18. FASTX Sampler: Randomly Subsampling Sequence Files
- Getting Started
- Reviewing the Program Parameters
- Defining the Parameters
- Nondeterministic Sampling
- Structuring the Program
- Solutions
- Solution 1: Reading Regular Files
- Solution 2: Reading a Large Number of Compressed Files
- Going Further
- Review
- Chapter 19. Blastomatic: Parsing Delimited Text Files
- Introduction to BLAST
- Using csvkit and csvchk
- Getting Started
- Defining the Arguments
- Parsing Delimited Text Files Using the csv Module
- Parsing Delimited Text Files Using the pandas Module
- Solutions
- Solution 1: Manually Joining the Tables Using Dictionaries
- Solution 2: Writing the Output File with csv.DictWriter()
- Solution 3: Reading and Writing Files Using pandas
- Solution 4: Joining Files Using pandas
- Going Further
- Review
- Appendix A. Documenting Commands and Creating Workflows with make
- Makefiles Are Recipes
- Running a Specific Target
- Running with No Target
- Makefiles Create DAGs
- Using make to Compile a C Program
- Using make for a Shortcut
- Defining Variables
- Writing a Workflow
- Other Workflow Managers
- Further Reading
- Appendix B. Understanding $PATH and Installing Command-Line Programs
- Epilogue
- Index
- About the Author
- Colophon
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
- Computer (Windows, macOS, Linux): Install the free Adobe Digital Editions reader before downloading (see the e-book help page).
- Tablet/smartphone (Android, iOS): Install the free Adobe Digital Editions app or the PocketBook app before downloading (see the e-book help page).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (not Kindle).
The ePUB format works well for novels and nonfiction, that is, "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks adjust automatically to fit the small display.
This e-book uses Adobe DRM, a "hard" copy protection. If the necessary requirements are not met, you will not be able to open the e-book, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorize your reading software with your personal Adobe ID after installing it.
For more information, see our e-book help page.