About the Author
James Tisdall has worked as a musician, as a programmer and member of technical staff at Bell Labs (where he programmed for speech research and discovered a formal language for musical rhythm), as a programmer and systems manager at the Human Genome Project in the Computational Biology and Informatics Laboratory (where he began using Perl for bioinformatics in 1991 with his program DNA WorkBench), as a computational biologist at Mercator Genetics in Menlo Park, California (where his Perl programs helped discover the gene involved in the common hereditary disease hemochromatosis), and as manager of Bioinformatics at the Fox Chase Cancer Center in Philadelphia. Most recently he has worked as a consultant for Biocomputing Associates of Kimberton, Pennsylvania, and the Burke Research Institute affiliated with Cornell University, working on neurodegenerative diseases such as Alzheimer's and Parkinson's.
Published by O'Reilly & Associates, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly & Associates books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safari.oreilly.com). For more information contact our corporate/institutional sales department: 800-998-9938 or firstname.lastname@example.org.
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of a tadpole and the topic of Perl for bioinformatics is a trademark of O'Reilly & Associates, Inc.
While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
Beginning Perl for Bioinformatics
Preface What Is Bioinformatics? About This Book Who This Book Is For Why Should I Learn to Program? Structure of This Book Conventions Used in This Book Comments and Questions Acknowledgments
1. Biology and Computer Science 1.1 The Organization of DNA 1.2 The Organization of Proteins 1.3 In Silico 1.4 Limits to Computation
2. Getting Started with Perl 2.1 A Low and Long Learning Curve 2.2 Perl's Benefits 2.3 Installing Perl on Your Computer 2.4 How to Run Perl Programs 2.5 Text Editors 2.6 Finding Help
3. The Art of Programming 3.1 Individual Approaches to Programming 3.2 Edit—Run—Revise (and Save) 3.3 An Environment of Programs 3.4 Programming Strategies 3.5 The Programming Process
4. Sequences and Strings 4.1 Representing Sequence Data 4.2 A Program to Store a DNA Sequence 4.3 Concatenating DNA Fragments 4.4 Transcription: DNA to RNA 4.5 Using the Perl Documentation 4.6 Calculating the Reverse Complement in Perl 4.7 Proteins, Files, and Arrays 4.8 Reading Proteins in Files 4.9 Arrays 4.10 Scalar and List Context 4.11 Exercises
5. Motifs and Loops 5.1 Flow Control 5.2 Code Layout 5.3 Finding Motifs 5.4 Counting Nucleotides 5.5 Exploding Strings into Arrays 5.6 Operating on Strings 5.7 Writing to Files 5.8 Exercises
6. Subroutines and Bugs 6.1 Subroutines 6.2 Scoping and Subroutines 6.3 Command-Line Arguments and Arrays 6.4 Passing Data to Subroutines
7. Mutations and Randomization 7.1 Random Number Generators 7.2 A Program Using Randomization 7.3 A Program to Simulate DNA Mutation 7.4 Generating Random DNA 7.5 Analyzing DNA 7.6 Exercises
8. The Genetic Code 8.1 Hashes 8.2 Data Structures and Algorithms for Biology 8.3 The Genetic Code 8.4 Translating DNA into Proteins 8.5 Reading DNA from Files in FASTA Format 8.6 Reading Frames 8.7 Exercises
9. Restriction Maps and Regular Expressions 9.1 Regular Expressions 9.2 Restriction Maps and Restriction Enzymes 9.3 Perl Operations 9.4 Exercises
13. Further Topics 13.1 The Art of Program Design 13.2 Web Programming 13.3 Algorithms and Sequence Alignment 13.4 Object-Oriented Programming 13.5 Perl Modules 13.6 Complex Data Structures 13.7 Relational Databases 13.8 Microarrays and XML 13.9 Graphics Programming 13.10 Modeling Networks 13.11 DNA Computers
A. Resources A.1 Perl A.2 Computer Science A.3 Linux A.4 Bioinformatics A.5 Molecular Biology
B. Perl Summary B.1 Command Interpretation B.2 Comments B.3 Scalar Values and Scalar Variables B.4 Assignment B.5 Statements and Blocks B.6 Arrays B.7 Hashes B.8 Operators B.9 Operator Precedence B.10 Basic Operators B.11 Conditionals and Logical Operators B.12 Binding Operators B.13 Loops B.14 Input/Output B.15 Regular Expressions B.16 Scalar and List Context B.17 Subroutines and Modules B.18 Built-in Functions