book review

oreilly book page

"Bioinformatics" is the new sexy term for what used to be called simply "computational biology". Simply put, it involves pretty much any application of computational techniques to biological problems. The reason for the new nomenclature and the greatly increased interest in the topic is, like much in modern biology, a more-or-less direct consequence of the many genome sequencing projects of the last decade.

The consensus in the field seems to be that it's more productive (and certainly easier) to teach biologists how to program than to try to get programmers up to speed on the intricacies of molecular biology. For similar reasons, Perl is a popular language to learn: it's easy to get off the ground and be productive with it, without requiring a heavy computer science background. (This, of course, has downsides as well...)

Never one to miss out on a trend, I'm going to be teaching a course on Bioperl and advanced Perl programming, starting next fall, which means I'm doing a lot of reading in this topic area, trying to develop lectures and find good background reading material. One of the first books I grabbed was _Beginning Perl for Bioinformatics_, which has been sitting on my "to read" shelf since O'Reilly sent me a review copy in December of 2001. It's a typical O'Reilly "animal" book (the cover bears three tadpoles), which does a decent job of introducing the basic features of the Perl language, and it should enable a dedicated student to get to the point where she can produce small useful programs. However, I'm not completely happy about the book's organization, and I think the occasional "if you're not a biologist, here's some background" interjections could have been cut without hurting anything.

The initial chapters in the book cover "meta" information, such as theoretical limits to computation, installing (or finding) the Perl interpreter on your computer, picking a text editor, and locating on-line documentation. Some general programming theory is covered as well -- the code-run-debug cycle, top-down versus bottom-up design, the use of pseudocode. There's also some biology background, but it's very introductory-level stuff -- DNA has four bases, proteins are made of 20 amino acids, and so on.

In chapter four, the book begins to get into actual Perl, with some coverage of string manipulation. Examples deal with simulating the transcription of DNA into RNA. Chapters five and six continue to flesh out the language, covering loops, basic file I/O, and subroutines. Chapter seven introduces the rand() function, in the context of simulating mutations in DNA. Subsequent chapters introduce the hash data type (using an RNA->protein translation simulation), regular expressions (as a way to store the recognition patterns of restriction endonucleases), and parsing database flat files and BLAST program output.
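To give a flavor of the kind of exercise involved, here is a minimal sketch of my own (not code from the book) that does transcription with a string operation and translation with a hash lookup. The subroutine names `transcribe` and `translate` are made up for illustration, and the codon table is deliberately cut down to four entries; a real one has 64. It assumes Perl 5.10 or later for the `//` operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Transcription as string manipulation: copy the DNA (coding strand),
# upper-case it, and replace every T with U.
# (Hypothetical helper name, not taken from the book.)
sub transcribe {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ tr/T/U/;
    return $rna;
}

# A deliberately tiny codon table for illustration only;
# the full genetic code has 64 codons. '*' marks a stop codon.
my %codon_to_aa = (
    AUG => 'M',   # methionine (start)
    UUU => 'F',   # phenylalanine
    GGC => 'G',   # glycine
    UAA => '*',   # stop
);

# Translation as a hash lookup: walk the RNA three bases at a time.
# Codons missing from the table become 'X'; a trailing partial codon is ignored.
sub translate {
    my ($rna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
        my $codon = substr($rna, $i, 3);
        $protein .= $codon_to_aa{$codon} // 'X';
    }
    return $protein;
}

my $rna = transcribe('ATGTTTGGCTAA');
print "$rna\n";                 # AUGUUUGGCUAA
print translate($rna), "\n";    # MFG*
```

Nothing here goes beyond strings, subroutines, and a hash, which is roughly the ground the chapters described above cover.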

I'm clearly outside the target audience of the book, as I already have a strong working knowledge of Perl. Perhaps that's why I found the order in which concepts were presented to be a bit strange -- for example, hashes, which are a fundamental data type, aren't introduced until halfway through the book, and regular expressions (one of the key features of Perl) first appear even later. As I said above, I also found the biological background sections to be more distracting than anything, but I've also got a strong biology background, so perhaps I'm off base here too. That said, I think a person with a CS background would be better served by a copy of _Learning Perl_ and an introductory molecular biology text than by this particular book.

One of the things I did enjoy about the book was the frequent coding examples, all of which presented realistic computational biology problems and then demonstrated how to solve them. I'm sure that when I get around to writing lectures, I'll be leafing through this book looking for problems I can use in class.

Overall, recommended for biologists without programming experience who would like to get started using Perl for simple programming. Not recommended for people with computer science backgrounds looking to get into bioinformatics.