evolgen

Dan Hartl just finished a two-day whirlwind speaking tour at my university (three talks in under 24 hours). He discussed detecting weak selection in protein coding sequences, identifying the underlying genetic causes of phenotypic variation in yeast, and the genetics of malaria parasites. I won’t get into the details of these talks, but I will point out one thing Hartl brought up in his first talk that goes well with our recent discussion of computational and wet lab biologists.

The topic is revolutionary developments, the field is population genetics, and the time frame is the past twenty years (or so). Hartl pointed out three revolutions: one in theory, one in technology, and one in computation. The revolution in theory is the coalescent, which allows us to simulate population history in reverse (from present to past). This saves a lot of computing time compared with forward simulations, and it has totally changed the way we analyze population genetic data (DNA sequence polymorphism).
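To see why the backward approach is so cheap: under the standard coalescent, you don't track a whole population forward through time; you just draw the waiting times between merger events for your sample. Here's a minimal sketch (my own toy illustration, not anything from Hartl's talks), with time in the usual units of 2N generations:

```python
import random

def coalescent_times(n, seed=None):
    """Simulate the waiting times between coalescent events for a
    sample of n lineages, working backward from the present.
    Times are in units of 2N generations (standard diploid scaling)."""
    rng = random.Random(seed)
    times = []
    k = n
    while k > 1:
        rate = k * (k - 1) / 2   # number of lineage pairs that could coalesce
        times.append(rng.expovariate(rate))
        k -= 1                   # one coalescence merges two lineages into one
    return times

# Summing the waiting times gives the tree height; its expectation
# is 2 * (1 - 1/n), no matter how large the population is.
```

A sample of n lineages needs only n - 1 random draws, versus simulating every individual in every generation going forward. That's the entire computational trick.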

The data required for coalescent analyses came about thanks to the technological development: automated DNA sequencing. The Sanger method of DNA sequencing and the automation of base calling have greatly increased our ability to generate data. Having a sophisticated theory is all well and good, but it’s meaningless without data. The early years of population genetics were largely spent developing the theory in the absence of significant data.

Whereas population genetics began as a field composed of all theory and no data, we are now in a data binge. Some folks have even complained that there is too much data in biology and not enough theory (uncited, but I know I’ve read it somewhere). Or that we cannot hope to analyze all the available data. That’s where the computational revolution comes in. Not only have our computers become faster, better at processing data, and capable of storing more information, we also apply smarter analytical approaches to our data. Maximum likelihood and Bayesian statistics take advantage of our computing power to explore a vast space of possibilities, homing in on those that best fit our data.
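The "explore lots of possibilities" part is easy to illustrate with a toy example of my own (nothing to do with Hartl's actual methods): estimating an allele frequency by brute-force maximum likelihood, scoring every candidate value on a grid and keeping the best one. It's the kind of exhaustive search that only became practical with cheap computing.

```python
import math

def log_likelihood(p, k, n):
    """Binomial log-likelihood of allele frequency p, given k copies
    of the allele observed in a sample of n chromosomes."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

def grid_mle(k, n, steps=999):
    """Brute-force maximum likelihood: score every candidate frequency
    on a fine grid and return the one that best fits the data."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # 0.001 .. 0.999
    return max(grid, key=lambda p: log_likelihood(p, k, n))
```

For the binomial case the answer is just k/n, so the grid search is overkill; the point is that the same score-everything-and-compare strategy scales up to models (like coalescent likelihoods) where no tidy formula exists.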

These advances come from both wet lab and computational biologists. The wet labs have been able to generate lots of data thanks to the technological innovations (not only DNA sequencing, but also microarrays and other techniques). The computational groups now have tons of data to analyze. And the analytical techniques were developed thanks to the theoreticians. But it doesn’t end there, as the findings from the computational groups represent hypotheses which can be tested by the wet labs. Etc., etc.