The Coalescent

One of the most important developments in evolutionary biology in the past few decades has come without much fanfare outside of a small circle of population geneticists. The early models of population genetics were limited when it came to analyzing the nucleotide sequence polymorphism data that began to appear in the 1980s. New statistical techniques were developed to analyze this data, and they all fell under the umbrella of coalescent theory. If you want to understand the evolution of populations, you're missing a lot if you do not understand the coalescent.

When I wrote about the best biology experiments/discoveries, I mentioned in the comments that I should have included the coalescent in my list. One reason I place such importance on the coalescent is my bias toward molecular population genetics. But I also cannot imagine a huge project like the HapMap being undertaken without coalescent theory. As DNA sequences become more and more common in studies of natural populations (supplanting microsatellites, which replaced allozymes), the importance of coalescent theory keeps growing.

Rasmus Nielsen has written a review of a new book on coalescent theory. In his review, Nielsen describes the importance of the coalescent for researchers who use molecular markers to study features of natural populations, such as structure and phylogeography. Nielsen claims that this is the first comprehensive treatment of the coalescent published since Richard Hudson's 1990 review (available as a pdf), but John Wakeley's book is also available. I have not read either, so I'll refrain from judgment.

I have reproduced a couple of passages from Nielsen's review -- describing the importance of coalescent theory in population genetics -- below the fold.

On the importance of coalescent theory:

Coalescent theory provides a bridge between population genetic models and molecular data. It describes how demography, recombination, and other factors affect the shape of gene trees and provides tools for making statistical inferences from molecular population genetic data. Coalescent theory is necessary in phylogenetics to understand why gene trees may differ from species trees, in conservation biology to understand the relationship between effective population size and census population size, and in molecular ecology to understand almost anything at all. Acquiring a basic knowledge of coalescent theory should be a great help to any evolutionary biologist, and it is a must for researchers and students in population genetics or molecular ecology.

On drawing conclusions from molecular data without applying the coalescent:

There are still too many papers published in the field of evolutionary biology in which an estimated gene tree (or gene network) is used to invent a detailed biological story with little appreciation of the complexity involved in making inferences on demography/population history from gene trees. For example, it is a common misconception that the superposition of an estimated gene tree on a geographical map provides information about the geographic ancestry of the individuals in the sample. Likewise, the use of geographical location as a cladistic character with an ancestral state that can be inferred using ancestral character reconstruction may easily lead to false inferences. The main problem is not only the uncertainty associated with the estimation of trees, but that the tree itself has a strong stochastic component. One of the important insights we have gained from coalescent theory is that the same population history may generate very different gene trees if repeated and that very different historic scenarios may sometimes generate gene trees that are surprisingly similar.

[Emphases added.]
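Nielsen's point that the same history can produce very different gene trees is easy to see in a small simulation. The sketch below is my own illustration, not something taken from the review or the book. It assumes the simplest textbook case: the standard (Kingman) coalescent for a neutral, non-recombining locus sampled from a single population of constant size, with time measured in units of 2N generations. With k lineages, the waiting time until two of them coalesce is exponential with rate k(k-1)/2. Every replicate therefore shares exactly the same "population history," and only chance differs between runs. The function names are hypothetical.

```python
import random
import statistics


def simulate_tmrca(n, rng):
    """Time to the most recent common ancestor (TMRCA) of n sampled lineages
    under the standard Kingman coalescent, in units of 2N generations.
    Assumes neutrality, no recombination, and constant population size."""
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2  # number of lineage pairs that could coalesce
        t += rng.expovariate(rate)  # exponential waiting time to the next merger
        k -= 1  # two lineages merged into one
    return t


def replicate_tmrcas(n, reps, seed=1):
    """Re-run the identical demographic history many times."""
    rng = random.Random(seed)
    return [simulate_tmrca(n, rng) for _ in range(reps)]


if __name__ == "__main__":
    n = 10
    times = replicate_tmrcas(n, reps=20000)
    expected = 2 * (1 - 1 / n)  # standard result for the Kingman coalescent
    print(f"mean TMRCA: {statistics.mean(times):.3f} (theory {expected:.3f})")
    q = statistics.quantiles(times, n=20)  # q[0] = 5th, q[-1] = 95th percentile
    print(f"5th to 95th percentile: {q[0]:.3f} to {q[-1]:.3f}")
```

Under these assumptions the expected TMRCA is 2(1 - 1/n), but individual replicates scatter widely around it. Most of that scatter comes from the final wait, when only two lineages remain. That wait has a mean of 1, which is more than half of the expected TMRCA.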

Nielsen believes that the role stochastic processes play in shaping sequence polymorphism is underappreciated. Without a null model based on the coalescent, there is no way to statistically test hypotheses about things like population structure using DNA data.
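Here is a minimal sketch of what a coalescent null model can look like. It is again my own illustration under the same standard-neutral assumptions, plus the infinite-sites mutation model. It simulates the null distribution of the number of segregating sites S in a sample of n sequences. Mutations are placed on the genealogy as a Poisson process with rate theta/2 per unit of branch length, where theta = 4N times the per-generation mutation rate (theta must be positive here). Under this model E[S] = theta(1 + 1/2 + ... + 1/(n-1)), which is where Watterson's estimator of theta comes from. The function names and the observed value and theta used in the usage lines are hypothetical, chosen only to show the mechanics.

```python
import random


def total_branch_length(n, rng):
    """Total length of a Kingman coalescent tree for n lineages
    (units of 2N generations; neutral, constant size, no recombination)."""
    length = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2)  # time spent with k lineages
        length += k * t_k  # k branches each persist for t_k
    return length


def poisson_on_length(rate, length, rng):
    """Count events of a Poisson process with the given rate along `length`."""
    count = 0
    position = rng.expovariate(rate)
    while position < length:
        count += 1
        position += rng.expovariate(rate)
    return count


def simulate_segregating_sites(n, theta, reps, seed=1):
    """Null distribution of S: under infinite sites, every mutation on the
    tree creates one new segregating site."""
    rng = random.Random(seed)
    return [poisson_on_length(theta / 2, total_branch_length(n, rng), rng)
            for _ in range(reps)]


def lower_tail_p(observed, null):
    """Fraction of simulated values at or below the observed S."""
    return sum(s <= observed for s in null) / len(null)


if __name__ == "__main__":
    null = simulate_segregating_sites(n=10, theta=5.0, reps=10000)
    observed_s = 4  # hypothetical observed value, for illustration only
    print(f"P(S <= {observed_s}) under the null: {lower_tail_p(observed_s, null):.4f}")
```

A small lower-tail probability would say that the sample carries less variation than the standard neutral model usually produces. By itself, it would not say why: a bottleneck, a selective sweep, and a poor choice of theta are all candidates. A real analysis would also have to account for theta being estimated rather than known.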


If coalescent theory allows us to explain past data and make verifiable predictions of what we may find, then it certainly is to be welcomed.

By Michael Bacon on 13 Feb 2006

Coalescent theory is one of my favorite topics in population genetics. I used to think it was esoteric, but as you say in your post, its importance is ever increasing with projects like the HapMap.

Not to be picky, but it's Nielsen, I believe.

Amit, Yeah, I can't spell names. I just went back and changed that (and I fixed Niles Eldredge's name in another post).