Last September, Bruce Lahn and colleagues published a couple of papers on the evolution of two genes responsible for brain development in humans (ASPM and Microcephalin). A group led by Sally Otto published a criticism of the analysis performed by Lahn's group in last week's issue of Science (JP has written a good summary on GNXP). Lahn and colleagues issued an excellent response to that criticism.
The original papers on ASPM and Microcephalin argued that the patterns of polymorphism and linkage disequilibrium at the two loci were inconsistent with our current understanding of demographic history in humans. Therefore, natural selection must have acted upon alleles at these genes at some point in our recent history. Otto and colleagues claim that the original studies did not consider all possible demographic scenarios, and that there are some that could lead to the observed patterns of polymorphism. These criticisms are weakened by the fact that they do not offer realistic demographic models.
This debate gives me the opportunity to explain how population geneticists analyze DNA sequences to distinguish between demographic history and natural selection. You can find the explanation below the fold.
In a population with constant size, random mating, and no natural selection, the nucleotide sequence polymorphism will follow neutral expectations. If we violate any of these assumptions, we can predict how patterns of nucleotide polymorphism will deviate from the neutral expectations. For example, positive selection and balancing selection result in different patterns of polymorphism. So do population subdivision, bottlenecks, and population expansions. One sticking point, however, is that some selection models and some demographic models lead to similar deviations from neutral expectations.
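To make "deviation from neutral expectations" a bit more concrete, here is a minimal Python sketch of one widely used summary statistic, Tajima's D. The choice of statistic is my assumption for illustration; it is not necessarily what either group used. D compares two estimates of diversity that should agree under the neutral model with constant population size: the mean number of pairwise differences (pi) and the number of segregating sites (S) scaled by sample size. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def summarize(seqs):
    """Return (n, S, pi) for a list of aligned, equal-length sequences."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites in the alignment.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return n, S, pi


def tajimas_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S, and pairwise diversity pi."""
    if n < 4 or S == 0:
        raise ValueError("Tajima's D needs n >= 4 and at least one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: difference between the two diversity estimates.
    # Denominator: estimated standard deviation of that difference.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

Values of D near zero are what the neutral, constant-size model predicts. Strongly negative values (an excess of rare variants) can come from a recent selective sweep or from population expansion, and strongly positive values from balancing selection or population subdivision, which is exactly the ambiguity described above.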
Given that demography and selection can both lead us to reject neutral patterns of polymorphism, how do we distinguish between these two? By examining a single locus we cannot distinguish between selection and demography, but we can if we look at multiple loci. Things like population subdivision, population expansion, and bottlenecks affect the entire genome, whereas selection will only affect the loci that are under selection. That's why studies that look at a single locus or mtDNA are limited: they cannot distinguish between selection and demography. These types of studies are no longer performed in well-studied organisms (i.e., humans, Drosophila, Arabidopsis, etc.). There are sufficient data available for model organisms that we can untangle demography and selection.
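The multi-locus logic can be sketched in a few lines of Python. Assume we have already computed some summary statistic at many loci across the genome; the function name and the numbers in the usage note are made up for illustration. Demography shifts the whole genome-wide distribution, so a locus under selection should stand out as an outlier relative to that distribution rather than relative to the textbook neutral expectation.

```python
def genome_wide_percentile(background, focal):
    """Fraction of background loci whose statistic is <= the focal locus.

    `background` is a list of per-locus statistics from across the genome;
    `focal` is the statistic at the candidate locus. Values near 0 or 1
    mean the candidate is extreme compared with the rest of the genome.
    """
    if not background:
        raise ValueError("need at least one background locus")
    return sum(1 for value in background if value <= focal) / len(background)
```

For example, with hypothetical background values `[-1.2, -0.8, -0.5, -0.3, 0.1]` and a candidate locus at `-2.4`, the function returns `0.0`: every other locus is less extreme. If the whole genome were skewed negative by a population expansion, a candidate at `-1.0` would not look unusual at all, even though it would under the constant-size neutral model.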
Now that we have determined how to distinguish selection and demography, we can detect selection on DNA sequences with more rigor. The first step is to collect all the available sequence data for your species or population of interest and fit a demographic model to those data. I won't get into the details of this process, other than to say it involves coalescent simulations using different demographic parameters. Once the best fitting model is found, that model supplants the neutral model as a null hypothesis for detecting selection. Any loci that reject that model were likely under selection in the recent past.
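For the curious, here is a rough Python sketch of what "use the fitted demographic model as the null" can look like. Everything in it is an assumption for illustration: the demographic model is a single exponential growth rate `alpha` (with `alpha = 0` meaning constant size), the statistic is the number of segregating sites, and the function names are hypothetical. Real analyses use dedicated coalescent simulators and richer models; this only shows the shape of the procedure.

```python
import math
import random


def simulate_segregating_sites(n, theta, alpha=0.0, rng=random):
    """Simulate S for n sequences under a simple coalescent model.

    Time is measured in units of 2*N0 generations, theta = 4*N0*mu, and
    the population size looking backward in time is N0 * exp(-alpha * t).
    """
    t = 0.0
    total_length = 0.0
    for k in range(n, 1, -1):
        # Waiting time to the next coalescence in a constant-size population.
        w = rng.expovariate(k * (k - 1) / 2)
        if alpha == 0.0:
            t_next = t + w
        else:
            # Rescale time so coalescence speeds up as the population shrinks.
            t_next = math.log(math.exp(alpha * t) + alpha * w) / alpha
        total_length += k * (t_next - t)
        t = t_next
    # Mutations fall on the tree as a Poisson process with rate theta / 2.
    count = 0
    position = rng.expovariate(theta / 2)
    while position < total_length:
        count += 1
        position += rng.expovariate(theta / 2)
    return count


def empirical_p_value(simulated, observed):
    """Upper-tail p-value: how often the null produces a value >= observed."""
    extreme = sum(1 for value in simulated if value >= observed)
    return (extreme + 1) / (len(simulated) + 1)
```

In practice you would simulate many replicates with the parameters of the best-fitting model, for example `[simulate_segregating_sites(20, 5.0, alpha=2.0) for _ in range(10000)]`, and then ask how extreme the observed value at each locus is with `empirical_p_value`. Whether the upper or the lower tail is the interesting one depends on the statistic and on the kind of selection you are looking for.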
The criticism leveled at the work by Lahn and colleagues centers on the possibility that demographic effects may be responsible for the non-neutral patterns of polymorphism. In defense of their work, Lahn's group replies that the demographic effects required for those non-neutral patterns are not consistent with those observed at other loci (remember, demography should affect the entire genome uniformly). They claim that the patterns of polymorphism at the two loci not only reject a neutral model, but also reject a model based on what we know about human population history.