Adam Eyre-Walker has published a review of adaptive evolution in a few well-studied systems: Drosophila, humans, viruses, Arabidopsis, etc. These organisms have been the subject of many studies that used DNA polymorphism, DNA divergence, or a combination of the two to detect natural selection in both protein-coding and non-coding regions of the genome. Now that we have whole-genome sequences for multiple closely related species from a few different taxa, many researchers are interested in determining the role of natural selection in the evolution of DNA sequences.
Eyre-Walker claims that the evidence for adaptive evolution is greater in Drosophila than in humans. But JP at GNXP thinks that Eyre-Walker doesn’t give the full story of adaptive evolution in the human genome, leaving out important examples. Eyre-Walker attributes the difference in adaptive evolution between these two well-studied species to differences in population size: humans have a smaller population size, so they fix fewer weakly advantageous mutations.
One way of measuring adaptive evolution is by comparing polymorphism and divergence at synonymous and non-synonymous sites (the McDonald-Kreitman test). Unlike some other tests (eg, Tajima’s D), the MK test is fairly immune to historical changes in population size, but an ancestral increase in population size may lead to an overestimate of advantageous substitutions. Eyre-Walker claims that this is not a concern for studies of a pair of model Drosophila species for two reasons:
“First, if anything, D. melanogaster appears to have gone through a population size decrease. Second, estimates using polymorphism data from either D. simulans or D. melanogaster are very similar; it is difficult to see how the bias could be the same given that the two species have different Ne.” [References omitted.]
Eyre-Walker’s citation for the D. melanogaster ancestral population size is a study that looked at codon bias. The effect of Ne on codon bias will persist much longer than that on polymorphism. It’s more probable that D. melanogaster has been recovering from a small ancestral population size (one that left that signature in codon usage), and has in fact been increasing in population size. It seems to me that the estimate of adaptive evolution in Drosophila is a bit high because of the increased population size in both D. melanogaster and D. simulans.
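For readers who have not worked with the MK comparison, here is a minimal sketch of how the four counts are usually turned into an estimate of α, the proportion of non-synonymous substitutions fixed by adaptive evolution. It uses the standard estimator α = 1 − (Ds·Pn)/(Dn·Ps). The function name and the counts in the example are made up for illustration; they are not taken from the review or from any real data set.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimate alpha from the four McDonald-Kreitman counts.

    dn, ds: non-synonymous / synonymous fixed differences (divergence)
    pn, ps: non-synonymous / synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when dn or ps is zero")
    # Under strict neutrality Dn/Ds should equal Pn/Ps, which gives alpha = 0.
    # An excess of non-synonymous divergence pushes alpha above zero.
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example_alpha = mk_alpha(dn=40, ds=80, pn=10, ps=40)
print(example_alpha)
```

With these invented counts, the ratio of non-synonymous to synonymous changes is twice as high in divergence as in polymorphism, so half of the non-synonymous substitutions are attributed to adaptive evolution.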
As mentioned previously, MK tests are robust to many violations of the assumptions that underlie the tests. Eyre-Walker points out that slightly deleterious mutations may lead to biased estimates of advantageous substitutions:
“The exception is the segregation of slightly deleterious non-synonymous mutations, because these can bias the estimate of α [proportion of non-synonymous substitutions that have been fixed by adaptive evolution] either upwards or downwards depending on the demography of the population. If the population size has been relatively stable, the estimate of α is an underestimate, because slightly deleterious mutations tend to contribute relatively more to polymorphism than they do to divergence when compared with neutral mutations. These slightly deleterious mutations can be controlled for by removing low-frequency polymorphisms from the analysis, because such mutations tend to segregate at lower frequencies than do neutral mutations. However, slightly deleterious mutations can lead to an overestimate of α if population sizes have expanded, because mutations that might have been fixed in the past, when the population size was small, no longer segregate as polymorphisms. Even fairly modest increases in population size can create artifactual evidence of adaptive evolution.” [References omitted.]
Slightly deleterious mutations exaggerate the effect of changes in population size. I’m not pointing this out because of how it relates to adaptive evolution in Drosophila. Instead, I find the solution to this problem quite fascinating: remove low-frequency polymorphisms from the data set. This should remove most slightly deleterious polymorphisms from consideration (assuming constant population size). This immediately led me to think of a particular data set that has this quality built in: HapMap.
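To make the low-frequency filter concrete, here is a rough sketch of what removing rare polymorphisms does to the estimate of α. Everything in it is an assumption for illustration: the function name, the toy list of polymorphisms, the divergence counts, and the 10% cutoff are all invented, and the review does not prescribe this particular threshold.

```python
def alpha_with_frequency_cutoff(polymorphisms, dn, ds, min_freq):
    """Estimate alpha after dropping polymorphisms rarer than min_freq.

    polymorphisms: iterable of (site_class, allele_frequency) pairs,
                   where site_class is "N" (non-synonymous) or "S" (synonymous)
    dn, ds: non-synonymous / synonymous fixed differences
    """
    kept = [(cls, freq) for cls, freq in polymorphisms if freq >= min_freq]
    pn = sum(1 for cls, _ in kept if cls == "N")
    ps = sum(1 for cls, _ in kept if cls == "S")
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when dn or ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Made-up data: non-synonymous polymorphisms are skewed toward low frequency,
# as expected if many of them are slightly deleterious.
toy_polymorphisms = [
    ("N", 0.02), ("N", 0.03), ("N", 0.05), ("N", 0.20), ("N", 0.40),
    ("S", 0.04), ("S", 0.10), ("S", 0.25), ("S", 0.50),
]

unfiltered = alpha_with_frequency_cutoff(toy_polymorphisms, dn=10, ds=10, min_freq=0.0)
filtered = alpha_with_frequency_cutoff(toy_polymorphisms, dn=10, ds=10, min_freq=0.10)
print(unfiltered, filtered)
```

In this toy case the unfiltered estimate is negative, because the rare non-synonymous variants inflate Pn, while the filtered estimate is positive. That is the direction of bias the quote describes for a population of stable size.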
One major criticism of much of the SNP data in circulation is that it suffers from ascertainment bias (see here for example). Because SNPs are first identified in a small sample and then assayed in a larger sample, many rare SNPs are missed. This poses a big problem for tests that depend on the site frequency spectrum of polymorphisms (eg, Tajima’s D), but could actually be useful if slightly deleterious mutations are segregating in the population. This assumes two things: the researcher is using an MK-based test and the population size has been constant for many generations. We know that human populations have increased greatly over many generations, so we’re probably still overestimating adaptive evolution if we don’t take mildly deleterious mutations into account.
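A quick way to see why ascertainment acts like a built-in frequency filter: if the discovery panel is a random sample of n chromosomes, a biallelic site with allele frequency p shows up as a SNP in that panel with probability 1 − pⁿ − (1 − p)ⁿ. The sketch below just evaluates that expression. The panel size of 10 chromosomes and the function name are assumptions for illustration, not a description of how any particular SNP data set was actually ascertained.

```python
def discovery_probability(freq, panel_size):
    """Probability that a biallelic site with allele frequency `freq`
    is polymorphic in a random sample of `panel_size` chromosomes."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if panel_size < 1:
        raise ValueError("panel_size must be at least 1")
    # The site is missed only if every sampled chromosome carries the same allele.
    return 1.0 - freq ** panel_size - (1.0 - freq) ** panel_size


# Hypothetical discovery panel of 10 chromosomes.
for freq in (0.01, 0.05, 0.30):
    print(freq, round(discovery_probability(freq, 10), 3))
```

Under these assumptions a 1% allele is discovered only about one time in ten, while a 30% allele is almost never missed, so the rare variants that are most likely to be slightly deleterious are the ones that drop out.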