T. Hofer, N. Ray, D. Wegmann, L. Excoffier (2009). Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Annals of Human Genetics, 73 (1), 95-108. DOI: 10.1111/j.1469-1809.2008.00489.x
I’ve just been reading over an article from late last year in the Annals of Human Genetics:
In this study, we examined 772 STRs, 210 diallelic indels, and 2834 SNPs typed in 53 human populations worldwide under the HGDP-CEPH Diversity Panel to determine to which extent allele frequency differs among four regions (Africa, Eurasia, East Asia, and America). We find that large allele frequency differences between continents are surprisingly common, and that Africa and America show the largest number of loci with extreme frequency differences. Moreover, more STR alleles have increased rather than decreased in frequency outside Africa, as expected under allelic surfing. Finally, there is no relationship between the extent of allele frequency differences and proximity to genes, as would be expected under selection. We therefore conclude that most of the observed large allele frequency differences between continents result from demography rather than from positive selection.
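The article lists a series of previous studies that have used allele frequency differences between populations as evidence for recent natural selection. The authors argue that the default explanation for such differences should instead be a phenomenon known as “allelic surfing”: a process by which neutral genetic variants can reach high frequency by riding a wave of population expansion into a new geographical region. Because the leading edge of an expanding population is repeatedly founded by small numbers of migrants, genetic drift is unusually strong there, and a variant that happens to be common at the front can be carried to high frequency across the newly colonised territory.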
Recent human history has been characterised by a series of strong population bottlenecks and range expansions, creating ideal conditions for allelic surfing. But it gets even more worrying: because (like selection) allelic surfing produces a rapid, geographically restricted increase in the frequency of a genetic variant, it can create virtually all of the classic genetic signatures of local positive selection – not only allele frequency differences, but also extended linkage disequilibrium and reduced local genetic diversity. That means it may be difficult (or even impossible) to distinguish between these two processes using statistical genetic data alone.
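To get a feel for why repeated founder events at an expanding front push neutral alleles around so effectively, here’s a deliberately crude toy simulation in Python. It’s my own sketch, not the model Hofer et al. used: a one-dimensional chain of demes, each colonised by a small sample of gene copies from its neighbour and then left to drift for a few generations. The function names and every parameter value (10 founders, demes of 50, five generations of drift, a starting frequency of 0.2) are hypothetical choices for illustration, not estimates for real human populations. The allele is strictly neutral, so anything that happens to it is drift alone.

```python
import random
from statistics import mean, pvariance


def resample(p, n, rng):
    """Binomial (Wright-Fisher) sampling: draw n gene copies from a pool at
    allele frequency p and return the allele's frequency in the sample."""
    return sum(rng.random() < p for _ in range(n)) / n


def serial_founder(p0, n_demes, founders, growth_gens, deme_size, rng):
    """Frequency of one neutral allele along a hypothetical chain of demes.
    Each new deme is founded by a small sample from the previous one, is
    assumed to jump straight to deme_size (a simplification), and then drifts
    for growth_gens generations."""
    p = p0
    freqs = [p]
    for _ in range(n_demes - 1):
        p = resample(p, founders, rng)      # founder event: strong drift
        for _ in range(growth_gens):        # ordinary drift in the new deme
            p = resample(p, deme_size, rng)
        freqs.append(p)
    return freqs


def mean_het(freqs):
    """Mean expected heterozygosity 2p(1-p) across replicate demes."""
    return mean(2 * p * (1 - p) for p in freqs)


def summarise(replicates=1000, p0=0.2, n_demes=10, founders=10,
              growth_gens=5, deme_size=50, seed=1):
    """Compare the first colonised deme with the expansion front across many
    replicates. All defaults are arbitrary illustrative values."""
    rng = random.Random(seed)
    runs = [serial_founder(p0, n_demes, founders, growth_gens, deme_size, rng)
            for _ in range(replicates)]
    deme1 = [r[1] for r in runs]   # first deme colonised from the origin
    front = [r[-1] for r in runs]  # most recently colonised deme
    return {
        "mean_front": mean(front),
        "var_deme1": pvariance(deme1),
        "var_front": pvariance(front),
        "het_deme1": mean_het(deme1),
        "het_front": mean_het(front),
        # fraction of runs where the neutral allele is now common at the front
        "surfed": sum(p >= 0.8 for p in front) / replicates,
        # fraction of runs where it has been lost at the front
        "lost": sum(p == 0 for p in front) / replicates,
    }


if __name__ == "__main__":
    print(summarise())
```

Because binomial sampling leaves the expected frequency unchanged, the average frequency at the front should stay close to its starting value; what changes is the spread. Across replicates the variance in frequency grows with distance from the origin while heterozygosity falls, and a non-trivial fraction of runs end with the neutral allele at high frequency at the front. That combination (large frequency differences plus locally reduced diversity) is exactly what a selective sweep would also leave behind.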
The authors therefore argue that many of the reported signals of positive selection in the human genome – based on these genetic signatures – may actually be spurious products of allelic surfing:
Previous studies aiming at detecting positively selected loci have attempted to control for past demography, either by 1) explicitly modelling some complex demography (Sabeti et al. 2007, Stajich & Hahn, 2005, Tang et al. 2007, Williamson et al. 2007), 2) by comparing diversity linked to derived or ancestral alleles (Voight et al. 2006), or 3) by contrasting coding to non-coding regions (Akey et al. 2002, Barreiro et al. 2008). To our knowledge, range expansions have never been used as a null model against which observed patterns were examined, and it is thus unclear (and would be worth examining) how the sensitivity of the first types of approaches would change under such a new null model.
And their conclusions:
While we find that positive selection is unlikely to have shaped the allele frequency spectrum at most loci, it may certainly have acted on fewer genes than previously believed, and our current results do not allow us to discriminate between the effects of demography and selection for an individual locus. Loci which are candidates for being under positive selection should therefore be more carefully scrutinized to find links between potentially selected alleles and a phenotypic effect (see e.g. Sabeti et al. 2007).
It will be interesting to see whether these findings hold up in larger analyses of the same populations (e.g. upcoming papers from the same group that created the HGDP Selection Browser I posted about late last year). At the very least, it would be great to see some more detailed theoretical papers exploring the implications of this process for the inference of selection under realistic models of human range expansion.
Dienekes posted on this paper when it was released online, and makes an important point:
I would say that, from now on, the “gold standard” of positive selection should be concrete evidence that the proposed selected alleles actually do something that could have been selected, e.g., lactase persistence, where allele frequency differences are combined with a specific trait, which in turn is correlated with a particular selective influence (milk consumption after weaning). Statistical inference of selection without a comprehensive explanation is no longer intellectually convincing.
That’s becoming more and more the case anyway – in reputable journals, at any rate, it’s getting damn hard to publish a report of positive selection in humans without at least some basic functional data to back it up. Nonetheless, this paper creates an extra incentive for reviewers (and everyone else) to exercise their skepticism muscles just that little bit more when reading reports on recent selection in humans.
I should emphasise that there’s little doubt that at least some recent population-specific selection has occurred in humans (the signal around the lactase gene in Europeans is about as unambiguous as it gets) – but perhaps it has not been anywhere near as pervasive as some researchers (e.g. John Hawks) have argued.