T. Hofer, N. Ray, D. Wegmann, L. Excoffier (2009). Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Annals of Human Genetics, 73 (1), 95-108. DOI: 10.1111/j.1469-1809.2008.00489.x
I've just been reading over an article from late last year in the Annals of Human Genetics:
In this study, we examined 772 STRs, 210 diallelic indels, and 2834 SNPs typed in 53 human populations worldwide under the HGDP-CEPH Diversity Panel to determine to which extent allele frequency differs among four regions (Africa, Eurasia, East Asia, and America). We find that large allele frequency differences between continents are surprisingly common, and that Africa and America show the largest number of loci with extreme frequency differences. Moreover, more STR alleles have increased rather than decreased in frequency outside Africa, as expected under allelic surfing. Finally, there is no relationship between the extent of allele frequency differences and proximity to genes, as would be expected under selection. We therefore conclude that most of the observed large allele frequency differences between continents result from demography rather than from positive selection.
The article lists a series of previous studies that have used allele frequency differences between populations as evidence for recent natural selection. The authors argue that the default explanation for such differences should be a phenomenon known as "allelic surfing", a process by which neutral genetic variants can reach high frequency by riding a wave of population expansion into a new geographical region.
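To make the idea concrete, here is a minimal sketch of surfing as a chain of founder events. This is my own toy model, not the spatially explicit simulation framework used by Hofer et al.; the function names and the parameter values (`p0`, `n_demes`, `founders`, `replicates`) are illustrative assumptions, and there is no migration, population growth, or selection in it. Each new deme at the wave front is founded by a small sample of gene copies from the deme behind it, so sampling drift is the only force at work.

```python
import random

def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) draws (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))

def surf_once(p0, n_demes, founders, rng):
    """Follow a neutral allele along a chain of newly founded demes.

    Each new deme is founded by `founders` gene copies sampled from the
    deme behind it. Returns the allele frequency at the final wave front.
    """
    p = p0
    for _ in range(n_demes):
        p = binomial(founders, p, rng) / founders
        if p in (0.0, 1.0):  # lost or fixed: no further change is possible
            break
    return p

def front_frequencies(p0=0.1, n_demes=50, founders=20, replicates=2000, seed=1):
    """Repeat the expansion many times from the same starting frequency."""
    rng = random.Random(seed)
    return [surf_once(p0, n_demes, founders, rng) for _ in range(replicates)]

freqs = front_frequencies()
mean = sum(freqs) / len(freqs)
high = sum(f >= 0.9 for f in freqs) / len(freqs)
print(f"mean front frequency: {mean:.3f}")
print(f"share of replicates with frequency >= 0.9 at the front: {high:.3f}")
```

Because the allele is neutral, its average frequency across replicates should stay close to the starting value. The point of the sketch is the spread around that average: in most runs the allele is lost, but in a minority it ends up at or near fixation at the front, which is the kind of large frequency difference that could be mistaken for local selection.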
Recent human history has been characterised by a series of strong population bottlenecks and range expansions, creating conditions that are perfect for allelic surfing to occur. But it gets even more worrying: because allelic surfing (like selection) results in a rapid and geographically restricted increase in the frequency of a genetic variant, it can create virtually all of the classic genetic signatures of local positive selection - not only allele frequency differences, but also extended linkage disequilibrium and reduced local genetic diversity. That means that it may be difficult (or even impossible) to distinguish between these two processes using statistical genetic data alone.
As such, the authors argue that many of the reported signals of positive selection in the human genome - based on these genetic signatures - may actually be spurious products of allelic surfing:
Previous studies aiming at detecting positively selected loci have attempted to control for past demography, either by 1) explicitly modelling some complex demography (Sabeti et al. 2007, Stajich & Hahn, 2005, Tang et al. 2007, Williamson et al. 2007), 2) by comparing diversity linked to derived or ancestral alleles (Voight et al. 2006), or 3) by contrasting coding to non-coding regions (Akey et al. 2002, Barreiro et al. 2008). To our knowledge, range expansions have never been used as a null model against which observed patterns were examined, and it is thus unclear (and would be worth examining) how the sensitivity of the first types of approaches would change under such a new null model.
And their conclusions:
While we find that positive selection is unlikely to have shaped the allele frequency spectrum at most loci, it may certainly have acted on fewer genes than previously believed, and our current results do not allow us to discriminate between the effects of demography and selection for an individual locus. Loci which are candidates for being under positive selection should therefore be more carefully scrutinized to find links between potentially selected alleles and a phenotypic effect (see e.g. Sabeti et al. 2007).
It will be interesting to see if these findings hold up in larger analyses in the same populations (e.g. upcoming papers from the same group that created the HGDP Selection Browser I posted about late last year). At the very least, it would be great to see some more detailed theoretical papers exploring the implications of this process for the inference of selection under realistic models of human range expansion.
Dienekes posted on this paper when it was released online, and makes an important point:
I would say that, from now on, the "gold standard" of positive selection should be concrete evidence that the proposed selected alleles actually do something that could have been selected, e.g., lactase persistence, where allele frequency differences are combined with a specific trait, which in turn is correlated with a particular selective influence (milk consumption after weaning). Statistical inference of selection without a comprehensive explanation is no longer intellectually convincing.
That's becoming more and more the case anyway - it's getting damn hard to publish a report of positive selection in humans without at least some basic functional data to back it up, at least in reputable journals. Nonetheless, this paper creates an extra incentive for reviewers (and everyone else) to exercise their skepticism muscles just that little bit more when reading reports on recent selection in humans.
I should emphasise that there's little doubt that at least some recent population-specific selection has occurred in humans (the signal around the lactase gene in Europeans is about as unambiguous as it gets) - but perhaps it has not been anywhere near as pervasive as some researchers (e.g. John Hawks) have argued.
In response to this quote: "it's getting damn hard to publish a report of positive selection in humans without at least some basic functional data to back it up, at least in reputable journals." Well, with the inclusion of vulgarities such as "damn" within the body of your text, I suppose your chances are not complicated; they are now nil. Look how it voided your entire writing. The only thing I remember now is the fact that you inserted a vulgarity into scientific writing.
JPG,
That word is not considered particularly vulgar in most scientific circles, to the point where it could be used in casual conversation (or even in a conference presentation) without raising eyebrows. I suspect this is a cultural misunderstanding. :-)
"it voided": golly, what a vile image; that is rude.
JPG wrote
LOL! If that's all you remember then you're in worse shape than most. Memory depends mostly on semantic processing, and what you've said is that you processed none of the actual post except the "damn." Hence you wouldn't have remembered anything anyway, and merely found an excuse to hang that on.
Interesting. This is going on top of the pile for re-reading, because I've had a hankering to look for signals of selection in some non-human species. One of them has undergone recent population expansion (explosion, really, right down to the star-type phylogeny), whereas the other has remained rather stable through time. It would seem that in the former case I should be without hope, as any signal of selection may be eroded by this phenomenon. However, in the latter case, a statistical inference wouldn't be without its merits just yet. Would you agree?
Hi KC,
Yes (as I understand it, at least), it will always be much harder to find signals of selection in a population that has undergone recent expansion, particularly if it follows a strong bottleneck. This type of demography effectively makes the whole genome look like it's been recently positively selected, so any real signals will be much harder to pick out.
As for whether you'll have luck finding selection in your other species - it all depends what type of data you'll be looking at. Do you have specific genes in mind, or are you thinking of a genome-wide approach? If the latter, has your species been well-characterised in terms of usefully variable genetic markers?
Unfortunately, we don't exactly have a plethora of functional genes in the second species, though we have enough data from cattle to try and make a leap if I did this and got backed into a corner. However, what I found more interesting to entertain was the idea of a genome-wide approach. We do possess sufficient, and sufficiently variable, markers to make an attempt - though I haven't sat down to give that any more than a cursory check, as the whole thing is very pie in the sky still.
The genome-wide approach is always the best option, but it would make for a massive project (and the potential sources of bias and error are legion, of course). If you do end up heading down that road, let me know - I'd be very interested to hear about your approach.
The image I will remember most is those surfing alleles :)
Thanks for a nice post.