Signals of recent positive selection in a worldwide sample of human populations...maybe

John Hawks & Daniel MacArthur have already pointed to a new paper in Genome Research, Signals of recent positive selection in a worldwide sample of human populations. As Dan notes, it's Open Access, so you can read the PDF yourself. That being said, "Just read it!" might be somewhat of a tall order if you don't have some background, so make sure to read the papers at this link. In particular, the paper refers to two tests for detecting natural selection which utilize haplotype structure: integrated haplotype score (iHS) and cross population extended haplotype homozygosity (XP-EHH).

One of the interesting points about this paper is that they use the Human Genome Diversity Panel samples: 53 populations, 938 individuals and 660,000 SNPs. Most of the other studies which sliced & diced the data in the same manner as this paper used the HapMap: 30 trios each of Africans from Nigeria and Utah whites, as well as 44 individuals from Tokyo and 45 from Beijing. A trio consists of parents + child, so one can see here that the amount of genetic variation is a bit less than you would expect from a headcount (209 unrelated individuals after you remove the children: 120 parents plus the 89 East Asians). Though a smaller sample, the HapMap has a greater SNP density, 3.1 million.

The authors point out that iHS and XP-EHH are complementary. The former has more power to detect selective sweeps which are in the midst of their transition to fixation, with allele frequencies on the order of 50%, and far less to detect those where the allele is nearly fixed. In contrast, XP-EHH has more power when the sweep is near fixation, but less at lower frequencies. Neither technique has much power to detect sweeps which are still in their early stages and so exhibit a lower allele frequency, on the order of ~30% or less.
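Both tests are built on extended haplotype homozygosity (EHH): the chance that two randomly chosen chromosomes carrying a given core allele are identical across the whole stretch from the core out to some marker. Roughly speaking, iHS compares how EHH decays around the derived versus the ancestral allele within one population, while XP-EHH compares the decay between two populations. Here is a rough Python sketch of the underlying quantity. This is not the authors' code; the function name `ehh` and the toy haplotypes are made up for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, upto):
    """Extended haplotype homozygosity for chromosomes sharing a core allele.

    haplotypes : equal-length sequences (e.g. strings of '0'/'1'), all
                 carrying the same allele at index `core`
    core       : index of the core SNP
    upto       : index of the marker out to which identity is required
    Returns the probability that two distinct chromosomes drawn at random
    are identical at every site between `core` and `upto`, inclusive.
    """
    lo, hi = sorted((core, upto))
    n = len(haplotypes)
    if n < 2:
        return 0.0
    # Group chromosomes by their exact sequence over the interval.
    classes = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    # Identical pairs divided by all possible pairs.
    return sum(comb(c, 2) for c in classes.values()) / comb(n, 2)


# Toy data (hypothetical): four chromosomes carrying allele '1' at site 0.
carriers = ["1100", "1100", "1100", "1101"]
for site in range(4):
    print(site, ehh(carriers, 0, site))
```

EHH starts at 1 at the core and drops as recombination and mutation break up the haplotype with distance. A sweep in progress leaves carriers of the favored allele with unusually slow decay, which is the signal both tests are looking for.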

In addition to these methods there is copious utilization of the older technique of FST. While the newer techniques rely on genomic sequences, FST could be used with classical genetics (i.e., coarser assays of variation). If you have heard that 85% of genetic variation is within racial groups, and 15% between, that's FST-derived logic. In short, the value of FST is higher when most of the variance across the total population can be attributed to the variance between groups, while a low FST indicates that most of the variance is within the groups. If you get a low FST then you're not getting much more information by looking at population substructure, but if you have a high FST, substructure might be really informative, as genetic differences break relatively cleanly along group differences. The HapMap found an FST value of 0.12 across SNPs for its 4 populations, in line with the 15% between-race variance alluded to above.
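To make the variance-partitioning logic concrete, here is a minimal Python sketch of FST at a single biallelic locus, using the textbook heterozygosity form (H_T - H_S) / H_T. It is only an illustration: the function name `fst`, the equal default weights and the toy frequencies are my assumptions, and neither this paper nor the HapMap necessarily used this exact estimator.

```python
def fst(freqs, sizes=None):
    """F_ST at one biallelic locus from subpopulation allele frequencies.

    freqs : allele frequency of the same allele in each subpopulation
    sizes : optional relative subpopulation sizes (equal weights if omitted)
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = sum(sizes)
    weights = [s / total for s in sizes]

    # Allele frequency in the pooled (total) population.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    # Expected heterozygosity if the total population mated at random.
    h_t = 2 * p_bar * (1 - p_bar)
    # Average expected heterozygosity within subpopulations.
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))

    if h_t == 0:
        return 0.0  # locus is monomorphic everywhere; no variance to split
    return (h_t - h_s) / h_t


# Toy frequencies (hypothetical), two equally sized groups each time.
print(fst([0.5, 0.5]))  # identical groups
print(fst([0.4, 0.6]))  # modest difference
print(fst([0.0, 1.0]))  # fixed difference between groups
```

Identical frequencies give 0, a fixed difference gives 1, and a modest frequency difference lands near the low end, which is where most human SNPs sit.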

In addition to these techniques which look across populations, they make reference to the numerous association studies which have emerged over the past few years. Instead of looking for signatures of historical evolutionary events (e.g., bottleneck, selective sweep, etc.), association studies attempt to elucidate the regions of the genome responsible for variation in a phenotype. One example is the paper from a few years ago on European pigmentation, which pinpointed a host of SNPs which were disproportionately associated with a given trait value. This did not entail that said SNPs were causally connected to the expression of a particular trait, or imply anything about selection on the trait, but obviously loci which are found to be correlated with variation on a trait may be of further interest. Correlation does not entail causation, but causation is always preconditioned on correlation.

Finally, the authors of the paper seem to suggest that it is important to look at the possible functional relevance of candidate genes which pop up on the tests for natural selection, and at possible known molecular genetic pathways. As they note, many genomic regions which exhibit long haplotypes, which may imply a selective sweep, turn out to harbor no known functional variants. Since these are tests for natural selection, one might assume that the function upon which selection operated is waiting to be discovered. Not so fast:

We saw an important effect of demography in these simulations. The power to detect selection is highest in the "African" demography, intermediate in the "European" demography, and lowest in the "East Asian" demography (Supplemental Fig. 2). Although not explicitly included in the simulations, this suggests that power is low for both these tests in Oceania and America, which have experienced more recent and severe bottlenecks (Conrad et al. 2006). This is consistent with the observation that nonequilibrium demographies can inflate haplotype-based test statistics (Macpherson et al. 2008).

We also investigated the impact of sample size on power. For iHS, the loss of power incurred by decreasing sample size is modest until a threshold of ~40 chromosomes, while XP-EHH maintains power with as few as 20 chromosomes, as long as the reference population is of a fixed sample size (Supplemental Fig. 3). Since many HGDP populations contain around 10 individuals, power may be gained for iHS by grouping together genetically similar populations.

Here's Macpherson et al.:

A beneficial mutation which has nearly but not yet fixed in a population produces a characteristic haplotype configuration, called a partial selective sweep. Whether non-adaptive processes might generate similar haplotype configurations has not been previously explored. Here we consider five population genetic datasets taken from regions flanking high frequency transposable elements in North American strains of D. melanogaster, each of which appears to be consistent with the expectations of a partial selective sweep. We use coalescent simulations to explore whether incorporation of the species' demographic history, purifying selection against the element, or suppression of recombination caused by the element could generate putatively adaptive haplotype configurations. Where most of the datasets would be rejected as non-neutral under the standard neutral null model, only the dataset for which there is strong external evidence in support of an adaptive transposition appears to be non-neutral under the more complex null model, and in particular when demography is taken into account. High frequency, derived mutations from a recently-bottlenecked population, such as we study here, are of great interest to evolutionary genetics in the context of scans for adaptive events; we discuss the broader implications of our findings in this context.

The authors seem to be suggesting that many of the positive signals for selective sweeps yielded by iHS may be false positives insofar as they are due to demographic processes. Jonathan Pritchard, in whose lab several of the paper's authors worked, or still work, alluded to as much last spring in an interview. The implication of their bottleneck model seems to be that a great deal of the haplotype structure upon which some tests for natural selection rely might be due to drift increasing the frequency of particular variants stochastically.
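To see how a bottleneck alone can push a neutral variant to a frequency that looks sweep-like, here is a small Wright-Fisher sketch in Python. Everything in it is illustrative: the population sizes, generation counts and the 30% threshold are numbers I picked, not the demographic model used in the paper or by Macpherson et al., and the function names are hypothetical.

```python
import random


def simulate_drift(p0, sizes, rng):
    """Neutral Wright-Fisher drift with no selection and no mutation.

    p0    : starting allele frequency
    sizes : diploid population size in each successive generation
    rng   : a random.Random instance
    Returns the allele frequency after the last generation.
    """
    p = p0
    for n in sizes:
        copies = 2 * n  # each diploid individual carries two copies
        # Binomial sampling: every copy in the next generation is drawn
        # independently at the current allele frequency.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
    return p


def fraction_reaching(p0, sizes, threshold, replicates, rng):
    """Share of replicate populations ending at or above `threshold`."""
    hits = sum(
        1 for _ in range(replicates)
        if simulate_drift(p0, sizes, rng) >= threshold
    )
    return hits / replicates


# Hypothetical demographies: 30 generations at constant size, versus a
# 10-generation crash to 10 individuals followed by 20 generations of recovery.
constant = [500] * 30
bottleneck = [10] * 10 + [500] * 20

print("constant:  ", fraction_reaching(0.1, constant, 0.3, 100, random.Random(1)))
print("bottleneck:", fraction_reaching(0.1, bottleneck, 0.3, 100, random.Random(1)))
```

With no selection anywhere in the model, the bottlenecked populations should carry a rare allele from 10% up past 30% far more often than the constant-size ones. An allele that rises that quickly drags its flanking haplotype along with it, which is the kind of pattern a haplotype-based test can mistake for a partial sweep.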

Let me jump to their conclusion:

We find that putatively selected haplotypes tend to be shared among geographically close populations. In principle, this could be due to issues of statistical power: broad geographical groupings share a demographic history and thus have similar power profiles. However, strongly selected loci are expected to show geographical patterns largely independent of demography--depending on the relevant selection pressures, they can be highly geographically restricted despite moderate levels of migration, or spread rapidly throughout a species even in the presence of little migration (Nagylaki 1975; Morjan and Rieseberg 2004). Further exploration of the geographic patterns in these data and their implications is warranted, but from the point of view of identifying candidate loci for functional verification, the fact that putatively selected loci often conform to the geographic patterns characteristic of neutral loci is somewhat worrying. This suggests that distinguishing true cases of selection from the tails of the neutral distribution may be more difficult than sometimes assumed, and raises the possibility that many loci identified as being under selection in genome scans of this kind may be false positives. Reports of ubiquitous strong (s = 1-5%) positive selection in the human genome (Hawks et al. 2007) may be considerably overstated.

For the last sentence, see John Hawks' reply. Here are some posts on adaptive acceleration from last year. The point about how adaptive alleles should show local adaptation, as opposed to geographically differentiated adaptation, is interesting. In the paper the authors point to several instances where they suggest that local adaptation is compelling (or at least adaptation which can be decoupled from the geographic pattern, and so phylogeny). For instance they review the genetic architecture of skin color:

In general, we find the evidence for selection on disease risk is not as conclusive as that for selection on pigmentation traits. One parsimonious explanation for this is that selection on disease risk, assuming disease risk is under selection at all, is much weaker than selection on pigmentation. However, the role of the genetic architecture of a trait (the number of loci underlying a trait and their effect sizes and frequencies) in how it responds to selection remains largely unexplored. Since the genetic architecture of pigmentation is relatively simple (compared with other complex traits), perhaps a selection signal on this trait is more readily detected because it is spread across fewer loci. On the other hand, this explanation may confuse cause and effect. Perhaps skin pigmentation has a simpler genetic architecture than other complex traits because it has been subject to recent strong selection--the first moves to a new phenotypic optimum are predicted to be on mutations of large fitness effect (Orr 2002). So assuming a positive correlation between the effects of an allele on fitness and on a trait, it is also plausible that the relatively simple genetic architecture of skin pigmentation is actually a consequence of the strong selection that has acted on this phenotype. Further work on the interplay between genetic architecture and natural selection is needed to clarify these issues.

There are some issues with skin pigmentation. The authors observe that since several of the genes elucidated have been studied in admixed European-African populations, we have a better grip on the loci which resulted in the depigmentation of Europeans than on those responsible in East Asians. Perhaps examining the Malagasy, who are mixed African & Malay, would be informative. In any case, the rank order of effect sizes among skin color loci is striking, and it seems plausible that we are witnessing some sort of transient. When new selection pressures arise, usually large effect loci come to the fore, modified over time by smaller effect mutations which mask any deleterious side effects.

Also, I have to mention one point that bothers me about the HGDP sample. Here are the Central & South Asian populations:

Balochi
Brahui
Burusho
Hazara
Kalash
Makrani
Pashtun
Sindhi
Uyghur

The Burusho & Kalash are genetic and cultural isolates. The Uyghur are recently (within the last 2,000 years) emergent as an admixture between populations which were previously not in contact. The Hazara are the same. The Makrani have considerable recent African admixture. The Balochi and Pashtun are Iranian groups, while the Brahui, though Dravidian in origin, have a predominantly Balochi vocabulary and are genetically similar to their neighbors. Of these groups only the Sindhi are Indo-Aryan speaking, and are unequivocally of the South Asian cultural & genetic world. But even then the Sindhi are on the Northwest fringe. From Low Levels of Genetic Divergence across Geographically and Linguistically Diverse Populations from India:

[Figure: journal.pgen.0020215.g002.png]

I've lopped off most of this chart. I left the Palestinians in as a reference group for West Asia. The "Red" is obviously a signature of South Asians. As you can see, the HGDP populations which represent South Asia are atypical. Not that there's anything wrong with it; the goal of the HGDP project, from what I recall, was to preserve the genetic heritage of rare indigenous peoples who might be absorbed by larger groups in the near future. To my knowledge no one has worried about the possible extinction of the genetic uniqueness of the Bengali people. In any case, the HGDP sample makes South Asians look more similar to Europeans & West Asians than they really are. We're talking 100-125 proof brown, not 175.

OK, it's Open Access. Read the whole thing. Play with Haplotter & HGDP. In any case, here's a map of lactase persistence in Eurasia for the T-13910 allele.

[Figure: lactaseI.jpg]

Oh, and wait for the supplemental data. A lot is promised in the text.


Hey Razib,

Your point about the South Asian populations is well-taken; I think newer population samples (eg. HapMap 3) include some that are...um, more Everclear than Bacardi? So it will soon be seen whether these geographic patterns hold up.

Regarding pigmentation, mapping in an admixed Asian-African population would indeed be highly informative for understanding the evolution of skin color; feel free to pass me any contacts you have in Madagascar :)

Joe

By J Pickrell on 24 Mar 2009

Regarding pigmentation, mapping in an admixed Asian-African population would indeed be highly informative for understanding the evolution of skin color; feel free to pass me any contacts you have in Madagascar :)

well, you could go to france. it has a large expat malagasy community i think.