It is well known that average levels of population structure are higher on the X chromosome compared to autosomes in humans. However, there have been surprisingly few analyses on the spatial distribution of population structure along the X chromosome. With publicly available data from the HapMap Project and Perlegen Sciences, we show a strikingly punctuated pattern of X chromosome population structure. Specifically, 87% of X-linked HapMap SNPs within the top 1% of FST values cluster into five distinct loci. The largest of these regions spans 5.4 Mb and contains 66% of the most highly differentiated HapMap SNPs on the X chromosome. We demonstrate that the extreme clustering of highly differentiated SNPs on the X chromosome is not an artifact of ascertainment bias, nor is it specific to the populations genotyped in the HapMap Project. Rather, additional analyses and resequencing data suggest that these five regions have been substrates of recent and strong adaptive evolution. Finally, we discuss the implications that patterns of X-linked population structure have on the evolutionary history of African populations.
Remember that Fst measures how genetic variance is partitioned between and within populations. As Fst approaches 1, nearly all of the variance lies between groups. For example, with two populations and two alleles:
Population 1: allele 1 frequency = 1.0, allele 2 frequency = 0.0
Population 2: allele 1 frequency = 0.0, allele 2 frequency = 1.0
All the variance is between the populations, not within them. Each population is fixed for a single allele, so there is no variation within either one, and the result follows by definition. By contrast, Fst approaches 0 when all the variance is within populations rather than between them. For example:
Population 1: allele 1 frequency = 0.5, allele 2 frequency = 0.5
Population 2: allele 1 frequency = 0.5, allele 2 frequency = 0.5
There's a lot of variance within both populations, but none between them. In other words, Fst tells you whether there's any point in looking at population substructure. In the second case you can obviously throw everything into one big bin without losing any information (assuming Hardy-Weinberg equilibrium in both populations). In the first case, pooling the populations together would mask a large amount of between-population variance, which might be important.
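The two examples above can be checked numerically. What follows is a minimal sketch in Python using one textbook definition, Fst = (H_T - H_S) / H_T. Here H_T is the expected heterozygosity of the pooled populations and H_S is the mean expected heterozygosity within them. The sketch assumes one biallelic locus, two equal-sized populations and no sample-size correction, so it is not necessarily the estimator Lambert et al. used.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Fst = (H_T - H_S) / H_T for one biallelic locus and two equal-sized populations.

    p1, p2: frequency of allele 1 in population 1 and in population 2.
    """
    p_bar = (p1 + p2) / 2                                # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity if pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-population heterozygosity
    if h_t == 0:
        return 0.0  # monomorphic when pooled: there is no variance to partition
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(wright_fst(1.0, 0.0))  # first example, fixed difference → 1.0
    print(wright_fst(0.5, 0.5))  # second example, identical frequencies → 0.0
    print(wright_fst(0.7, 0.3))  # an intermediate case
```

The first two calls reproduce the two extremes. The third call uses frequencies of 0.7 and 0.3 and gives an intermediate value of about 0.16: some of the variance is between the populations, but most is still within them.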
In the paper they note that between-population variance, in the form of higher Fst, has a higher baseline on the X chromosome. This is likely because the X has a smaller long-term effective population size. Remember that males carry only one X. With equal numbers of males and females, there are only three copies of the X in circulation for every four copies of each autosome (the chromosomes that are not sex chromosomes). This naturally reduces the long-term effective population size, and so makes the X more susceptible to stochastic fluctuations in allele frequency such as random genetic drift. When populations are separated and there is minimal gene flow, genetic drift will generally increase between-population variance. There's a large space of allele frequencies to "random walk" across, and turnover of neutral alleles will produce very different patterns of variation in each population (consider the random patterns generated by scattershot firing of a gun; noise is diverse).
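Rough numbers help with the drift argument. Under the standard drift-only approximation, expected Fst between two isolated populations grows as F_t ≈ 1 - (1 - 1/n)^t, where n is the number of gene copies per population and t the number of generations. The sketch below assumes idealized Wright-Fisher populations of N individuals with equal numbers of males and females and no migration or selection. Under those assumptions, n = 2N for an autosome and n = 1.5N for the X. The values of N and t are hypothetical, chosen only for illustration.

```python
def expected_fst_under_drift(n_copies: float, generations: int) -> float:
    """Expected Fst between isolated populations under pure drift (idealized Wright-Fisher).

    n_copies: gene copies per population; 2N for an autosome, 1.5N for the X
    (assumes N individuals with equal numbers of males and females).
    """
    # Within-population heterozygosity decays by a factor (1 - 1/n_copies) per generation.
    return 1 - (1 - 1 / n_copies) ** generations


N = 1000   # hypothetical population size (individuals)
T = 2000   # hypothetical generations since the populations separated

fst_autosome = expected_fst_under_drift(2 * N, T)
fst_x = expected_fst_under_drift(1.5 * N, T)

print(f"autosome: {fst_autosome:.3f}")
print(f"X:        {fst_x:.3f}")
```

With these made-up parameters, the X reaches an expected Fst of roughly 0.74, compared with roughly 0.63 for autosomes. The point is the direction of the difference, not the specific values.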
But the authors of this paper felt that they saw something else: natural selection acting on specific genomic regions, fixing particular alleles and so producing between-population variation. Here's a figure which illustrates the variation in Fst across the X chromosome. The top two panels are for the HapMap dataset, while the bottom two are for the Perlegen dataset. The second panel of each pair shows the clusters of loci above the 99th percentile in Fst across the genome.
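A short sketch shows one way to pull out the most differentiated SNPs (the top 1% of Fst values) and test whether they fall into a handful of clusters. The Python code assumes per-SNP positions and Fst values are already available as lists. The merge rule and its 500 kb max_gap default are hypothetical choices for illustration, not the procedure Lambert et al. used to define their five loci. The demo uses synthetic data with two planted high-Fst blocks.

```python
import random


def top_fst_clusters(positions, fst, top_fraction=0.01, max_gap=500_000):
    """Group the most differentiated SNPs into clusters of nearby SNPs.

    positions: base-pair coordinates; fst: per-SNP Fst values (same order).
    top_fraction: share of SNPs to keep (0.01 = top 1% of Fst values).
    max_gap: HYPOTHETICAL merge distance in bp; consecutive top SNPs no farther
    apart than this are assigned to the same cluster.
    Returns a list of (start_bp, end_bp, n_snps) tuples.
    """
    if len(positions) != len(fst):
        raise ValueError("positions and fst must have the same length")
    if not positions:
        return []
    k = max(1, round(len(fst) * top_fraction))
    # Indices of the k SNPs with the highest Fst (ties broken arbitrarily).
    top_idx = sorted(range(len(fst)), key=lambda i: fst[i], reverse=True)[:k]
    top_pos = sorted(positions[i] for i in top_idx)

    clusters = []
    start = prev = top_pos[0]
    count = 1
    for pos in top_pos[1:]:
        if pos - prev <= max_gap:   # close enough: extend the current cluster
            count += 1
        else:                       # gap too large: close this cluster, open a new one
            clusters.append((start, prev, count))
            start, count = pos, 1
        prev = pos
    clusters.append((start, prev, count))
    return clusters


if __name__ == "__main__":
    rng = random.Random(0)
    positions = list(range(0, 10_000_000, 1_000))     # 10,000 synthetic SNPs, 1 kb apart
    fst = [rng.uniform(0.0, 0.2) for _ in positions]  # synthetic background differentiation
    for i in range(2_000, 2_050):
        fst[i] = 0.9                                   # planted high-Fst block near 2 Mb
    for i in range(7_000, 7_050):
        fst[i] = 0.8                                   # planted high-Fst block near 7 Mb
    for start, end, n in top_fst_clusters(positions, fst):
        print(f"{start:>10,} - {end:>10,} bp: {n} SNPs")
```

When run as a script, it reports two clusters of 50 SNPs each, one starting at 2 Mb and one at 7 Mb. With real X chromosome data, the choice of max_gap would need its own justification, because the number of clusters found depends directly on it.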
And here are the genes around the high-Fst clusters:
Many of these genes sit in regions with haplotypes on the order of 500 kb long, so it is not surprising that some SNPs within these genes have popped up in tests that detect natural selection from haplotype structure. For all but one of the genes above, the derived variant is at higher frequency in Eurasians than in Africans. "Derived" means that the younger, mutant variant has increased in frequency and replaced the older, ancestral one. Interestingly, at the centromeric locus it is in Africans that the derived variant predominates. Here are the frequencies for a SNP at that locus from the HGDP dataset:
black = ancestral
white = derived
The authors note that the derived variant in Africans is not a function of Bantu ancestry. In other words, there isn't a simple demographic explanation of this pattern. Here are the authors in the discussion:
The modern Recent African Origin model for human evolution explains the high genetic variation in contemporary African populations, relative to genomic regions with sharply reduced variation in non-Africans, by presupposing that human migrations out of Africa involved strong founder effects. Hence, a combination of genetic drift and local adaptation can readily account for the existence of derived alleles at high frequencies in non-African populations but low frequencies within Africa. Much less is known about African population history, particularly in the past 50,000-100,000 years during which founders of contemporary non-African populations emigrated into Europe and Asia. Our results suggest that a single African population, ancestral to contemporary Africans, may have remained a relatively coherent and local entity long enough for natural selection to sweep the cluster of derived alleles we describe to near fixation. This process would have occurred either after the initial out-of-Africa migrations or, equally as plausible based on current data, in an African population different than the one from which these out-of-Africa migrations occurred. Under this model, the ancestral African population would necessarily have been large to account for both the levels of variation and substructure evident in contemporary African populations.
It is common to say that "we are all Africans," and that Bushmen, for example, are the most "ancient humans." This seems to presuppose that Africans have been genetically stationary while other groups have gone their own way. But the frequency of the Duffy allele in Africa, a response to malaria which emerged in the last 10,000 years, falsifies this simplistic narrative. All human populations are equally ancient, and all have derived from ancestral populations. There are no living fossils. It is genes, in the form of ancestral alleles, that may be envisaged as "living fossils," not peoples (though some of these genes are under strong functional constraint, so selection preserves what already works).
Citation: Lambert, Charla A.; Connelly, Caitlin F.; Madeoy, Jennifer; Qiu, Ruolan; Olson, Maynard V.; Akey, Joshua M. doi:10.1016/j.ajhg.2009.12.002
Figured I'd post this here since this blog deals with this type of topic: genetics/anthropology and language. Were these people listed in previous posts about genetic groups in India? I forget, given the large number of tribal groups and languages listed.
Last speaker of ancient language of Bo dies in India
By Alastair Lawson
Something in me wants to explain these results with stupid, mechanical reasons, e.g. crossover being more difficult (or easier) near the centromere. Or some kind of genomic imprinting. Whatever.
But I guess that's difficult to reconcile with the remarkable pattern of African derivation.
As for the higher Fst in the X chromosome in general, isn't that what one would expect under higher rates of migration for men than women?
I've speculated over the last year that future DNA studies will find that Sub-Saharans are a fairly recent admixed group (say, within the last 5-10K years or so). On this view they would descend from incoming Eurasian males, carrying Y-DNA E1b1a (aka E3a) and bringing agriculture (probably in the form of pastoralism initially), and local women similar to today's pygmies or possibly bushmen.
To me the only truly African Y-DNA lineages are A and B, which today are found predominantly among the Kung-San (aka Bushmen) and groups like the Dinka of the Sudan, among others.