The tales that the X chromosome tells

By razib on February 4, 2010.

Highly Punctuated Patterns of Population Structure on the X Chromosome and Implications for African Evolutionary History:

It is well known that average levels of population structure are higher on the X chromosome compared to autosomes in humans. However, there have been surprisingly few analyses on the spatial distribution of population structure along the X chromosome. With publicly available data from the HapMap Project and Perlegen Sciences, we show a strikingly punctuated pattern of X chromosome population structure. Specifically, 87% of X-linked HapMap SNPs within the top 1% of FST values cluster into five distinct loci. The largest of these regions spans 5.4 Mb and contains 66% of the most highly differentiated HapMap SNPs on the X chromosome. We demonstrate that the extreme clustering of highly differentiated SNPs on the X chromosome is not an artifact of ascertainment bias, nor is it specific to the populations genotyped in the HapMap Project. Rather, additional analyses and resequencing data suggest that these five regions have been substrates of recent and strong adaptive evolution. Finally, we discuss the implications that patterns of X-linked population structure have on the evolutionary history of African populations.

Remember that Fst measures how genetic variance is partitioned between and within populations: it is the fraction of the total variance that lies between them. As Fst approaches 1, nearly all of the variance lies between groups. For example:

Population A:
Allele frequency 1 = 1.0
Allele frequency 2 = 0.0
--------------------------------------------------------------------------
Population B:
Allele frequency 1 = 0.0
Allele frequency 2 = 1.0

All the variance is between the populations, not within them. Each population is fixed for a single allele, so there is no variation within either one, and by definition all of it must lie between them. By contrast, Fst approaches 0 when all the variance is within the populations rather than between them. For example:

Population A:
Allele frequency 1 = 0.5
Allele frequency 2 = 0.5
--------------------------------------------------------------------------
Population B:
Allele frequency 1 = 0.5
Allele frequency 2 = 0.5

There's a lot of variance within both populations, but none between them. In other words, Fst tells you whether there's any point in looking at population substructure. In the latter case you can obviously throw everything into one big bin without losing any information (assuming Hardy-Weinberg equilibrium in both). In the first case, pooling the populations would mask the fact that there's a lot of between-population variance, which might be important.
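To make the two examples concrete, here is a minimal Python sketch using one simple formulation, Nei's heterozygosity-based Fst = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled populations. The function names are illustrative, populations are weighted equally, and Hardy-Weinberg equilibrium is assumed; this is not necessarily the estimator the paper used.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic locus under Hardy-Weinberg equilibrium."""
    return 2.0 * p * (1.0 - p)


def fst(freqs: Sequence[float]) -> float:
    """Nei-style Fst = (H_T - H_S) / H_T for a single biallelic locus.

    `freqs` holds the frequency of allele 1 in each population. Populations are
    weighted equally (an assumption; real estimators weight by sample size).
    """
    # H_S: average expected heterozygosity within each population.
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    # H_T: expected heterozygosity if everything is thrown into one big bin.
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic even when pooled: no variance to partition
    return (h_t - h_s) / h_t


# First example: the two populations are fixed for different alleles.
print(fst([1.0, 0.0]))  # 1.0
# Second example: both populations sit at 50/50.
print(fst([0.5, 0.5]))  # 0.0
```

The pooled term H_T is what you get by lumping the populations together. In the first example it is 0.5 even though neither population has any internal variation, and that gap between H_T and H_S is exactly the information pooling would hide.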

In the paper they note that between-population variance, in the form of higher Fst, has a higher baseline on the X chromosome, likely because the X has a smaller long-term effective population size. Remember that males carry only one X, which they pass on only to their daughters. With an even sex ratio there are only three copies of the X for every four copies of an autosome (a chromosome other than the sex chromosomes). This naturally reduces the X's long-term effective population size, and so makes it more susceptible to stochastic fluctuations in frequency such as random genetic drift. When populations are separated and there is minimal gene flow, genetic drift will generally increase between-population variance. There's a large space to "random walk" across in terms of gene frequency, and turnover of neutral alleles will produce very different patterns of variation (consider the random patterns generated by scattershot firing of a gun; noise is diverse).
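A rough way to see why a smaller effective population size inflates Fst: under the textbook Wright-Fisher model with no mutation, migration, or selection, the expected Fst among isolated populations after t generations is 1 - (1 - 1/(2Ne))^t. The standard effective-size formulas give Ne = 4·Nm·Nf / (Nm + Nf) for autosomes and Ne_X = 9·Nm·Nf / (4·Nm + 2·Nf) for the X. The sketch below assumes those idealized formulas and invented census numbers (500 breeding males, 500 breeding females); it illustrates the direction of the effect, not the paper's analysis.

```python
def ne_autosome(n_males: float, n_females: float) -> float:
    """Autosomal effective population size with separate sexes: 4*Nm*Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_x(n_males: float, n_females: float) -> float:
    """X-linked effective population size: 9*Nm*Nf / (4*Nm + 2*Nf)."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def drift_fst(ne: float, generations: int) -> float:
    """Expected Fst among isolated Wright-Fisher populations after pure drift:
    F_t = 1 - (1 - 1/(2*Ne))**t, ignoring mutation, migration, and selection."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** generations


# Invented census numbers: 500 breeding males and 500 breeding females.
n_m, n_f = 500, 500
print("Ne autosome:", ne_autosome(n_m, n_f))  # 1000.0
print("Ne X:", ne_x(n_m, n_f))                # 750.0
for t in (100, 1000):
    f_auto = drift_fst(ne_autosome(n_m, n_f), t)
    f_x = drift_fst(ne_x(n_m, n_f), t)
    print(f"t={t}: autosomal Fst {f_auto:.3f}, X-linked Fst {f_x:.3f}")
```

With an even sex ratio Ne_X comes out at exactly three quarters of the autosomal Ne, so after the same number of generations drift has pushed the X further apart between populations.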

But the authors of this paper felt that they saw something else: natural selection acting on specific genomic regions, fixing particular alleles and so producing between-population variation. Here's a figure which illustrates the variation in Fst across the X chromosome. The top two panels are for the HapMap dataset, while the bottom two are for the Perlegen dataset. Additionally, the second of each pair shows the cluster of loci above the 99th percentile in Fst across the genome.

[Figure 1 from Lambert et al.: Fst across the X chromosome, HapMap (top) and Perlegen (bottom) datasets]
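The abstract's headline numbers rest on a simple two-step idea: take the SNPs in the top 1% of Fst, then ask how tightly they bunch along the chromosome. Below is a minimal Python sketch of that idea on invented toy data. The helper names `top_fst_snps` and `cluster_positions` are hypothetical, and starting a new cluster wherever neighboring top SNPs are more than 500 kb apart is an assumption for illustration, not the clustering criterion the authors used.

```python
from typing import List, Tuple


def top_fst_snps(positions: List[int], fst_values: List[float], top_fraction: float = 0.01) -> List[int]:
    """Sorted positions of the SNPs whose Fst falls in the top `top_fraction` (ties broken by input order)."""
    k = max(1, round(len(fst_values) * top_fraction))
    ranked = sorted(range(len(fst_values)), key=lambda i: fst_values[i], reverse=True)
    return sorted(positions[i] for i in ranked[:k])


def cluster_positions(positions: List[int], max_gap: int) -> List[Tuple[int, int, int]]:
    """Split sorted positions into clusters wherever consecutive SNPs are more than
    `max_gap` bp apart. Returns (start, end, n_snps) for each cluster."""
    if not positions:
        return []
    clusters = []
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev > max_gap:
            clusters.append((start, prev, count))  # close the current cluster
            start, count = pos, 0
        count += 1
        prev = pos
    clusters.append((start, prev, count))
    return clusters


# Invented toy data: 1,000 SNPs spaced 10 kb apart with background Fst 0.05,
# plus two hypothetical high-Fst blocks of five SNPs each.
positions = [i * 10_000 for i in range(1000)]
fst_values = [0.05] * 1000
for i in range(100, 105):
    fst_values[i] = 0.6
for i in range(500, 505):
    fst_values[i] = 0.5

top = top_fst_snps(positions, fst_values)  # top 1% = 10 SNPs
clusters = cluster_positions(top, max_gap=500_000)
print(clusters)  # [(1000000, 1040000, 5), (5000000, 5040000, 5)]
largest = max(n for _, _, n in clusters)
print(largest / len(top))  # 0.5
```

In this toy case the largest cluster holds half of the top-1% SNPs; the abstract's analogous figure for the largest real region is 66%.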

And here are the genes around the high-Fst clusters:

Many of these genes sit in regions with haplotypes on the order of 500 kb long, so it's not surprising that some SNPs within these genes have turned up in tests that detect natural selection from haplotype structure. For all but one of the genes above, the derived variant is at higher frequency in Eurasians than in Africans. "Derived" means that the younger, mutant variant has increased in frequency and replaced the older, ancestral variant. Interestingly, the exception is the centromeric locus, where it is in Africans that the variant is derived. Here are the frequencies for a SNP at that locus from the HGDP dataset:

[Figure 5 from Lambert et al.: HGDP allele frequencies for a SNP at the centromeric locus]

black = ancestral
white = derived
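The haplotype-based tests for selection mentioned above are often built on extended haplotype homozygosity (EHH): among chromosomes carrying the same core allele, the probability that two drawn at random are identical over the whole stretch from the core SNP out to a given distance. A recently swept allele sits on long, little-broken haplotypes, so its EHH decays slowly with distance. Here is a minimal Python sketch on invented, already-phased toy haplotypes; the `ehh` helper is hypothetical, and this is not necessarily the specific test that flagged these genes.

```python
from collections import Counter
from math import comb, nan
from typing import Sequence


def ehh(haplotypes: Sequence[str], core: int, end: int, core_allele: str) -> float:
    """Extended haplotype homozygosity from SNP index `core` out to SNP index `end`.

    Among haplotypes carrying `core_allele` at the core SNP, returns the probability
    that two drawn at random (without replacement) are identical at every SNP
    between core and end inclusive. Each haplotype is a string, one allele per SNP.
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        return nan  # undefined with fewer than two carriers
    lo, hi = min(core, end), max(core, end)
    # Count how many carriers share each distinct haplotype over the interval.
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Identical pairs divided by all possible pairs.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Invented phased haplotypes, five SNPs each, with the core SNP at index 0.
haps = ["00000", "00000", "00001", "10110", "10101", "10011"]
for end in range(1, 5):
    print(end, ehh(haps, 0, end, "0"), ehh(haps, 0, end, "1"))
```

Here the carriers of allele 0 stay identical out to SNP index 3 (EHH of 1.0), while the haplotypes carrying allele 1 have all diverged by that point (EHH of 0.0). Published statistics such as iHS integrate curves like these and compare the two alleles at the core SNP.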

The authors note that the derived variant in Africans is not a function of Bantu ancestry. In other words, there isn't a simple demographic explanation of this pattern. Here are the authors in the discussion:

The modern Recent African Origin model for human evolution explains the high genetic variation in contemporary African populations, relative to genomic regions with sharply reduced variation in non-Africans, by presupposing that human migrations out of Africa involved strong founder effects. Hence, a combination of genetic drift and local adaptation can readily account for the existence of derived alleles at high frequencies in non-African populations but low frequencies within Africa. Much less is known about African population history, particularly in the past 50,000-100,000 years during which founders of contemporary non-African populations emigrated into Europe and Asia. Our results suggest that a single African population, ancestral to contemporary Africans, may have remained a relatively coherent and local entity long enough for natural selection to sweep the cluster of derived alleles we describe to near fixation. This process would have occurred either after the initial out-of-Africa migrations or, equally as plausible based on current data, in an African population different than the one from which these out-of-Africa migrations occurred. Under this model, the ancestral African population would necessarily have been large to account for both the levels of variation and substructure evident in contemporary African populations.

It is common to say that "we are all Africans," and that Bushmen, for example, are the most "ancient" humans. This seems to presuppose that Africans have been genetically stationary while other groups have gone their own way. But the high frequency in Africa of the Duffy-null allele, a response to malaria which emerged in the last 10,000 years, falsifies this simplistic narrative. All human populations are equally ancient, and all have derived from ancestral populations. There are no living fossils. It is genes, in the form of ancestral alleles, which may be envisaged as "living fossils," not peoples (and some of these genes are subject to strong functional constraint, which means selection tends to fossilize what works).

Citation: Lambert, Charla A.; Connelly, Caitlin F.; Madeoy, Jennifer; Qiu, Ruolan; Olson, Maynard V.; Akey, Joshua M. doi:10.1016/j.ajhg.2009.12.002


Figured I'd post this here since this blog deals with this type of topic: genetics/anthropology and language. Were these people listed in previous posts about genetic groups in India? I forget, given the large number of tribal groups and languages listed.

http://news.bbc.co.uk/2/hi/south_asia/8498534.stm
Last speaker of ancient language of Bo dies in India
By Alastair Lawson
BBC News

Something in me wants to explain these results by stupid, mechanical reasons - e.g. crossover is more difficult / easier near the centromere. Or some kind of genomic imprinting. Whatever.

But I guess that's difficult to reconcile with the remarkable pattern of African derivation.

As for the higher Fst in the X chromosome in general, isn't that what one would expect under higher rates of migration for men than women?

I've speculated in the last year that future DNA studies will find that Sub-Saharans are a fairly recent admixed group - say in the last 5-10K years or so - between incoming Eurasian males, carrying Y-DNA E1b1a (aka E3a) and bringing agriculture - probably in the form of pastoralism initially - and local women similar to today's pygmies or possibly bushmen.

To me the only truly African Y-DNA lineages are A and B, which today are found predominantly among the Kung-San (aka Bushmen) and groups like the Dinka of the Sudan, among others.

More here:
http://dienekes.blogspot.com/2008/02/maternal-common-ancestry-between.h…
