Gene Expression

Shadows of the past in genes

A new paper came out in Science this week, Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation, that’s getting some media play. The second-to-last author is L. L. Cavalli-Sforza, and the general combination of means and ends on display in The History and Geography of Human Genes is all over it. From the introduction:

We first studied genetic ancestry of each individual without using his/her population identity. This analysis considers each person’s genome as having originated from K ancestral but unobserved populations whose contributions are described by K coefficients that sum to 1 for each individual. To increase computational efficiency, we developed new software, frappe, that implements a maximum likelihood method (13) to analyze all 642,690 autosomal SNPs in 938 unrelated and successfully genotyped HGDP-CEPH individuals…Figure 1A shows the results for K = 7; those for K = 2 through 6 are in fig. S1. At K = 5, the 938 individuals segregate into five continental groups, similar to those reported in a microsatellite-based study of the same panel…At K = 6, the new component accounts for a major portion of ancestry for individuals from South/Central Asia, separating this region from the Middle East and Europe…K = 7, the new component occurs at highest proportions in the Middle Eastern populations, separating them from European populations. In many populations, ancestry is derived predominantly from one of the inferred components, whereas in others, especially those in the Middle East and South/Central Asia, there are multiple sources of ancestry….
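For readers who want a feel for what "K coefficients that sum to 1" means in practice, here is a minimal toy sketch of that kind of model. To be clear about assumptions: this is not frappe's code, the names (`fit_admixture`, `em_step`, `log_likelihood`) are my own, and I am assuming the textbook setup where each genotype is a count of 0, 1 or 2 copies of an allele whose frequency in a person is a mixture of K unobserved population frequencies, fitted with a plain EM update. It is far too slow for 642,690 SNPs; it only illustrates the idea.

```python
import math
import random

EPS = 1e-6  # keep allele frequencies away from exactly 0 or 1


def log_likelihood(G, Q, P):
    """Binomial log-likelihood (up to a constant) of genotypes G given Q and P."""
    total = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            # Allele frequency expected for individual i at SNP j.
            f = sum(Q[i][k] * P[k][j] for k in range(len(P)))
            total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total


def em_step(G, Q, P):
    """One EM update of ancestry coefficients Q (N x K) and frequencies P (K x M)."""
    N, M, K = len(G), len(G[0]), len(P)
    q_num = [[0.0] * K for _ in range(N)]
    p_num = [[0.0] * M for _ in range(K)]
    p_den = [[0.0] * M for _ in range(K)]
    for i in range(N):
        for j in range(M):
            g = G[i][j]
            f1 = sum(Q[i][k] * P[k][j] for k in range(K))
            f0 = sum(Q[i][k] * (1.0 - P[k][j]) for k in range(K))
            for k in range(K):
                # Expected allele copies (counted / not counted) drawn from population k.
                a = g * Q[i][k] * P[k][j] / f1
                b = (2 - g) * Q[i][k] * (1.0 - P[k][j]) / f0
                q_num[i][k] += a + b
                p_num[k][j] += a
                p_den[k][j] += a + b
    new_Q = [[q_num[i][k] / (2.0 * M) for k in range(K)] for i in range(N)]
    new_P = [[P[k][j] if p_den[k][j] == 0.0
              else min(max(p_num[k][j] / p_den[k][j], EPS), 1.0 - EPS)
              for j in range(M)] for k in range(K)]
    return new_Q, new_P


def fit_admixture(G, K, iters=50, seed=0):
    """Fit the toy model from a random start; returns Q, P and the likelihood trace."""
    rng = random.Random(seed)
    N, M = len(G), len(G[0])
    Q = []
    for _ in range(N):
        w = [rng.random() + 0.1 for _ in range(K)]
        Q.append([x / sum(w) for x in w])
    P = [[rng.uniform(0.1, 0.9) for _ in range(M)] for _ in range(K)]
    history = []
    for _ in range(iters):
        Q, P = em_step(G, Q, P)
        history.append(log_likelihood(G, Q, P))
    return Q, P, history
```

Each row of `Q` is one person's ancestry split across the K components, which is what a single colored bar in a figure like 1A represents. Rerunning with a larger `K` adds one more component, which is how a sequence of plots from K = 2 through K = 7 comes about.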

All good. Do note that the South Asian populations in the HGDP set are all from the northwest edge of the subcontinent, so the separation of the South Asians as an outgroup to the Middle Eastern & European branches of the broad West Eurasian cluster is going to be understated in these studies. In any case, I was interested in some details of the first figure, so I re-edited it a bit for clarity:


I’m obviously focusing on West Eurasian & North African populations here. So you see the components of ancestry which are predominant in Middle Eastern, European and Central/South Asian populations. I included the Hazara and Uygurs for comparison because they are hybrid East-West Eurasian populations of recent provenance. Check out the Tuscan group. Though predominantly of the European ancestral component, note the spike in what might be characterized as the Middle Eastern component. Tuscans obviously live in Tuscany, which was ancient Etruria. That is, the region in which the Etruscans resided. This is a nice illustration that science is not just made up, and that separate lines of evidence converge. You might wonder about the relatively weak Middle Eastern signal, but remember that it seems likely that the Etruscans were an Anatolian population (as were most of the Neolithic Middle Easterners who came to Italy), and these groups probably had a lower initial proportion of the Middle Eastern component than the Levantine groups which are within the HGDP. Remember that there is a north-south division in Middle Eastern haplogroup J.

A second observation, and a far more tentative one, I want to make is about the “blue” component which seems to be associated with Central/South Asian populations to the greatest extent. Remember, these groups are sampled from the northwest of the Indian subcontinent, so they may not be representative of southern Asia as a whole. See this figure, which includes populations further to the south and east. With that caveat, which populations outside of the core Central/South Asian groups show a strong indication of this ancestral component? The Uygur, Adygei, Hazara, Druze, Palestinians, Bedouin and Russians, in that order. As for the more obscure groups, the Druze are a religious sect concentrated in the mountains of Lebanon. The Hazara are an Afghan population which speaks a Persian dialect but claims genetically verified Mongol ancestry. The people of Adygei are diverse, a mix of Russians and various indigenous peoples of the northern Caucasus region, many of the latter speaking non-Indo-European dialects. Finally, the Uygur are a Turkic-speaking group dominant in the oases of Xinjiang in western China.

My hypothesis? Perhaps the Central/South Asian component is in part a signature of an ancient expansion of populations from the center of Eurasia, with some vague connection with the spread of Indo-European language families? The Adygei and Brahui do not speak Indo-European languages; the Brahui in fact speak a Dravidian dialect which might be a residue of the previous extent of that language family. But they have been bathed in a sea of Indo-European speakers for several thousand years. The Adygei have long been neighbors of Indo-European peoples, from the Scythians to the Slavs. As for the Uygurs, they are known to be a new ethnic group insofar as they are the outcome of the Turkicization of the Indo-European speaking peoples of the Tarim Basin. 1,500 years ago the residents of the oases of this region spoke Indo-Iranian dialects or Tocharian ones. What about the Middle Eastern groups? First, note that Indo-European peoples were extant in the Bronze Age Levant. Additionally, before the Turkicization of Anatolia that region of the Middle East was wholly dominated by Indo-European speakers, whether they be Greeks, Armenians or Kurds. The Russians need no explanation, as I am positing that the genetic signal spread west and was diluted along the wave of advance, so that it disappears by the time it reaches western Europe.

Nevertheless, I do want to emphasize that this is a tentative model, derived from a system of classification which no doubt exhibits some artificialities. I doubt, for example, that the Indo-Europeans and affinal peoples were genetically homogeneous. Note that I suggested that the Middle Eastern signal in Tuscany was weak in part because the original migrants from Anatolia had a strong component of the ancestry which is predominant in Europe (the Levantine groups show this today, and likely their fraction is going to be somewhat lower than that of Anatolians). I also think that the story of Y chromosomal haplogroup R1a is a cautionary tale. This lineage is extant across South Asia, in a northwest to southeast cline, in Central Asia among many Indo-European speaking peoples, and in eastern Europe, decreasing in frequency to the west. It was quickly hitched to the Kurgan hypothesis of the spread of Indo-European languages from a lower Volga heartland to the west, east and southeast. But further research strongly suggests that the connection between the Indian subcontinental and non-Indian R1a lineages is very ancient, perhaps dating back to the last Ice Age (this isn’t surprising if it does turn out that a disproportionate number of humans lived in the Indian subcontinent during the last Ice Age). In other words, the high frequency of R1a in South Asia might not be due to recent exogenous genetic inputs as much as to the fact that the R1a haplotypes originated in South Asia. A related branch in Central Asia/Eastern Europe subsequently expanded, and might be the ancestor of some of the R1a haplotypes in South Asia, but not most. I suspect a similarly complex story is at work with the Central/South Asian ancestral component outside of central and southern Asia.


  1. #1 manju
    February 25, 2008

    When it comes to autosomal analysis, correlation with mtDNA distribution makes better sense. A small influence need not mean a small intruder male population. Its influence might have been diluted by native maternal ancestry.

  2. #2 hello
    February 26, 2008

    hi Razib – I like your theory on the Central/South Asian component. Dienekes said that this study proves that the Kalash are not European. Do you agree with him? They are probably Indo-European as in your theory, right? (they certainly look Northern European, not Greek or South Asian)
