Are Chinese subsets of Southeast Asians?

That's probably the big takeaway of a new paper on the genetics of Asians, a set which includes South Asians, though the new research mostly focuses on the peoples of East Asia. In a global context this work is important. The backstory is that there are disagreements about the exact process of the "Out of Africa" migration. Most researchers would agree that the vast majority, perhaps all, of the distinctive genetic content of the human species derives from a migration from the African continent between 50 and 100 thousand years ago (likely closer to the former date than the latter). Note that there were other human lineages outside of Africa, the Neandertals being the most prominent, but various "archaic" groups were extant in eastern Asia as well down to the arrival of modern African-derived human groups. This is part of the reason why H. floresiensis isn't that outlandish: a lineage of H. erectus was extant in Southeast Asia until ~50,000 years ago, with the arrival of moderns.

Those are the agreements. The disagreement, in particular with regard to East Asia, is rather simple. Was there one wave from Africa, or two, and did one or both settle East Asia? The two-wave model was promoted heavily in the early aughts by Spencer Wells. The whole argument is laid out in his book The Journey of Man: A Genetic Odyssey. The title hints at the fact that Wells and his collaborators primarily focused on paternal lineages, the Y chromosomes, in their reconstructions. Here's a screenshot from The Genographic Project which highlights the two-wave model:


In the context of East Asia, the two-wave model posits that there was a southern coastal migration, which pushed into Australia via southern India. And, there was a northern migration up through Central Asia from which arose both Europeans and East Asians.

The new paper in Science is Mapping Human Genetic Diversity in Asia:

Asia harbors substantial cultural and linguistic diversity, but the geographic structure of genetic variation across the continent remains enigmatic. Here we report a large-scale survey of autosomal variation from a broad geographic sample of Asian human populations. Our results show that genetic ancestry is strongly correlated with linguistic affiliations as well as geography. Most populations show relatedness within ethnic/linguistic groups, despite prevalent gene flow among populations. More than 90% of East Asian (EA) haplotypes could be found in either Southeast Asian (SEA) or Central-South Asian (CSA) populations and show clinal structure with haplotype diversity decreasing from south to north. Furthermore, 50% of EA haplotypes were found in SEA only and 5% were found in CSA only, indicating that SEA was a major geographic source of EA populations.
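As a rough reading of those percentages (my arithmetic, not a figure quoted from the paper): if more than 90% of EA haplotypes turn up in SEA or CSA, with about 50% in SEA only and 5% in CSA only, then something like 35% or more must be present in both, and fewer than 10% in neither. The sketch below shows the bookkeeping with Python sets. The haplotype labels and the `classify_haplotypes` helper are invented for illustration, and the toy counts were chosen to mirror the abstract's proportions.

```python
def classify_haplotypes(ea, sea, csa):
    """Return the fraction of EA haplotypes in each sharing category."""
    ea, sea, csa = set(ea), set(sea), set(csa)
    groups = {
        "sea_only": (ea & sea) - csa,   # seen in SEA but not CSA
        "csa_only": (ea & csa) - sea,   # seen in CSA but not SEA
        "both": ea & sea & csa,         # seen in both regions
        "neither": ea - sea - csa,      # private to East Asia
    }
    return {name: len(members) / len(ea) for name, members in groups.items()}


# Toy data (hypothetical): 20 East Asian haplotypes labelled h0..h19.
ea = [f"h{i}" for i in range(20)]
sea = [f"h{i}" for i in range(0, 17)]    # h0..h16 also seen in Southeast Asia
csa = [f"h{i}" for i in range(10, 18)]   # h10..h17 also seen in Central-South Asia

shares = classify_haplotypes(ea, sea, csa)

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")
```

The four categories are mutually exclusive and cover every EA haplotype, which is why the SEA-only and CSA-only shares pin down the rest once the "found in either" total is known.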

I don't think that the insight that language & genes are closely correlated will be that surprising. This was highlighted by L. L. Cavalli-Sforza on a coarser scale two decades ago, but has now been validated in a more fine-grained fashion by larger data sets and more powerful analytical techniques. The reasons for this should be obvious: marriage networks will coalesce around comprehensibility. There are exceptions: Tommy Lee's Greek-born mother reputedly did not speak English when she married his father, an American serviceman stationed in Europe.

To illustrate the relationships they constructed:

a) A maximum-likelihood based phylogenetic tree, showing relationships between groups

b) Also a Structure based chart which shows ancestral proportions inferred from 14 putative source populations

I've reedited and formatted figure 1 to fit on the screen clearly. The key here is to look at how the separation occurs as a function of language. This is not a magical process; no doubt language served as a barrier between groups in the past, and to some extent it is also a cultural signal which can be used to infer the past identity of a given set of individuals in the present (this is an intuition which is naturally less strongly perceived by Americans).


Note what I said in the title: the peoples of Northeast Asia are viewed in this study as a subset of various Southeast Asian groups. Additionally, the Indians branch out as the furthest outgroup, as we'd expect.

To get a sense of the relationships geometrically there is the obligatory PC plot. From figure 2 I want to focus on panels B & D. The first includes Indians and Europeans, while the second does not, and excludes the outliers among the East Asian populations. In other words, the first chart gives a sense of East Asian variation in a worldwide context, while the last is a finer-grained snapshot which elucidates the details of relationships among East Asians.


Remember that the axes represent independent dimensions which can be extracted out of the genetic variance data. PC 1 represents the dimension which has the largest component of variance, PC 2 the second largest, and so forth. Much of the nature of the scatter (or lack thereof) in this figure is predictable from previous work. Europeans are relatively homogeneous on a worldwide scale, but the Europeans represented by "CEU" are Mormons from Utah, who are themselves a subset of European variation. So the tight cluster is expected. Indians vary quite a bit along an axis. A recent paper has offered an explanation for why Indians so often seem to exhibit a linear distribution: South Asians can be conceived as a two-way admixture between a European-like population, likely invasive to the subcontinent, and an older resident population with distant affinities to the peoples to their east. This ancient eastern-affiliated substrate, upon which the European-like element was overlain, is very evident in mtDNA lineages. "CN-UG" are Uyghurs, who are a relatively recent hybrid population.
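To make the idea of principal components concrete, here is a minimal sketch in plain Python. It is not the pipeline used in the paper: the simulated allele frequencies, the group labels ("west", "east", "admixed") and all function names are assumptions made for illustration. It extracts the first two components by power iteration, and it includes a two-way admixed group whose members carry varying ancestry fractions, the kind of population that tends to get strung out along a line between its sources.

```python
import math
import random


def simulate(rng, n_snps=60, n_per_group=20):
    """Toy genotypes (0/1/2 per SNP) for two sources and an admixed group."""
    freq_west = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    freq_east = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    people = []  # entries are (label, west_fraction, genotype)

    def draw(label, west_fraction):
        genotype = []
        for pw, pe in zip(freq_west, freq_east):
            p = west_fraction * pw + (1 - west_fraction) * pe
            genotype.append(int(rng.random() < p) + int(rng.random() < p))
        people.append((label, west_fraction, genotype))

    for _ in range(n_per_group):
        draw("west", 1.0)
    for _ in range(n_per_group):
        draw("east", 0.0)
    for i in range(n_per_group):  # ancestry fractions spread evenly over (0, 1)
        draw("admixed", (i + 0.5) / n_per_group)
    return people


def center_columns(rows):
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_pc(rows, rng, iters=100):
    """Leading component of centered data via power iteration on X^T X."""
    n_cols = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(n_cols)]
    for _ in range(iters):
        scores = [sum(x * y for x, y in zip(row, v)) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(x * y for x, y in zip(row, v)) for row in rows]
    variance = sum(s * s for s in scores) / (len(rows) - 1)
    return v, scores, variance


def remove_component(rows, v, scores):
    """Subtract each row's projection on v, leaving the residual variation."""
    return [[x - s * vj for x, vj in zip(row, v)] for s, row in zip(scores, rows)]


def run_pca(seed=1):
    rng = random.Random(seed)
    people = simulate(rng)
    centered = center_columns([genotype for _, _, genotype in people])
    v1, s1, var1 = first_pc(centered, rng)
    v2, s2, var2 = first_pc(remove_component(centered, v1, s1), rng)
    return people, (v1, s1, var1), (v2, s2, var2)


def mean_score(people, scores, label):
    picked = [s for (lab, _, _), s in zip(people, scores) if lab == label]
    return sum(picked) / len(picked)


if __name__ == "__main__":
    people, (v1, s1, var1), (v2, s2, var2) = run_pca()
    print("variance on PC 1 and PC 2:", round(var1, 2), round(var2, 2))
    for label in ("west", "admixed", "east"):
        print(label, "mean PC 1 score:", round(mean_score(people, s1, label), 2))
```

In this toy setup the two source groups land at opposite ends of PC 1 and the admixed group falls between them, while PC 2 is by construction orthogonal to PC 1 and captures less variance.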

The second panel is also somewhat expected. Recent work has reaffirmed a strong north-south cline within China among the Han. Chinese data have been confused, but these results seem to tacitly support the contention that the peoples of South China were often culturally assimilated and absorbed into the Han identity.


Finally, here's a figure which shows haplotype diversity declining as one moves north. A similar figure could be drawn along a line out of Africa to the northeast. Or from East Asia to North America. The inference is that the population with reduced diversity is derived from the population with greater diversity. The reason for this can be illustrated by the old example of a photocopy; subsequent copies have less information than previous copies. When a daughter population emerges from a parent population generally the former is a subset of the latter, and so it is less diverse. There may be a bottleneck whereby many distinctive alleles disappear through extinction. In this case the inference is that the populations of China, Japan, etc., are derived from a Southeast Asian group, ergo, they are less diverse. Not only that, but in terms of distinctive alleles the northern groups seem to be a subset of the southern groups. Here is the final paragraph of the paper:

To unambiguously infer population histories represents a considerable challenge...Although this study does not disprove a two-wave model of migration, the evidence from our autosomal data and the accompanying simulation studies...point toward a history that unites the Negrito and non-Negrito populations of Southeast and East Asia via a single primary wave of entry of humans into the continent

This conclusion rests on some assumptions. Here is the anthropology blogger Dienekes bringing up some objections:

As to the main thesis of the paper, namely that East Asians are descended from Southeast Asians rather than Central Asians, I have to say that I am not convinced. This thesis is based on two observations: minimum sharing between East Asians and Central/South Asians and south-north reduction of genetic diversity in East Eurasians. However, the high genetic diversity in Southeast Asians can be explained if they are taken to be old hybrids of Mongoloid northerners with "Australoid"-like southerners as physical anthropology suggests, and the seeming absence of influence of present-day Central/South Asians is due to the fact that the latter are largely Caucasoids of western Eurasian origin, and, thus, do not represent any putative ancestral populations to modern Mongoloids.

In other words, Dienekes is suggesting that the populations of Southeast Asia emerged in the same manner as those of South Asia: an admixture event between an indigenous substrate and an exogenous northern population. In China we have a great deal of historical evidence which points to north-south migration, in particular in the period between 500 and 1500. The greater diversity of the South Chinese may then derive in part from the fact that they are an admixed population, who carry within them the genetic heritage of the indigenous peoples, as well as of Han immigrants from the north. But it may also be true that the original Han were migrants from the south. One might posit the same with the more general model of East Asia, with the original Northeast Asians being derived from Southeast Asians, and contemporary Southeast Asians being admixtures between a "back-migration" from Northeast Asia and the local substrate. We know specifically that many of the peoples of Indochina have origins in South China, so this is not without some support from history.
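Both explanations for the south-to-north diversity gradient can be illustrated with a toy model. The sketch below is an assumption-laden simulation, not anything taken from the paper: the allele frequencies, founder numbers and function names are invented, and it tracks per-SNP expected heterozygosity rather than the haplotype diversity the authors measured. It shows that repeated founding events erode diversity (the photocopy effect), while mixing two differentiated populations yields diversity at least as high as the average of the two sources.

```python
import random


def heterozygosity(freqs):
    """Mean expected heterozygosity, 2p(1-p), over biallelic loci."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def found_colony(freqs, n_founders, rng):
    """Sample 2 * n_founders gene copies per locus from the parent population."""
    copies = 2 * n_founders
    return [sum(rng.random() < p for _ in range(copies)) / copies for p in freqs]


def serial_founding(freqs, n_founders, n_steps, rng):
    """Chain of populations, each founded from the one before it."""
    chain = [freqs]
    for _ in range(n_steps):
        chain.append(found_colony(chain[-1], n_founders, rng))
    return chain


def admix(freqs_a, freqs_b, share_a=0.5):
    """Allele frequencies after mixing two populations in fixed proportions."""
    return [share_a * a + (1 - share_a) * b for a, b in zip(freqs_a, freqs_b)]


def demo(seed=7, n_loci=2000):
    rng = random.Random(seed)
    source = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]

    # Scenario 1: a south-to-north chain of founder events.
    chain = serial_founding(source, n_founders=10, n_steps=5, rng=rng)
    chain_h = [heterozygosity(f) for f in chain]

    # Scenario 2: two populations drift apart, then meet and mix.
    north = found_colony(source, 10, rng)
    south = found_colony(source, 10, rng)
    mixed = admix(north, south)
    return chain_h, heterozygosity(north), heterozygosity(south), heterozygosity(mixed)


if __name__ == "__main__":
    chain_h, h_north, h_south, h_mixed = demo()
    print("diversity along the founder chain:", [round(h, 3) for h in chain_h])
    print("north, south, admixed:", round(h_north, 3), round(h_south, 3), round(h_mixed, 3))
```

The point of putting the two scenarios side by side is that a diversity gradient alone does not say which one happened, which is essentially the objection raised above.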

Citation: Mapping Human Genetic Diversity in Asia, Science 326, 1541 (2009);
The HUGO Pan-Asian SNP Consortium, et al., DOI: 10.1126/science.1177074


"back-migration" - a pleasant reminder of the Beaker People and their purported tendency to "reflux".

By bioIgnoramus (not verified) on 15 Dec 2009

Heh. This might make the US Census category Asian as a natural grouping more justifiable.

Would it be likely that the ASI part of the Indian mix is closely related to the South-East Asian stock the East Asians evolved from?

sorry if this question shows gross ignorance, but what do the numbers represent in the phylogenetic tree?

Would it be likely that the ASI part of the Indian mix is closely related to the South-East Asian stock the East Asians evolved from?

not closely related. let's go back to indians, who are ANI + ASI. if you have ANI, ASI, Europeans, East Asians, the ASI + East Asians would form one clade, the Europeans + ANI the other. the Europeans and ANI are rather close. the ASI and East Asians not so close. but closer to each other than they are to ANI & Europeans. or at least that's the model.

Stephen Oppenheimer has a single migration out of Africa

The green seems to be a Taiwanese aborigine (Austronesian) signature, and the only others who have ample amounts of it are Philippines "negritos", Melanesians, and southern Chinese, which makes sense, as they would be admixed with the Austronesian phenomenon.

In microcosm, if this is to be interpreted by the HUGO team as Taiwanese aborigines being derived from the more diverse Philippines "negritos", then they mistook the buttock for the head.

sorry if this question shows gross ignorance, but what do the numbers represent in the phylogenetic tree?

pzed, those would be bootstrap values.

By Altyn Khan (not verified) on 15 Dec 2009

I'm still amazed at how "geographically correct" the PCA plots are. Europeans, Indians, East Asians fall almost exactly where they should be in a "real" map (however, Philippines / Malaysia are a bit off). This can be expected when we deal with close-by populations with constant, isotropic gene flow (e.g. within Europe). But between NW Europeans (CEU) and Japanese?...

Actually perhaps this is another sign that Europeans and East Asians really do originate from the same branch of Out-of-Africa migration. From this common origin, normal diffusion brings the familiar pattern of geographically correct PCs aligned with N-S and E-W axes.

Perhaps the fact that Malaysian/ Philippine Negritos do not fit the expected geographical pattern indicates that they do not come from the same branch, and therefore the PCA map (which is dominated by the other, EU-EA-India branch, since this has the largest variance) cannot be expected to apply to them?


thanks altyn!

My thinking is that east-Asians show strong adaptation for extreme cold weather, and that fits with the time of their split from the Caucasians (possibly due to a glacial intrusion in central Eurasia cutting off the two populations). Subsequent glacial maximum would have driven them south into SE Asia, overrunning the existing Australoid population there, and interbreeding to some extent. Then as the weather warmed they expanded north again. This explains why NE Asians are related to SE Asians but show less genetic diversity.

The timings would seem to be around ca 20-25,000 YBP for the Caucasian/east-Asian split, and around 10,000 YBP for the re-expansion northwards of the east-Asians.