Sometimes I wonder if the period between the publication of The History and Geography of Human Genes and The Journey of Man, roughly from the mid-90s to the early 2000s, will be seen in hindsight as a golden age for historical population genetics. A few weeks ago I pointed to new data based on DNA extraction which really confuses the picture of how Europe was populated over the past 25,000 years. It seems the more data we get, the more interesting things get. In the late 1990s the emergence of powerful technologies to extract, amplify and sequence genetic material shed light on several questions which had tantalized researchers ever since Allan Wilson’s group began to push the frontiers of molecular evolution in the 1970s. Where in the 1980s there was only the mitochondrial Eve story, by the year 2000 there was enough to go around for several books. The Journey of Man, Mapping Human History and The Seven Daughters of Eve all came out very close together chronologically. These scientists and writers knew that striking fast was imperative.
Though some broad models remain robust in the face of data (consider the hypothesis of a recent expansion of our species out of Africa), in the details there are many complications to simple narratives. Spencer Wells had a good story to tell in The Journey of Man. Looking at Y chromosomal lineages he concluded that when humans left Africa, some took a “Southern Route” through India to the east, while others took a “Northern Route” via Central Asia, where the ancestors of Chinese and Europeans parted ways. Just look at the map if you don’t believe me. Of course some of the data from the new genetics don’t totally floor us. Genetics does not support the Solutrean hypothesis. But big coarse models are often in for some trouble. The idea that Central Asians gave rise to Europeans and East Asians may be one of those hypotheses which was too elegant for reality.* The problem is that Central Asia is chock-full of hybrid populations, which look to be the outcomes of well-known historical and social processes. It is as if Africans turned out to be part-Australian and part-European in their genetics when they’re supposed to be the ancestral population! (Another model could be that those Central Asians who gave rise to Europeans and East Asians are extinct, but then how could one make inferences based on their genes when surveying contemporary populations?)
But the framework established in the early 2000s by Wells and the HPGL is still alive & kicking. For example, Inferring human colonization history using a copying model:
Genome-wide scans of genetic variation can potentially provide detailed information on how modern humans colonized the world but require new methods of analysis. We introduce a statistical approach that uses Single Nucleotide Polymorphism (SNP) data to identify sharing of chromosomal segments between populations and uses the pattern of sharing to reconstruct a detailed colonization scenario. We apply our model to the SNP data for the 53 populations of the Human Genome Diversity Project described in Conrad et al. (Nature Genetics 38,1251-60, 2006). Our results are consistent with the consensus view of a single “Out-of-Africa” bottleneck and serial dilution of diversity during global colonization, including a prominent East Asian bottleneck. They also suggest novel details including: (1) the most northerly East Asian population in the sample (Yakut) has received a significant genetic contribution from the ancestors of the most northerly European one (Orcadian). (2) Native North [corrected] Americans have received ancestry from a source closely related to modern North-East Asians (Mongolians and Oroquen) that is distinct from the sources for native South [corrected] Americans, implying multiple waves of migration into the Americas. A detailed depiction of the peopling of the world is available in animated form.
Overall genomics seems to support the idea of serial bottlenecks or sampling events which reduced genetic diversity as populations drifted further and further away from Africa. As first approximations they work, and as big-picture generalizations they have utility, but sometimes these papers get a bit too granular for their own good. For example:
The first 8 East Asian populations (Cambodia, Mongolia, Oroqen, Xibo, Yi, Tu, Daur, Naxi) have 50-84 donors, including all 32 individuals from two central Asian populations, the Uygur and the Hazara (except the Tu who use 24/32). This represents an entirely distinct source of ancestry from European populations, who each receive less than 10% of their ancestry via the Uygur and almost none via the Hazara
This is unlikely to be describing what really happened, even if the data and model fit together appropriately. It can’t because these two populations were newly created through the fusion of East and West Asian populations. I’ve mentioned the Uyghurs many times. The Hazara are Shia Muslims who speak Dari and live in the highlands of Afghanistan. Their physiognomy is classically Eurasian (e.g., this Hazara girl). Their own oral history traces their origins back to the arrival of Mongol armies in the 13th century. And they happen to have a high frequency of the Genghis Khan haplotype. Finally, they aren’t attested as a people before 1500, mighty strange.
There is another paper out which probes the Uyghurs genetically to try and resolve the nature of their admixture. That is, is their equidistant genetic position between Europeans and Chinese due to the fact that these two populations are subsets of Uyghurs, or are the Uyghurs a compound of East and West Eurasian populations? Haplotype-Sharing Analysis Showing Uyghurs Are Unlikely Genetic Donors:
The Uyghur (UIG) are a group of people primarily residing in Xinjiang of China, which is geographically located in Central Asia, from where modern humans were presumably spread in all directions reaching Europe, east, and northeast Asia about 40 kya. A recent study suggested that the UIG are ancestry donors of the East Asian (EAS) gene pool. However, an alternative hypothesis, that is, the UIG is an admixture population with both EAS and EUR ancestries is also supported by our previous studies. To test the two competing hypotheses, here we conducted a haplotype-sharing analysis (HSA) based on empirical and simulated data of high-density single nucleotide polymorphisms. Our results showed that more than 95% of UIG haplotypes could be found in either EAS or EUR populations, which contradicts the expectation of the null models assuming that UIG are donors. Simulation studies further indicated that the proportion of UIG private haplotypes observed in empirical data is only expected in alternative models assuming that UIG is an admixture population. Interestingly, the estimated ancestry contribution of 44%:56% (EAS:EUR) based on HSA is consistent with our previous estimation with STRUCTURE analysis. Although the history of UIGs could be complex, our method is explicit and conservative in rejecting the null hypothesis. We concluded that the gene pool of modern UIGs is more likely a sole recipient with contribution from both EAS and EUR.
You likely know that Africans are the “most genetically diverse people in the world.” In fact, non-Africans are a subset of Africans genetically, by and large. Not only are they a subset of Africans, but non-Africans are a subset of Northeast Africans! This is not a coincidence as presumably non-Africans left the African continent via the Northeast. This is basically a photocopying effect. As populations expand and reproduce, and small groups fission off the main body, these smaller groups only imperfectly reproduce the full range of variation in the parent group. Over time parental variance disappears, slowly to be replaced by a group’s own new mutations. A rapid population reduction, a bottleneck, tends to sweep away a lot of genetic variance. Imagine turning the quality setting of the copier down a lot.
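The photocopying effect can be sketched in a few lines of Python. This is only a toy illustration, not anything taken from the papers discussed here: the function names (`found_colony`, `serial_founders`) and all the numbers (200 starting haplotypes, 50 founders per step, 5 steps) are assumptions picked for readability, and new mutations are deliberately left out so that only the loss of variation is visible.

```python
import random

def found_colony(parent_pool, n_founders, rng):
    """Draw a small founding group at random from the parent population."""
    return [rng.choice(parent_pool) for _ in range(n_founders)]

def serial_founders(n_variants=200, pool_size=1000, n_founders=50,
                    steps=5, seed=1):
    """Return the set of haplotypes present after each founding event.

    Hypothetical toy model: haplotypes are just integer labels, each
    colony regrows to full size by copying its founders, and there is
    no mutation, so variation can only be lost, never gained.
    """
    rng = random.Random(seed)
    pool = [rng.randrange(n_variants) for _ in range(pool_size)]
    history = [set(pool)]
    for _ in range(steps):
        founders = found_colony(pool, n_founders, rng)
        # Regrow the colony from its founders (the "photocopy").
        pool = [rng.choice(founders) for _ in range(pool_size)]
        history.append(set(pool))
    return history

if __name__ == "__main__":
    for step, haplotypes in enumerate(serial_founders()):
        print(step, len(haplotypes))
```

Each later population in this sketch carries a subset of the haplotypes of the one before it, which is the sense in which non-Africans are a subset of Africans. Shrinking `n_founders` is the equivalent of turning the copier's quality setting down.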
The “Northern Route” model assumes that Central Asians should be the parental or donor group vis-a-vis the populations of the antipodes of Eurasia. The first paper I pointed to seemed to implicitly support this model, but as I noted above there is a great deal of evidence that the groups offered up are relatively new to human history.
To address the question of whether Uyghurs are “donors” or not to Europeans and Chinese the authors of the second paper looked at the variation of alleles across the populations.
More specifically, they fixed upon the haplotypes which were “private” to a group, that is, not shared by other groups but unique identifiers. They also noted two other classes: haplotypes which were shared by all groups and haplotypes shared between some groups, but not all (in this case, between two groups and excluding a third).
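To make the three classes concrete, here is a minimal sketch of how one might sort haplotypes into them using plain Python sets. It is not the authors' pipeline: the function name `classify_haplotypes` and the four-letter haplotype strings are invented for illustration, and a real analysis would work on phased SNP data in windows along the chromosome.

```python
def classify_haplotypes(pops):
    """Sort haplotypes into private / shared-by-some / shared-by-all.

    pops: dict mapping a population label to the set of haplotypes
    observed in it (toy strings here, purely hypothetical data).
    """
    all_haplotypes = set().union(*pops.values())
    result = {
        "shared_by_all": set(),
        "shared_by_some": set(),
        "private": {name: set() for name in pops},
    }
    for hap in all_haplotypes:
        carriers = [name for name, haps in pops.items() if hap in haps]
        if len(carriers) == len(pops):
            result["shared_by_all"].add(hap)
        elif len(carriers) == 1:
            result["private"][carriers[0]].add(hap)
        else:
            result["shared_by_some"].add(hap)
    return result

# Made-up example using the paper's population labels.
toy = {
    "UIG": {"ACGT", "ACGA", "TCGA", "TTGA"},
    "EAS": {"ACGT", "ACGA", "GGGA"},
    "EUR": {"ACGT", "TCGA", "CCCC"},
}
classes = classify_haplotypes(toy)
uig_private_share = len(classes["private"]["UIG"]) / len(toy["UIG"])
```

In this toy data only one of the four UIG haplotypes is private to UIG. The quantity of interest in the paper is the analogous proportion in the real data.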
Africans have many private haplotypes vis-a-vis non-Africans. They haven’t gone through the same bottleneck process which expunged variation from non-African groups, and have no doubt added to their roster of mutations since non-Africans left. If the Uyghurs are antecedent to the populations of West and East Eurasia one presumes that the same would hold in this case for them, if somewhat mitigated. Left to right are bars which represent various window lengths across the chromosome used to evaluate haplotypes. The top slice represents private haplotypes, and it is clear that Uyghurs have very few private haplotypes. In other words, most Uyghur haplotypes are found in either the Chinese or Europeans, or both. Here is a chart which illustrates heterozygosity:
So on the one hand Uyghurs don’t have much genetic variation unique to themselves, but on the other they are rather genetically variable. Heterozygosity refers to having different alleles, genetic variations, at a locus (think about dominant-recessive charts). These two pieces of data are consistent with a population which is recently admixed. It hasn’t had enough time for mutation and recombination to generate new haplotypes which aren’t found in the parental populations. But because it has alleles from two very different populations, the likelihood of heterozygosity is high. To take an extreme example, a new population of biracial individuals doesn’t likely have many private haplotypes not found in the parental groups. But they’re likely to be very heterozygous.
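A quick back-of-the-envelope calculation shows why admixture pushes heterozygosity up. The sketch below uses the standard expected heterozygosity for a two-allele locus under random mating, 2p(1-p), and assumes the admixed allele frequency is simply the ancestry-weighted average of the parental frequencies. The allele frequencies (0.1 and 0.9) are made up for illustration. Only the 44%:56% EAS:EUR split comes from the abstract quoted above.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a two-allele locus with frequency p."""
    return 2.0 * p * (1.0 - p)

def admixed_frequency(p_a, p_b, frac_a):
    """Allele frequency after mixing: ancestry-weighted parental average."""
    return frac_a * p_a + (1.0 - frac_a) * p_b

# Hypothetical locus where the two parental groups differ sharply.
p_eas, p_eur = 0.1, 0.9
p_mix = admixed_frequency(p_eas, p_eur, frac_a=0.44)

h_eas = expected_heterozygosity(p_eas)  # 0.18
h_eur = expected_heterozygosity(p_eur)  # 0.18
h_mix = expected_heterozygosity(p_mix)  # about 0.495

print(f"EAS {h_eas:.3f}  EUR {h_eur:.3f}  admixed {h_mix:.3f}")
```

At a locus like this the admixed group is far more heterozygous than either parent, even though every allele it carries is already present in the parental populations, so it gains heterozygosity without gaining anything private.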
But this group didn’t stop there. They ran some simulations. Haplotype variation is going to be dependent on parameters such as population size, recombination rate, admixture rate, etc. Out of their models they reported the results in terms of private alleles for 4 models.
– Model 1, which has the Uyghurs be donors, that is the parental population at a distant point in time. This produced way too many private alleles.
– Model 2 is a modification of 1, adding the dynamic of gene flow from Chinese and Europeans over time. Again, way too many private alleles.
– Model 3 is admixture recently, and then no gene flow. This one had far fewer private alleles, but was off by an order of magnitude from the empirical proportion.
– Model 4 is like model 3, but also had continuous gene flow from the point of admixture. This one seemed to fit the empirical data.
Models are models, and one should be careful (who knows how they played around with the parameters?), but they point in the direction of the inference we would have made already looking at just the empirical data, with the additional suggestion that continuous gene flow after the original formation of the group is likely. I think that counts as some return on model-building.
All that to tell us what we strongly suspected we knew. But sometimes it is useful to test drive methodologies against a reality we’re confident about, just to see whether the methodology will lead us astray. For me the take home picture is that tunneling down from on high is probably not fruitful when it comes to historical human population genetics. Specific examinations though do seem to have merit and utility.
Citation: Molecular Biology and Evolution 2009 26(10):2197-2206; doi:10.1093/molbev/msp130