Indians as hybrids (a.k.a Aryan invasion in the house!)

A few months ago a friend tipped me off to the fact that David Reich was going to publish a paper about the genetics of Indians which he ascertained was going to model these populations as hybrids between "Europeans and Andaman Islanders." The paper is out, and my friend was roughly right. Reconstructing Indian population history:

India has been underrepresented in genome-wide surveys of human variation. We analyse 25 diverse groups in India to provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most Indians today. One, the 'Ancestral North Indians' (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, whereas the other, the 'Ancestral South Indians' (ASI), is as distinct from ANI and East Asians as they are from each other. By introducing methods that can estimate ancestry without accurate ancestral populations, we show that ANI ancestry ranges from 39-71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the indigenous Andaman Islanders are unique in being ASI-related groups without ANI ancestry. Allele frequency differences between groups in India are larger than in Europe, reflecting strong founder effects whose signatures have been maintained for thousands of years owing to endogamy. We therefore predict that there will be an excess of recessive diseases in India, which should be possible to screen and map genetically.

The paper itself is relatively tight and concise; a lot of the sausage-making is thrown into the supplementary information. This is freely available online, and in fact I would suggest that the first half of supplement 1 has more meat than the paper itself.

As for that, the text is not as bold as the abstract, or the press summations which have appeared in its wake. For example, they say:

We warn that 'models' in population genetics should be treated with caution. Although they provide an important framework for testing historical hypotheses, they are oversimplifications. For example, the true ancestral populations of India were probably not homogeneous as we assume in our model, but instead were probably formed by clusters of related groups that mixed at different times. However, modelling them as homogeneous fits the data and seems to capture meaningful features of history.

I generally agree with the gist of this. The main issue I would also highlight is that these results only clarify and solidify what was likely from previous analyses of worldwide genetic variation. That is, the populations of Northwest India are closer to those of the Middle East & Europe than those of Southeast India are. It was rather awesome that they confirm that the Onge, who are almost extinct, are a relatively unadmixed ancient population. The Onge branch seems to descend from an ancestral population which also gave rise to what is termed in the paper "Ancestral South Indian" (ASI). They exhibit no admixture with "Ancestral North Indians" (ANI). This paper also confirmed and clarified that the proportion of West Eurasian related lineages increases as a function of both geography and caste. That is, there is a SE-NW and lower-to-upper caste gradient whereby West Eurasian related lineages become more prevalent. This has long been known, but this paper did it with more SNPs across the genome.

Here is a table which shows the proportion of ANI in a range of populations:

i-2e40a7d89faa7e72dd4bc06cefbb0f59-indiareich1.png

All you really need to know about the Z-score is that negative scores indicate high levels of admixture. Here is a table which tells you a bit more about the populations above:

i-49fc5383e0e40d69696372cfdf945eaa-indiareich2.png

The following figure illustrates the general model which looms in the background of this paper:

i-c94cc2d80fd1a62d41a10200772cdd41-indiareich3.png

Note that the Andaman Islanders, the Onge, aren't really the ancestors of Indians on the mainland. Rather, they're a branch of the ancient population which presumably first settled South Asia, and close to the ASI. Who were the ASI? Since they aren't really around, we can only generate conjectures and inferences. In this paper the ANI are actually represented in some ways by Europeans, even though presumably the assumption is that both these are daughter populations of another group. Though not pushed very hard, they do mention proto-Indo-Europeans as the candidate for the ANI.

At this point, let's look at the PCA chart (I've reedited and labelled as usual):

i-17840dcbb92988d630b0390f363e5861-indiareich5.png

This should not surprise; previous work shows that South Asians distribute along an axis away from Europeans. One of the points in the paper is that there is both geographic and caste stratification. I added some labels, but I thought drilling down was probably useful. I don't know all these groups off the top of my head, and I assume few readers do either. So I zoomed in:

i-0bc311dfcb2c57cc3c9dea138bb786a3-indiareich7.png

I think some of the shortcomings of a sample size on the order of the low hundreds are rather clear. They couldn't even use all their samples, or some of the samples were not relevant to the question at hand. The Siddis are an Indian-African mix which emerged during the period of Muslim domination when that group imported black slaves. The Tibeto-Burman groups of Northeast India are interesting, but outliers. The general trends are clear: North Indian groups have more ANI than South Indian groups, and upper caste groups have more ANI than lower caste groups, but that is only with "all things equal." Note that upper caste South Indian groups clearly have more ANI than lower caste South Indians, but they have a lower proportion than some North Indian lower castes, and are in the range of one North Indian tribal group. Some of the outliers are also interesting; the lower caste individual similar to Austro-Asiatic tribals is from a group which resides in a region with many Austro-Asiatic peoples. Clearly there has been identity switching, so you have aberrations such as one North Indian tribal who clusters with Kashmiri Pandit Brahmins! The Austro-Asiatic group is also interesting, because they speak languages related to those of Southeast Asia. Here is a map of the Austro-Asiatic languages:

i-552b28b9b67b667a558c3a8cb82b7ce7-580px-Se_asia_lang_map.png

We know with near 100% certainty that much of Burma & Thailand was dominated by Mon-Khmer languages before the arrival of the Shan, Bamar (Burmans) and Thai peoples (to mention a few). This is a matter of historical record; the rise of modern Burma and Thailand was largely a story of the eclipse of Mon and Khmer societies who transmitted to them much of the Indic character which they have (e.g., the northern populations often arrived as Mahayana Buddhists, but the Theravada Buddhism of the Mon and Khmer was adopted as the dominant religion in the new states). The position of the Munda languages is more confused, as some posit that they arrived from the east, while others argue that the Austro-Asiatic languages expanded east from India. This is not going to be resolved in this blog post, but let me note that the genetic data above, which show an "eastern" affinity of the Munda, can be combined with cultural data such as the arrival of rice farming from the east and historical records which document the migration of populations from Burma, to construct a plausible east-west narrative. In contrast it seems an almost default position by many that the Austro-Asiatics are the most ancient South Asians, marginalized by Dravidians, and later Indo-Europeans. I would not be surprised if it was actually first Dravidians, then Austro-Asiatics and finally Indo-Europeans. Dravidian languages are found in every corner of the subcontinent (Brahui in Pakistan, a few groups in Bengal, and scattered through the center) while the Austro-Asiatics exhibit a more restricted northeastern range.

As I noted above, supplement 1 has a lot of gems. For example, the authors note that previous work which found little regional differentiation in Indian Americans might have been problematic because there is a great deal of intraregional variance which when collapsed loses essential information. This chart shows South Asians + Utah Whites + 85 American Gujaratis in light blue:

i-bb8f8c6fca4e8397f8e935195527b527-indiareich8.png

Note that about half of Gujaratis form their own unexplained cluster! Throwing them together in one pool would mask this phenomenon. Here's their possible explanation:

Interestingly, one of the GIH subgroups fall outside the main gradient of Indian groups, suggesting that they harbor substantial ancestry that is not a simple mixture of ASI and ANI. A speculative hypothesis is that some Gujarati groups descend from the founders of the "Gurjara Pratihara" empire, which is thought to have been founded by Central Asian invaders in the 7th century A.D. and to have ruled parts of northwest India from the 7-12th centuries. I. Karve noted that endogamous groups with names like "Gurjar" are now distributed throughout the northwest of the subcontinent, and hypothesized that they likely trace their names to this invading group.

I don't know if this is plausible; perhaps a Gujarati reader would immediately recognize what this cryptic substructure is.

Next are two charts which show Indians, Europeans, and Chinese. In the first the PCA was originally constructed with Europeans & Chinese, and the Indians were projected onto it using the variation found in the first two groups. In the second case, Indians and Chinese were used to construct the PCA, and Europeans projected.

i-f53e413c6cf4f955dd227a581e1a1a52-indiareich9.png

What you see is that Europeans are all equally related to Indians, but Indians exhibit a gradient of relationship to Europeans. That is, there is no European group which in particular resembles Indians via the connection with ANI; the distance between all European groups and ANI seems roughly equal. The Indians vary in their relationship to Europeans because they vary in their proportion of ANI.
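To make the "construct the PCA with two groups, then project the third" idea concrete, here is a minimal sketch on simulated genotypes. Everything in it is an assumption for illustration: the allele frequencies are random numbers, the function names (`simulate`, `fit_axes`, `project`) are mine, and the paper's actual pipeline is certainly more careful than this. The point is only that individuals who differ in their share of ancestry from one reference group line up along the axis defined by the references.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 2000

# Hypothetical allele frequencies for two reference populations (not real data).
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.01, 0.99)

def simulate(freqs, n):
    """Draw n diploid genotypes (0, 1 or 2 copies of an allele per SNP)."""
    return rng.binomial(2, freqs, size=(n, len(freqs))).astype(float)

def fit_axes(ref, n_components=2):
    """Build principal axes from the reference individuals only."""
    mean = ref.mean(axis=0)
    # Rows of vt are the principal axes of the centered reference matrix.
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place new individuals on axes they played no part in defining."""
    return (samples - mean) @ axes.T

reference = np.vstack([simulate(freq_a, 40), simulate(freq_b, 40)])
mean, axes = fit_axes(reference)

# A third group whose members vary in their share of ancestry from A.
fractions = [0.2, 0.5, 0.8]
pc1_by_fraction = []
for f in fractions:
    mixed = simulate(f * freq_a + (1 - f) * freq_b, 20)
    pc1_by_fraction.append(project(mixed, mean, axes)[:, 0].mean())

for f, pc1 in zip(fractions, pc1_by_fraction):
    print(f"fraction from A = {f:.1f}, mean PC1 = {pc1:.2f}")
```

The sign of PC1 is arbitrary, so what matters is that the projected groups fall in order of their mixture fraction, which is the gradient the Indian samples show when projected onto the European-Chinese axes.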

In the table above there is a reference to the proportion of ANI and ASI in each Indian group. One question you might ask: how do you estimate the proportions of ancestry from groups which you don't have any information about because they no longer exist? Europeans and the Onge can serve as proxies for the ANI and ASI respectively, but how far does this get you? Well, the methods that they used (they have three) which determine ancestral proportions can be used on populations which exist. So here is a figure which shows how their methods compare when you look at a population where we know something concrete about their ancestral populations because those ancestral populations are still extant, African Americans:

i-4ef61ddf6b7c6b0c99dae5bcc1c788a7-indiareich10.png

I also believe that their calculations are roughly correct because they pass the smell test. It isn't as if this is the first study of the genetics of Indians. Though the assumptions of Structure based analysis are somewhat different, you can discern the same rank orders.
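For intuition about the easy case, where both source populations are still around to be sampled (as with African Americans), here is a rough sketch of estimating a mixture proportion from allele frequencies by least squares. This is not one of the paper's three methods, which are designed for the harder situation where the ancestral groups are gone; the frequencies are simulated, and `mixture_fraction` is a name I made up for the example.

```python
import numpy as np

def mixture_fraction(p_mix, p_a, p_b):
    """Least-squares alpha in the model p_mix = alpha * p_a + (1 - alpha) * p_b."""
    diff = p_a - p_b
    return float(np.dot(p_mix - p_b, diff) / np.dot(diff, diff))

rng = np.random.default_rng(1)
n_snps = 5000

# Hypothetical source-population allele frequencies (assumed known exactly here).
p_a = rng.uniform(0.05, 0.95, n_snps)
p_b = np.clip(p_a + rng.normal(0.0, 0.2, n_snps), 0.01, 0.99)

# Simulate an admixed population that is 80% A, observed through a finite sample.
true_alpha = 0.8
p_true = true_alpha * p_a + (1 - true_alpha) * p_b
n_chromosomes = 200
p_mix = rng.binomial(n_chromosomes, p_true) / n_chromosomes

estimate = mixture_fraction(p_mix, p_a, p_b)
print(f"true = {true_alpha}, estimated = {estimate:.3f}")
```

With thousands of SNPs the sampling noise at any one marker washes out, which is part of why genome-wide estimates are steadier than anything read off a single locus.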

Moving back to the nature of population structure within India, as opposed to how Indians relate to non-Indians, one of the results which pops up is that South Asian groups seem to have very high Fst values relative to European ones when compared within regions or between neighbors. Remember that Fst is a rough measure of the genetic variation which occurs between groups. The famous maxim that "85% of variance is within races, and 15% between races," is Fst based. The Fst in that case is 0.15. Corrected for region & caste, they find that South Asian groups seem to have Fst values on the order of 3-4 times higher than equivalent European groups. This isn't too surprising; in History and Geography of Human Genes L. L. Cavalli-Sforza observes that Europeans are particularly homogeneous. Before the spate of 650 K SNP papers it was hard to find good stuff on the phylogeography of European populations because the techniques didn't have the power to differentiate them. On the other hand, anthropologists have long thought that India was riddled with differentiation. After all, there's the caste system. Indians are certainly physically diverse. Additionally, there is a line of thinking that India is the secondary Africa, insofar as most Eurasian and Australasian lineages go back to India. Like Africa, India may hold a great deal of diversity among its many populations because they're old, the oldest in Eurasia and Australia (in concert with endogamy of course). The authors though have another model:

We propose that the high FST among Indian groups could be explained if many groups were founded by a few individuals, followed by limited gene flow. This hypothesis predicts that within groups, pairs of individuals will tend to have substantial stretches of the genome in which they share at least one allele at each SNP. We find signals of excess allele sharing in many groups.

They go on:

Six Indo-European- and Dravidian speaking groups have evidence of founder events dating to more than 30 generations ago...including the Vysya at more than 100 generations ago...Strong endogamy must have applied since then (average gene flow less than 1 in 30 per generation) to prevent the genetic signatures of founder events from being erased by gene flow. Some historians have argued that ‘caste’ in modern India is an ‘invention’ of colonialism in the sense that it became more rigid under colonial rule. However, our results indicate that many current distinctions among groups are ancient and that strong endogamy must have shaped marriage patterns in India for thousands of years.

This is one of the places where you get some sense of time scales. In the rest of the paper they avoid this. They note in one of the figures: "Although the model is precise about tree topology and ordering of splits, it provides no information about population size changes or the timings of events." But the numbers above give time scales of foundings on the order of 1,000 years, with perhaps others at 3,000 years. Elsewhere they say:

Two features of the inferred history are of special interest. First, the ANI and CEU form a clade, and further analysis shows that the Adygei, a Caucasian group, are an outgroup. Many Indian and European groups speak Indo-European languages, whereas the Adygei speak a Northwest Caucasian language. It is tempting to assume that the population ancestral to ANI and CEU spoke 'Proto-Indo-European', which has been reconstructed as ancestral to both Sanskrit and European languages, although we cannot be certain without a date for ANI-ASI mixture.

Despite the hedge, the allusion here suggests a date pegged on the order of 4,000 years ago. We don't know much about how the Indo-Aryans arrived in India; the earliest extant records, the Vedas (which were transmitted orally initially), seem to be set in Northwest India. The general suspicion though is that the Indus Valley Civilization was not Indo-Aryan, and there is a Dravidian speaking population in the west of Pakistan, suggesting that that language group was at one point spoken in the region. All in all the outline being faintly sketched out in this paper sounds a lot like what Indians refer to as the Aryan invasion theory, a mass movement of populations out of the Northwest replacing and subjugating the natives. ANI values on the order of 70-80% in the Northwest seem to suggest near total replacement.

I'm skeptical. Obviously the Indo-Aryans had to arrive physically, but these sorts of nomadic populations tend to quickly dominate and culturally assimilate sedentarists. In the case of the Hungarians and Turks they even imposed their language upon the natives, with only marginal genetic impact. The paper itself points to the likelihood of a complex history of periodic, and perhaps continuous, gene flow. Two ancient populations mixing is what economists would term a "stylized fact," good enough to get some points across, but not to be confused for reality.

What about the idea of foundings and subsequent endogamy explaining the high Fst? 2,500 years ago Herodotus already reported that India was the most populous nation in the world (he did not know of China). It isn't as if the Indo-Aryans arrived in the New World, where the natives died off so that they could enter into a major demographic expansionary phase. That being said, India's population did grow over time as cultures pushed east with better tools (e.g., iron axes), and cut down the local forests. To really test drive this model you need more than 132 individuals from 25 populations. You need a lot of data from many individuals to get a more granular feel for the variation. Population expansions did occur in the east down to the Mughal period as land was reclaimed for agriculture. Much of eastern Bengal was settled relatively recently, within the last 500-1000 years. In some regions we do have a sense of what the demographic history was, so we should be able to predict patterns of Fst if the model of founding + endogamy is operative. Historically this may make sense for some groups, such as Brahmins, who migrated to various regions to provide specialized services and then became indigenized, but it seems unlikely as an explanation for the majority of castes and jatis. Many of the same dynamics at work in India were probably at work in the Middle East. And also in Europe, which went through a population crash and "bounce back" after the fall of the Roman Empire. They should have just stuck with a tree without the timing....
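Since Fst carries so much of the argument above, here is a bare-bones sketch of what the number is, assuming two equally weighted populations and biallelic SNPs. It uses the textbook heterozygosity definition, Fst = (H_T - H_S) / H_T, and not whatever estimator the authors used; the function name `fst` and the example frequencies are mine.

```python
import numpy as np

def fst(p1, p2):
    """Two-population Fst from per-SNP allele frequencies.

    H_T is the expected heterozygosity if the two groups were pooled,
    H_S is the average heterozygosity within each group. Sums are taken
    over SNPs before dividing (a ratio of averages).
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return float((h_t.sum() - h_s.sum()) / h_t.sum())

# Identical frequencies: no between-group variation at all.
print(fst([0.5], [0.5]))
# Fixed for opposite alleles: all variation is between groups.
print(fst([0.0], [1.0]))
# Somewhere in between.
print(round(fst([0.2], [0.8]), 2))
```

Read this way, an Fst of 0.15 says that 15% of the total heterozygosity is accounted for by differences between groups, which is where the "85% within, 15% between" line comes from.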

John Hawks has a related post.

Citation: Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461:489-494. doi:10.1038/nature08365


Entry as a result of agriculture is another possibility, as happened in Europe and SE Asia. There's evidence of a western origin of the Indus Valley civilization, and that would add a lot of European ancestry across many levels of caste and tribe.

The Gujarat cluster is really surprising. Gujarat has many exotic tribal and foreign origin groups; but presumably few of these have migrated out. Many Gujarati Americans came from Africa; possibly there was some mixing there?

It's been theorized that some of the most ancient Middle Eastern populations were Dravidian speakers -- the Elamites in SE Iran.

Likewise, some early languages of Iran were supposedly Indic rather than Iranian, and the oldest attested Iranian literature is linguistically closer to Indic than it should be.

Can't remember more than that, but rest confident that we're dealing with a can of worms.

By John Emerson (not verified) on 24 Sep 2009

Munda might be a mix of rice-growing Malay settlers and aboriginal Kusunda (Terai, Nepal, who avoid hoofed animals, eat tubers & fowl, and share many root-words with Papuan/Tasmanian/Andamaners). I think Kusunda/Sunda/Munda/Andaman derived from Khoesandawe (Akwa) expatriates from southern Rift/Okavango/Kalahari mega-lakes eastwards due to tectonics and So. Italian supervolcano ~40ka.

Science trumps political correctness yet again.

By Paul Jones (not verified) on 24 Sep 2009

I'd rather go with what CO-AUTHORS of the study have to say

Times Of India
http://timesofindia.indiatimes.com/news/india/Aryan-Dravidian-divide-a-…

HYDERABAD: The great Indian divide along north-south lines now stands blurred. A pathbreaking study by Harvard and indigenous researchers on ancestral Indian populations says there is a genetic relationship between all Indians and more importantly, the hitherto believed "fact" that Aryans and Dravidians signify the ancestry of north and south Indians might after all, be a myth.

"This paper rewrites history... there is no north-south divide," Lalji Singh, former director of the Centre for Cellular and Molecular Biology (CCMB) and a co-author of the study, said at a press conference here on Thursday.

Senior CCMB scientist Kumarasamy Thangarajan said there was no truth to the Aryan-Dravidian theory as they came hundreds or thousands of years after the ancestral north and south Indians had settled in India.

The study analysed 500,000 genetic markers across the genomes of 132 individuals from 25 diverse groups from 13 states. All the individuals were from six-language families and traditionally "upper" and "lower" castes and tribal groups. "The genetics proves that castes grew directly out of tribe-like organizations during the formation of the Indian society," the study said. Thangarajan noted that it was impossible to distinguish between castes and tribes since their genetics proved they were not systematically different.

The study was conducted by CCMB scientists in collaboration with researchers at Harvard Medical School, Harvard School of Public Health and the Broad Institute of Harvard and MIT. It reveals that the present-day Indian population is a mix of ancient north and south bearing the genomic contributions from two distinct ancestral populations - the Ancestral North Indian (ANI) and the Ancestral South Indian (ASI).

"The initial settlement took place 65,000 years ago in the Andamans and in ancient south India around the same time, which led to population growth in this part," said Thangarajan. He added, "At a later stage, 40,000 years ago, the ancient north Indians emerged which in turn led to rise in numbers here. But at some point of time, the ancient north and the ancient south mixed, giving birth to a different set of population. And that is the population which exists now and there is a genetic relationship between the population within India."

The study also helps understand why the incidence of genetic diseases among Indians is different from the rest of the world. Singh said that 70% of Indians were burdened with genetic disorders and the study could help answer why certain conditions restricted themselves to one population. For instance, breast cancer among Parsi women, motor neuron diseases among residents of Tirupati and Chittoor, or sickle cell anaemia among certain tribes in central India and the North-East can now be understood better, said researchers.

The researchers, who are now keen on exploring whether Eurasians descended from ANI, find in their study that ANIs are related to western Eurasians, while the ASIs do not share any similarity with any other population across the world. However, researchers said there was no scientific proof of whether Indians went to Europe first or the other way round.

Migratory route of Africans

Between 135,000 and 75,000 years ago, the East-African droughts shrunk the water volume of the lake Malawi by at least 95%, causing migration out of Africa. Which route did they take? Researchers say their study of the tribes of Andaman and Nicobar islands using complete mitochondrial DNA sequences and its comparison with those of world populations has led to the theory of a "southern coastal route" of migration from East Africa through India.

This finding is against the prevailing view of a northern route of migration via Middle East, Europe, south-east Asia, Australia and then to India.

Rj, it is also known that the divide between the northern and southern states in the USA is a myth. you see, pennsylvania is closer to virginia, a southern state, than it is maine! rather, regions shift imperceptibly along lines of latitude, not into two broad categories.

in any case, i actually quoted the paper you know :-) a lot of what they say there is totally obfuscatory and no doubt playing to the prejudices of the indian audience. west eurasians are probably better representatives of "ancient north indians" than north indians themselves. so where does one assume those "ancient north indians" came from?

this is like brazilian scientists letting their audience think that portugal was settled from brazil. they know it's totally implausible, but i guess it's politic.

Equating colonial migrations with the IE expansion, and the north-south divide in India with North American states to deride opponents of an unscientific and outdated theory has the opposite effect in this case, if you think about it.

I seriously doubt that these scientists would go public just to satisfy an 'Indian audience' for political reasons. This was at a press conference and I'm positive that these scientists are aware that the days of pigeon post are over.

"The study also helps understand why the incidence of genetic diseases among Indians is different from the rest of the world. Singh said that 70% of Indians were burdened with genetic disorders and the study could help answer why certain conditions restricted themselves to one population."

Isn't the answer self-evident within the very same question? Relatively high rates of endogamous mating create distinct population clusters. This however has the side effect of compounding recessive genetic ailments within their respective populations. Consanguineous marriage is individually relatively harmless; doing it for multiple generations spells trouble down the line.

Razib maybe you can clear up a little point of confusion for me. It is my understanding that marriages between the same Varnas and clan groupings in India are discouraged as being considered incestuous. Yet how does this cultural taboo not preclude the reality of the high degree of first cousin and even uncle/niece pairings? Is it simply a matter of sister's daughter Halal, brother's daughter Haram?

re: consanguineous marriages, they reject this as being able to explain the haplotype blocks. i am skeptical, but need to reread that section in the supplements before i comment :-)

It is of my understanding that marriages between the same Varnas and clan groupings in India are discouraged as being considered incestuous.

north india and south india differ. the north indian hindus are exogamous, but south indian hindus excluding residents of kerala do cross-cousin and uncle-niece marriages.

Very interesting. I wouldn't have chased down this paper myself, but appreciate your breaking it down and sharing it.

Thanks for making it accessible.

The graphs of euro, indian, chinese were very helpful. I'm glad you included those.

i'm a lurker but the awesomeness of this blog (public service really) has forced me out. i have some questions (i have a really basic understanding of genetics but i'm interested by the caste findings as well as the sindhi, gujarati, and pathan findings (as best as i can understand them).

re: gujarati people...gurjar...gujjar. in (the undivided) punjab there is a caste known as gujjar (in the sense that jatt is a caste, gujjars and jatts are known to be rivals) who roughly inhabit (or are from)the pakistani areas in kashmir and northern parts of punjab - historically, they're animal herder's, and i've heard in kashmir, they still do this. is there any sort of link between gujarati's and them? for example, the two large area's they're from in pakistan are known as "gujranwala" (place of the gujjars) and "gujarat". they're quite dominant in pakistan (socially/politically/military etc), but is there a link between them and the gujarati people who actually speak gujarati? i know there's a similarity in language between hindi/urdu-punjabi-gujarati...what do the genetics say about that?

further, can we make any speculations about gujarati people and rajput's (or maybe its just so obvious to all the scientists involved that i didn't pick up on it). many sindhi's, gujrati's, punjabi's all claim (quite proudly too) rajput ancestry, makes sense to me since the rajput empire was the largest in india prior to the mughal invasions. and then, i've also heard theories that rajput's came from elsewhere as well, but i'm skeptical of this turning into a full blown "indo-scythian" fantasy since it sometimes appears as though anyone with any claim to previous indian "fame" has to be from "outside"... (ie - desi muslim groups and "syed" claims...not buying it).

also...no punjabi's or bengali's were included in the study (unless i missed it?) where would they fit?, they're such massive groups! i'm intrigued by the pathan data since there's always been an understanding that they're an outlier group in the indian subcontinent (atleast culturally and socially) as they're not as "indianized" as punjabi's or sindhi's etc. if i'm reading this right, pathans and sindhi people are quite close then, no? i've assumed that sindhi people have had a link with dravidians b/c of their mixture with brahui people, is this remotely correct, or does the data show anything about this? and finally, there's a region in pakistan, in between the pathan area (nwfp) and punjab, where there's some sort of hybrid punjabi/pashtun culture, the language is even a mix of the two, i believe its referred to as "hindko", i wonder if the genetic data would also be a mixture?

i realize this is all rambled...i'm not a scholar in this subject, all of my understanding is what i know through family histories and so on. i'm not articulate with this genetics stuff at all, so answer what you can if it doesn't require you to give me a basic primer on genetics.

Razib -- Is the paper's assumption of homogenous ANI/ASI ancestral populations tenable? Present day South Asia is extremely diverse (not just Munda, I-E, Dravidian, Siddi, but even Arab Yemenis, Red-headed pathans, and Chinese)! Is it possible to use this data in a more sophisticated model that acknowledges the possibility of more heterogenous ancestral populations?

Is it possible to use this data in a more sophisticated model that acknowledges the possibility of more heterogenous ancestral populations?

hm. well, i think the paper is addressing the primary axis of variation, which is NW-SE. as for the realism, i think the reality is that ANI probably arrived via a few successive migrations or pulses. the main question for the idea of clines and recurrent gene flow is: why is ASI only in south asia?

Dear Razib, Your post is excellent, Thank you for simplifying the results.

Razib, I was wondering if you could explain a little about table S6, on mtDNA and Y Chromosome data, seems to suggest say high ASI in for eg. Vaish but table S4 based on Z Score suggests high ANI in them?

i've assumed that sindhi people have had a link with dravidians b/c of their mixture with brahui people, is this remotely correct, or does the data show anything about this

classical markers (e.g., blood groups, etc.), showed no difference between the brahui and baloch. if there was a genetic diff, likely it was eliminated by more than 3,000 years of gene flow.

Razib, I was wondering if you could explain a little about table S6. The mtDNA and Y chromosome data seem to suggest high ASI in, e.g., the Vaish, but table S4, based on Z scores, suggests high ANI in them?

i saw that too.

1) the sample sizes were small for each group. 5-10.

2) so if you're sampling one locus (Y or mtDNA) you'll get a lot more noise than if you look at the total genome content. for general assessments autosomal is what you need to look at.
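To make the sampling point concrete, here is a toy simulation, not the paper's method. It assumes a hypothetical group whose true ANI fraction is 0.6, samples 8 people, and compares an estimate from one uniparental marker per person against an estimate averaged over many loci that are treated as independent (real autosomal loci are linked, so this overstates the gain somewhat). The names and numbers are all made up for illustration.

```python
import random
import statistics

def single_locus_estimate(true_ani, n_people, rng):
    # One uniparental marker per person: each lineage is either ANI or ASI.
    hits = sum(rng.random() < true_ani for _ in range(n_people))
    return hits / n_people

def genome_wide_estimate(true_ani, n_people, n_loci, rng):
    # Many loci per person, treated as independent draws (a simplification).
    draws = n_people * n_loci
    hits = sum(rng.random() < true_ani for _ in range(draws))
    return hits / draws

def spread(estimator, trials):
    # Standard deviation of the estimate across repeated samplings.
    return statistics.pstdev([estimator() for _ in range(trials)])

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_ANI = 0.6           # hypothetical value, not taken from the paper
N_PEOPLE = 8             # within the 5-10 range mentioned above
N_LOCI = 200             # hypothetical number of independent loci

sd_single = spread(lambda: single_locus_estimate(TRUE_ANI, N_PEOPLE, rng), 300)
sd_genome = spread(lambda: genome_wide_estimate(TRUE_ANI, N_PEOPLE, N_LOCI, rng), 300)

print("spread of single-locus estimate:", round(sd_single, 3))
print("spread of genome-wide estimate: ", round(sd_genome, 3))
```

With so few people, the single-locus estimate swings widely from sample to sample, while the genome-wide average stays close to the true value. That is one way a Y or mtDNA table can disagree with the autosomal result for the same group.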

Razib, Sepia's giving me an error when I try to post my comment, so I'm just pasting it here. In my defense, at least a part of it is directed at you, and some bits are directed at your critics. Here goes:

I think I'd tend to believe like razib that a) sample sizes are too small and b) broad overarching conclusions are drawn based on only a few groups. That said, I'd tend to think of this process of mixing gene pools as being diffusive. Where you have no insurmountable geographical boundaries, there will be diffusion of genes. And that said, I'd kill for a Nature publication. However, the TOI claims seem to be at odds with any reasonable interpretation of the paper itself. Makes me wonder if the authors were just misquoted or deliberately tried to obfuscate facts to make their findings seem more palatable to the general Indian public.

I think the crucial point here is that there is no good reason that I see to believe that the 'ancestral Indian' gene pool should be tightly linked to India's present day borders.

Follow my argument closely: locate a geographical centre of North India. Let's say it's Delhi. Any distance you measure should be the distance from your region to Delhi. Now you're sampling different regions across North India, and measuring the ANI% (same measure as used in the Nature paper) for a population, and then you're measuring that population's distance from Delhi. Plot the ANI % against distance from Delhi, and you'd end up with a flawed Gaussian distribution. Now I'd wager this distribution doesn't drop to zero even if your distance was as far away as, say, Turkmenistan. For that matter, I'd wager that this Gaussian would not peak at Delhi. In fact, from the paper, Pathans have the highest ANI ancestry and they're spread over India, Pakistan and Afghanistan, which suggests that our Gaussian peaks near India's northwestern borders.
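A minimal sketch of the shape being described, purely illustrative: a Gaussian bump sitting on a nonzero floor, with the peak displaced from Delhi. It assumes a signed position along a northwest-southeast axis (negative is northwest of Delhi), which is a simplification of the plain distance used above. The function name `ani_cline` and every parameter value are hypothetical and are not fitted to the paper's data.

```python
import math

def ani_cline(position_km, peak_km=-700.0, width_km=1500.0,
              floor=0.05, height=0.65):
    """Toy ANI fraction at a signed position along a NW-SE axis.

    position_km: 0 is Delhi, negative is northwest, positive is southeast.
    All defaults are made-up illustration values, not estimates.
    """
    z = (position_km - peak_km) / width_km
    return floor + height * math.exp(-0.5 * z * z)

# Evaluate the toy curve at a few positions along the axis.
for position in (-3000, -700, 0, 1000, 2000):
    print(position, round(ani_cline(position), 3))
```

In this toy curve the maximum sits northwest of Delhi and the value never reaches zero, which are the two wagers in the paragraph above.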

At this point, razib, I need your help. I'm a little unclear on how the ANI% measure is picked. Is there any uniqueness to it? Is it well established that there are indeed two major ancestral populations in India?

Obviously, my above Gaussian distribution doesn't take into account caste differences and treats all geographical regions as though they were the same. I'll add the disclaimer that I'm not a biologist, I'm a physicist who tends to simplify the problem first and then add in the complexity one piece at a time.

My simple argument above allows for migrations, including the one Michael Wood talks about in the Story of India. It's a great documentary, and I do believe that it presents the consensus view of historians. Try reading Colin Renfrew's Archaeology and Language.

So a sensible interpretation of genetic studies would in no way contradict linguistic and archaeological evidence. I'd tend to put my money on genetic population studies, because it's the most quantifiable and relevant to an existing population. Archaeology and language are strongly correlated to each other and to population studies, but it's never anything close to a 1:1 correspondence. In most cases, as in this one, genetic, archaeological and linguistic evidence seem to point in the same direction.

I fail to see how any of this is pessimistic for India. Populations throughout history have intermingled, dominated each other and killed each other off, and to use crimes of the past to justify present hateful ideologies or make assertions about superiority of one 'race' over the other is unjustifiably stupid. There's no such thing as genetic purity, other than the fact that our ultimate origin is most likely 100% pure African.

By Swati (ex Meluhhan) (not verified) on 03 Oct 2009

You probably saw my other comment/rant. I actually read your blog post now, and I see that we're more or less in agreement. Argh, it's frustrating that people read value judgements into scientific facts. I'm Indian, and it's just the prevailing cultural view that the invasion was a myth. I've never seen that backed by any evidence whatsoever.

I guess the most frustrating thing here is that the authors' claims contradict their findings and conclusions from the paper! Do they have no scientific integrity? Are you interested in drafting a letter with me to Nature about their unfounded claims to the press?

By Swati (Ex Meluhhan) (not verified) on 03 Oct 2009

A study with such a great proclamation sure had a small sample size. All of the recent studies with large sample sizes seem to contradict this one.

Most researchers (especially western) claim that indo-european must have originated from the west on the basis of "linguistics". It should also be noted that at the time Europeans were largely scattered and tribal compared to India and various tribes did not necessarily have contact with others.

There doesn't seem to have ever been any large study done on the basis that India might actually be the home of Sanskrit. All major studies have been done on the pre-assumed basis that Sanskrit originated from outside of India.

Yet the evidence for such a theory is still shaky. Even more than a century after Max Muller, a missing link establishing central Asia or eastern Europe as the home of the indo-european language group has not been found.

Perhaps it is time to set egos and pre-assumptions aside to find out the truth? That perhaps the indo-european language group did originate in South Asia.

Very interesting. A few observations:

Most of the "upper" castes seem to be drawn from Uttar Pradesh and Andhra Pradesh. The South Indian "Upper" castes are restricted to Naidus and Vellamas of A.P, who are hardly representative. I'd have liked to see some T.N Brahmins and some upper castes from Kerala and Karnataka included. A.P is expected to have much more population mixing if one looks at skin color and feature similarity among castes.

Interestingly, U.P Vaish is listed as an "upper" caste and A.P Vaisya is listed as a "middle" caste, which seems illogical if origin rather than local prestige is the criterion.

It's hardly surprising that the Tharu tribals show less ASI influence than South Indian castes. Tharu look more akin to Tibetans than to Indian "tribals".

Can you explain to this interested non-biologist how one might possibly "date" the entry of the hypothetical ANI/ASI populations? Would that be reliable if current castes formed later than the entry/mixture of the ANI/ASI? What if there were multiple "ANI" populations that migrated into India at multiple times? Is there a way of figuring that out?

Can you explain to this interested non-biologist how one might possibly "date" the entry of the hypothetical ANI/ASI populations? Would that be reliable if current castes formed later than the entry/mixture of the ANI/ASI? What if there were multiple "ANI" populations that migrated into India at multiple times? Is there a way of figuring that out?

they tried to figure it out (in the press) by pegging it to the last common ancestor of maternal lineages (mtDNA). so some of the researchers are claiming that the ANI had to arrive more than 10,000 years ago. i think this is wrong, because i suspect there was a lot of male-bias in the ANI. another way they might do it is to look at recombination in the genome. you could probably eliminate very recent admixture through that method, though i assume that there'd be diminishing returns further into the past.
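A rough sketch of the recombination idea, under a strong assumption: a single pulse of admixture, after which the admixture-induced correlation between two sites a genetic distance d (in Morgans) apart decays roughly as exp(-g * d), where g is the number of generations since mixing. This is a textbook simplification and not something the paper does. The helper names and the value of 100 generations are hypothetical, and the "data" here are a noiseless synthetic curve.

```python
import math

def admixture_ld(distance_morgans, generations, amplitude=1.0):
    # Expected admixture correlation under a single-pulse model (assumption).
    return amplitude * math.exp(-generations * distance_morgans)

def fit_generations(distances, ld_values):
    # Least-squares slope of log(LD) against distance; the slope is -g.
    logs = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    num = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    den = sum((d - mean_d) ** 2 for d in distances)
    return -num / den

# Synthetic, noiseless curve for a hypothetical pulse 100 generations ago,
# sampled at distances from 0.1 to 2 centimorgans.
distances = [0.001 * k for k in range(1, 21)]
curve = [admixture_ld(d, 100) for d in distances]
estimated = fit_generations(distances, curve)

# Signal left at 1 centimorgan for a recent versus a much older pulse.
signal_recent = admixture_ld(0.01, 100)
signal_old = admixture_ld(0.01, 1000)

print("estimated generations:", round(estimated, 2))
print("signal at 1 cM, 100 generations: ", signal_recent)
print("signal at 1 cM, 1000 generations:", signal_old)
```

The last two lines illustrate the diminishing returns: the older the mixing, the less signal survives at any usable distance, so noise in real data swamps it sooner.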

as for multiple migrations, etc., with a bigger more representative sample i assume you could figure out if more complex demographic models are better fits. if you can squeeze more juice out of more than the two-way admixture then likely there's something there. the lead author comes close to admitting that it is likely more complicated than this simple model.

re: tamil brahmins, see here.