How many SNPs to distinguish Japanese & Chinese?

A new paper in PLoS Genetics, Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples:

Many association studies have been published looking for genetic variants contributing to a variety of human traits such as obesity, diabetes, and height. Because the frequency of genetic variants can differ across populations, it is important to have estimates of genetic ancestry in the individuals being studied. In this study, we were able to measure genetic ancestry in populations of mixed ancestry by genotyping pooled, rather than individual, DNA samples. This represents a rapid and inexpensive means for modeling genetic ancestry and thus could facilitate future association or population-genetic studies in populations of unknown ancestry for which whole-genome data do not already exist.

All fine & good. But I thought figure 4 was interesting. I've highlighted the part that I thought noteworthy:

[Figure 4 from the paper, with the Chinese and Japanese panels highlighted]

The panel on the left shows that with 420 Ancestry Informative Markers (AIMs) you can separate the Japanese and Chinese (from Beijing) pretty well. These are markers which exhibit a lot of between-population variation. But note the last section: with 3,100 random SNPs you can generate the same separation. And remember that you'd need way fewer markers for African populations, since they have more diversity. In any case, I'll keep these numbers in mind when people ask me genetic-distance-related questions and I know that Fst numbers won't mean anything....
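To make the AIMs-versus-random-SNPs tradeoff concrete, here is a toy simulation. It is only a sketch under loud assumptions: two made-up populations "A" and "B", allele frequencies that I invent (they are not from the paper), frequencies treated as known exactly, and a simple likelihood-ratio classifier rather than the pooled-genotyping and PCA approach the authors used. The function names and the value of `sigma` are hypothetical.

```python
import math
import random


def simulate_frequencies(n_snps, sigma, rng):
    """Return (p_a, p_b) allele-frequency pairs for two weakly diverged populations."""
    freqs = []
    for _ in range(n_snps):
        p = rng.uniform(0.1, 0.9)   # shared ancestral frequency (assumption)
        d = rng.gauss(0.0, sigma)   # small population-specific shift (assumption)
        p_a = min(max(p + d, 0.01), 0.99)
        p_b = min(max(p - d, 0.01), 0.99)
        freqs.append((p_a, p_b))
    return freqs


def draw_genotype(p, rng):
    """Copies (0, 1 or 2) of the allele carried by one diploid individual."""
    return (rng.random() < p) + (rng.random() < p)


def classify_accuracy(markers, n_per_pop, rng):
    """Fraction of simulated individuals assigned to their true population."""
    correct = 0
    for true_pop in ("A", "B"):
        for _ in range(n_per_pop):
            llr = 0.0  # log-likelihood ratio, population A versus B
            for p_a, p_b in markers:
                g = draw_genotype(p_a if true_pop == "A" else p_b, rng)
                llr += g * math.log(p_a / p_b)
                llr += (2 - g) * math.log((1 - p_a) / (1 - p_b))
            guess = "A" if llr > 0 else "B"
            correct += guess == true_pop
    return correct / (2 * n_per_pop)


rng = random.Random(0)
# sigma is chosen to give weak differentiation (per-SNP Fst below about 0.01);
# this is an illustrative assumption, not a value estimated from real data.
all_snps = simulate_frequencies(5000, sigma=0.037, rng=rng)

random_small = rng.sample(all_snps, 100)
random_large = rng.sample(all_snps, 1000)
# "AIMs" here are simply the 100 SNPs with the largest frequency difference.
aim_panel = sorted(all_snps, key=lambda f: abs(f[0] - f[1]), reverse=True)[:100]

acc_random_small = classify_accuracy(random_small, 100, rng)
acc_random_large = classify_accuracy(random_large, 100, rng)
acc_aims = classify_accuracy(aim_panel, 100, rng)

print("100 random SNPs: ", acc_random_small)
print("1000 random SNPs:", acc_random_large)
print("100 top-ranked:  ", acc_aims)
```

The point of the sketch is qualitative: a small panel of highly differentiated markers and a much larger panel of random markers can both do a job that a small random panel does poorly. The specific panel sizes here are arbitrary and should not be read as a reproduction of the 420 versus 3,100 result.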

Citation: Chiang CWK, Gajdos ZKZ, Korn JM, Kuruvilla FG, Butler JL, et al. (2010) Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples. PLoS Genet 6(3): e1000866. doi:10.1371/journal.pgen.1000866

I think that a lengthy post summing up this kind of question, the difference between neutral markers (which Cavalli-Sforza tried to use) and significant difference (e.g. lactose intolerance), and the idea that groups are populations defined statistically by mixes of genes, rather than identities, and putting it in common sense language, would be valuable.

By John Emerson on 05 Mar 2010

...with 3,100 random SNPs you can generate the same separation. And remember that you'd need way fewer markers for African populations, since they have more diversity.

Okay, here are my two questions, Razib: If two Bushmen have more diversity between them than a European and an East Asian, does that mean that the offspring of a Bushman and a European or an East Asian is less diverse at the genotype level than a full-blooded Bushman? Also, I know that 'inbreeding' is not a healthy thing, but if anybody could do it with the fewest problems, would it be the Bushmen?

By Insightful on 05 Mar 2010

Razib,

Here's a theoretical analysis of resolving power as a function of FST, number of SNPs, etc.

http://infoproc.blogspot.com/2008/12/resolution-of-genetic-population.h…

With current technology any two European nationalities (or Chinese and Japanese) are easily distinguishable with random SNPs.

"For example, given a 100,000 marker array and a sample size of 1,000, then the BBP threshold for two equal subpopulations, each of size 500, is FST = .0001. An FST value of .001 will thus be trivial to detect. To put this into context, we note that a typical value of FST between human populations in Northern and Southern Europe is about .006 [15]. Thus, we predict: most large genetic datasets with human data will show some detectable population structure."
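The arithmetic in that quote can be checked in a couple of lines. This sketch assumes the BBP threshold takes the form FST = 1/sqrt(markers × samples), which is my reading of the quoted passage and does reproduce its number; the function name is hypothetical.

```python
import math


def bbp_fst_threshold(n_markers, n_samples):
    """Smallest detectable FST, assuming the form 1 / sqrt(n_markers * n_samples)."""
    return 1.0 / math.sqrt(n_markers * n_samples)


# Numbers taken from the quoted passage.
threshold = bbp_fst_threshold(100_000, 1_000)
north_south_europe_fst = 0.006

# How far above the detection threshold the quoted European FST sits.
margin = north_south_europe_fst / threshold

print(threshold)
print(margin)
```

Under that assumed form, the quoted Northern versus Southern Europe value sits well over an order of magnitude above the threshold, which is why the authors call detection trivial.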

PS Thanks for that followup on lactose tolerance and height

Is there a reason why the AIMs seem to identify more Chinese individuals as Japanese than vice versa? Or is that just due to the sample size?

i wouldn't put too much stock in that, though i assume that the chinese population will be more diverse than the japanese (the main caveat being that the japanese are probably a recent admixture between yayoi, 3 parts, and jomon, 1 part).