How many SNPs to distinguish Japanese & Chinese?

A new paper in PLoS, Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples:

Many association studies have been published looking for genetic variants contributing to a variety of human traits such as obesity, diabetes, and height. Because the frequency of genetic variants can differ across populations, it is important to have estimates of genetic ancestry in the individuals being studied. In this study, we were able to measure genetic ancestry in populations of mixed ancestry by genotyping pooled, rather than individual, DNA samples. This represents a rapid and inexpensive means for modeling genetic ancestry and thus could facilitate future association or population-genetic studies in populations of unknown ancestry for which whole-genome data do not already exist.

All fine & good. But I thought figure 4 was interesting. I've highlighted the part that I thought noteworthy:

[Image: Figure 4 from the paper, with the Japanese and Chinese panels highlighted]

The panel on the left shows that with 420 Ancestry Informative Markers (AIMs) you can separate the Japanese and Chinese (from Beijing) pretty well. These are markers which exhibit a lot of between-population variation. But note the last section: with 3,100 random SNPs you can generate the same separation. And remember that you'd need way fewer markers for African populations, since they have more diversity. In any case, I'll keep these numbers in mind when people ask me genetic-distance-related questions and I know that Fst numbers won't mean anything...
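To make "a lot of between-population variation" concrete, here is a minimal Python sketch that ranks SNPs by a per-SNP Fst computed from allele frequencies in two populations. It is an illustration only: the allele frequencies and SNP names are invented, the two populations are assumed to be equally weighted, and nothing here is taken from the paper's own marker-selection procedure.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic SNP in two equally weighted populations.

    Fst = Var(p) / (p_bar * (1 - p_bar)), where Var(p) = (p1 - p2)^2 / 4.
    """
    p_bar = (p1 + p2) / 2.0
    if p_bar in (0.0, 1.0):
        return 0.0  # monomorphic in both populations: no information
    return (p1 - p2) ** 2 / (4.0 * p_bar * (1.0 - p_bar))


def top_aims(freqs: dict, k: int) -> list:
    """Return the k SNP ids with the highest per-SNP Fst.

    freqs maps a SNP id to a (p1, p2) tuple of allele frequencies.
    """
    ranked = sorted(freqs, key=lambda snp: wright_fst(*freqs[snp]), reverse=True)
    return ranked[:k]


# Hypothetical allele frequencies (population 1, population 2), made up for illustration.
toy_freqs = {
    "snp_a": (0.10, 0.12),
    "snp_b": (0.20, 0.60),
    "snp_c": (0.50, 0.55),
    "snp_d": (0.05, 0.45),
}

if __name__ == "__main__":
    for snp in top_aims(toy_freqs, 2):
        print(snp, round(wright_fst(*toy_freqs[snp]), 4))
```

A random SNP typically looks like `snp_a` or `snp_c` here, carrying very little information on its own, which is why it takes several times as many random SNPs as AIMs to get the same separation.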

Citation: Chiang CWK, Gajdos ZKZ, Korn JM, Kuruvilla FG, Butler JL, et al. (2010) Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples. PLoS Genet 6(3): e1000866. doi:10.1371/journal.pgen.1000866

I think that a lengthy post summing up this kind of question would be valuable: the difference between neutral markers (which Cavalli-Sforza tried to use) and significant differences (e.g. lactose intolerance), and the idea that groups are populations defined statistically by mixes of genes rather than identities, all put in common-sense language.

By John Emerson on 05 Mar 2010

...with 3,100 random SNPs you can generate the same separation. And remember that you'd need way fewer markers for African populations, since they have more diversity.

Okay, here are my two questions, Razib: If two Bushmen have more diversity between them than a European and an East Asian, does that mean that the offspring of a Bushman and a European or an East Asian is less diverse at the genotype level than a full-blooded Bushman? Also, I know that 'inbreeding' is not a healthy thing, but if anybody could do it with the least amount of problems, would it be the Bushmen?

By Insightful on 05 Mar 2010

Razib,

Here's a theoretical analysis of resolving power as a function of FST, number of SNPs, etc.

http://infoproc.blogspot.com/2008/12/resolution-of-genetic-population.h…

With current technology any two European nationalities (or Chinese and Japanese) are easily distinguishable with random SNPs.

"For example, given a 100,000 marker array and a sample size of 1,000, then the BBP threshold for two equal subpopulations, each of size 500, is FST = .0001. An FST value of .001 will thus be trivial to detect. To put this into context, we note that a typical value of FST between human populations in Northern and Southern Europe is about .006 [15]. Thus, we predict: most large genetic datasets with human data will show some detectable population structure."
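The arithmetic in that quote can be checked with a short Python sketch. It assumes the BBP threshold for two equal subpopulations takes the form FST = 1/sqrt(n × m), with n markers and m individuals in total; that form reproduces the quoted figure of .0001, but treat it as an assumption rather than a statement of the cited derivation. The function names are hypothetical.

```python
import math


def bbp_fst_threshold(n_markers: int, n_samples: int) -> float:
    """Smallest Fst expected to be detectable, assuming Fst = 1 / sqrt(n * m)."""
    return 1.0 / math.sqrt(n_markers * n_samples)


def min_markers(fst: float, n_samples: int) -> int:
    """Invert the same assumed relation: markers needed to detect a given Fst."""
    return math.ceil(1.0 / (n_samples * fst ** 2))


if __name__ == "__main__":
    # The quoted example: 100,000 markers, 1,000 individuals (two groups of 500).
    print(bbp_fst_threshold(100_000, 1_000))  # about 0.0001

    # Markers needed at the quoted North/South Europe Fst of .006 with 1,000 samples.
    print(min_markers(0.006, 1_000))
```

Under this relation the number of markers needed scales as 1/FST², so halving the FST between two populations quadruples the number of random SNPs required at a fixed sample size. Note that this is a threshold for detecting structure in the whole dataset, not a guarantee that every individual is assigned correctly.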

PS Thanks for that followup on lactose tolerance and height

Is there a reason why the AIMs seem to identify more Chinese individuals as Japanese than vice versa? Or is that just due to the sample size?

i wouldn't put too much stock in that, though i assume that the chinese population will be more diverse than the japanese (the main caveat being that the japanese are probably a recent admixture between yayoi, 3 parts, and jomon, 1 part).