A new paper in PLoS Genetics, Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples:
Many association studies have been published looking for genetic variants contributing to a variety of human traits such as obesity, diabetes, and height. Because the frequency of genetic variants can differ across populations, it is important to have estimates of genetic ancestry in the individuals being studied. In this study, we were able to measure genetic ancestry in populations of mixed ancestry by genotyping pooled, rather than individual, DNA samples. This represents a rapid and inexpensive means for modeling genetic ancestry and thus could facilitate future association or population-genetic studies in populations of unknown ancestry for which whole-genome data do not already exist.
All fine & good. But I thought figure 4 was interesting. I've highlighted the part that I thought noteworthy:
The panel on the left shows that with 420 Ancestry Informative Markers (AIMs) you can separate the Japanese and Chinese (from Beijing) pretty well. These are markers which exhibit a lot of between-population variation. But note the last section: with 3,100 random SNPs you can generate the same separation. And remember that you'd need way fewer markers for African populations, since they have more diversity. In any case, I'll keep these numbers in mind when people ask me genetic-distance-related questions and I know that Fst numbers won't mean anything...
Citation: Chiang CWK, Gajdos ZKZ, Korn JM, Kuruvilla FG, Butler JL, et al. (2010) Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples. PLoS Genet 6(3): e1000866. doi:10.1371/journal.pgen.1000866
I think that a lengthy post summing up this kind of question, the difference between neutral markers (which Cavalli-Sforza tried to use) and significant differences (e.g. lactose intolerance), and the idea that groups are populations defined statistically by mixes of genes, rather than identities, and putting it in common-sense language, would be valuable.
...with 3,100 random SNPs you can generate the same separation. And remember that you'd need way fewer markers for African populations, since they have more diversity.
Okay, here are my two questions, Razib: If two Bushmen have more diversity between them than a European and an East Asian, does that mean that the offspring of a Bushman and a European or an East Asian is less diverse at the genotype level than a full-blooded Bushman? Also, I know that 'inbreeding' is not a healthy thing, but if anybody could do it with the least amount of problems, would it be the Bushmen?
Razib,
Here's a theoretical analysis of resolving power as a function of FST, number of SNPs, etc.
http://infoproc.blogspot.com/2008/12/resolution-of-genetic-population.h…
With current technology any two European nationalities (or Chinese and Japanese) are easily distinguishable with random SNPs.
"For example, given a 100,000 marker array and a sample size of 1,000, then the BBP threshold for two equal subpopulations, each of size 500, is FST = .0001. An FST value of .001 will thus be trivial to detect. To put this into context, we note that a typical value of FST between human populations in Northern and Southern Europe is about .006 [15]. Thus, we predict: most large genetic datasets with human data will show some detectable population structure."
PS Thanks for that followup on lactose tolerance and height
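For anyone who wants to play with the numbers in the quote above, here is a rough sketch. It assumes the BBP threshold for two equal-sized subpopulations behaves like FST ≈ 1/sqrt(m·n), with m markers and n samples in total; that form is an assumption on my part, chosen because it reproduces the quoted .0001 for 100,000 markers and 1,000 samples. The function names are made up for illustration.

```python
import math


def bbp_fst_threshold(num_markers: int, num_samples: int) -> float:
    """Approximate smallest detectable FST between two equal subpopulations.

    Assumes the threshold scales as 1 / sqrt(markers * samples); this is
    an illustrative approximation, not a formula taken from the post.
    """
    return 1.0 / math.sqrt(num_markers * num_samples)


def markers_needed(fst: float, num_samples: int) -> int:
    """Invert the same assumed relation: markers needed to reach a given FST."""
    return math.ceil(1.0 / (fst ** 2 * num_samples))


if __name__ == "__main__":
    # The quoted scenario: 100,000-marker array, 1,000 samples.
    print(bbp_fst_threshold(100_000, 1_000))
    # Markers needed at the quoted North/South Europe FST of about .006.
    print(markers_needed(0.006, 1_000))
```

Under this approximation, halving the detectable FST costs four times as many markers (or samples), which is why a value like .006 is so easy to pick up with a modern array.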
Is there a reason why the AIMs seem to identify more Chinese individuals as Japanese than vice versa? Or is that just due to the sample size?
i wouldn't put too much stock in that, though i assume that the chinese population will be more diverse than the japanese (the main caveat being that the japanese are probably a recent admixture between yayoi, 3 parts, and jomon, 1 part).