85% of genetic variation is within groups...

...yes, true, at a typical single locus (at some loci, such as SLC24A5, most of the variation is between groups). But that doesn't mean you can't use genetics to differentiate population clusters. Here are 938 individuals (the points) from 51 world populations (the colors of the points), plotted on the two largest principal components of the variation.


From Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. Also see Lewontin's Fallacy.
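
To make the single-locus versus many-loci contrast concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: two made-up populations, 100 loci, and an allele frequency gap chosen so that roughly 15% of the variation at each locus is between groups. It also uses a simple supervised score built from the known frequencies, not the principal components analysis behind the figure.

```python
import random

def simulate_accuracy(n_per_pop=200, n_loci=100, delta=0.19, seed=0):
    """Return (mean per-locus Fst, accuracy using 1 locus, accuracy using all loci)."""
    rng = random.Random(seed)

    # Hypothetical allele frequencies: the populations differ by 2*delta at every locus.
    freqs = []
    for _ in range(n_loci):
        base = rng.uniform(0.3, 0.7)
        sign = rng.choice((-1, 1))
        freqs.append((base + sign * delta, base - sign * delta))

    # Per-locus Fst from the true frequencies: (H_total - H_within) / H_total.
    fsts = []
    for p1, p2 in freqs:
        p_bar = (p1 + p2) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        fsts.append((h_total - h_within) / h_total)
    mean_fst = sum(fsts) / n_loci

    def genotype(pop):
        # Diploid genotype: number of copies (0, 1 or 2) of one allele at each locus.
        return [(rng.random() < f[pop]) + (rng.random() < f[pop]) for f in freqs]

    def score(g, loci):
        # Positive totals lean toward population 0, negative toward population 1.
        return sum((g[j] - (freqs[j][0] + freqs[j][1])) * (freqs[j][0] - freqs[j][1])
                   for j in loci)

    single_hits = multi_hits = 0
    for pop in (0, 1):
        for _ in range(n_per_pop):
            g = genotype(pop)
            single_hits += (score(g, [0]) > 0) == (pop == 0)
            multi_hits += (score(g, range(n_loci)) > 0) == (pop == 0)
    total = 2 * n_per_pop
    return mean_fst, single_hits / total, multi_hits / total

mean_fst, acc_one, acc_all = simulate_accuracy()
print(f"mean per-locus Fst:     {mean_fst:.3f}")
print(f"accuracy with 1 locus:  {acc_one:.2f}")
print(f"accuracy with all loci: {acc_all:.2f}")
```

Any one locus is a poor guide to group membership, but small differences that all point the same way add up across loci, which is why the clusters separate.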



I grabbed the "supplemental materials" available online for that paper, but they didn't have what I'd hoped for.

I'd be really interested to see the components of the PC1 and PC2 vectors, if I'm thinking about this the right way. I think they're some eigenvectors in SNP space. What do the coefficients look like? For example, could you discard all but, say, the 1000 largest coefficients, and still get a similar plot? 100? 10?
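
Without the raw data, the truncation experiment can at least be tried on simulated genotypes. The sketch below is only illustrative: the two populations, the 300 loci, the 40 "informative" loci and the frequency gap are all made-up assumptions, not values from the paper. It also computes only PC1, using a plain power iteration rather than a full eigendecomposition.

```python
import random

def simulate(n_per_pop=30, n_loci=300, n_informative=40, delta=0.25, seed=1):
    """Genotypes for two hypothetical populations; only the first
    n_informative loci differ in allele frequency (by 2*delta)."""
    rng = random.Random(seed)
    freqs = []
    for j in range(n_loci):
        base = rng.uniform(0.3, 0.7)
        d = delta if j < n_informative else 0.0
        freqs.append((base + d, base - d))
    rows = [[(rng.random() < f[pop]) + (rng.random() < f[pop]) for f in freqs]
            for pop in (0, 1) for _ in range(n_per_pop)]
    return rows, rng

def center(rows):
    # Subtract each locus mean so the PCs describe variation around the average.
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def project(X, v):
    # Score of each individual along the loading vector v.
    return [sum(x * w for x, w in zip(row, v)) for row in X]

def leading_pc(X, rng, n_iter=50):
    """Power iteration for the top eigenvector of X^T X (the PC1 loadings)."""
    v = [rng.gauss(0, 1) for _ in X[0]]
    for _ in range(n_iter):
        scores = project(X, v)
        v = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(len(v))]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return v

def truncate(v, k):
    # Keep the k loadings with the largest absolute value, zero the rest.
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k])
    return [w if j in keep else 0.0 for j, w in enumerate(v)]

def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rows, rng = simulate()
X = center(rows)
pc1 = leading_pc(X, rng)
full_scores = project(X, pc1)
for k in (300, 100, 40, 10):
    r = correlation(full_scores, project(X, truncate(pc1, k)))
    print(f"top {k:3d} loadings: correlation with full PC1 scores = {r:.3f}")
```

How far the list can be cut before the plot degrades depends on how concentrated the real loadings are, which a toy simulation like this cannot tell you.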

Actually, I should read the paper (or at least the abstract) to see what the dimension of the space is...


on the order of 100 for continental-level differences.
we can estimate that about 120 unselected SNPs or 20 highly selected SNPs can distinguish group CA from NA, AA from AS and AA from NA. A few hundred random SNPs are required to separate CA from AA, CA from AS and AS from NA, or about 40 highly selected loci. STRP loci are more powerful and have higher effective δ values because they have multiple alleles. Table 3 reveals that fewer than 100 random STRPs, or about 30 highly selected loci, can distinguish the major racial groups. As expected, differentiating Caucasians and Hispanic Americans, who are admixed but mostly of Caucasian ancestry, is more difficult and requires a few hundred random STRPs or about 50 highly selected loci. These results also indicate that many hundreds of markers or more would be required to accurately differentiate more closely related groups, for example populations within the same racial category.

the paper is from 2002. i think we can go lower than 20 now, since we know of more ancestry-informative loci, such as SLC24A5, than they did then....

Thanks, Razib.

Your mention of SLC24A5 is related to another thing I'd imagined myself fooling around with, given the raw data: remove all of the sites known to contribute to skin color, and diagonalize the matrix again. This might teach me more about the extent to which (in my best S. J. Gould impersonation) "The differences among the races are only skin deep."
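
The "drop the known loci and diagonalize again" idea can also be sketched on simulated data. The assumptions here are entirely hypothetical: 5 loci with a very large frequency gap stand in for the skin color sites, and 295 loci with a small gap stand in for everything else. PC1 is found by power iteration on the individual-by-individual Gram matrix, once with all loci and once with the large-gap loci removed.

```python
import random

def simulate(n_per_pop=30, n_loci=300, n_big=5, big=0.4, small=0.08, seed=2):
    """Two hypothetical populations: the first n_big loci have a large
    frequency gap, the remaining loci a small one (random direction)."""
    rng = random.Random(seed)
    freqs = []
    for j in range(n_loci):
        d = (big if j < n_big else small) * rng.choice((-1, 1))
        freqs.append((0.5 + d, 0.5 - d))
    labels = [pop for pop in (0, 1) for _ in range(n_per_pop)]
    rows = [[(rng.random() < f[pop]) + (rng.random() < f[pop]) for f in freqs]
            for pop in labels]
    return rows, labels, rng

def pc1_scores(rows, loci, rng, n_iter=100):
    """PC1 scores from power iteration on the n x n Gram matrix of the
    centered genotypes, restricted to the chosen loci."""
    n = len(rows)
    centered = []
    for j in loci:
        col = [row[j] for row in rows]
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    gram = [[sum(c[a] * c[b] for c in centered) for b in range(n)] for a in range(n)]
    s = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        s = [sum(g * x for g, x in zip(gram_row, s)) for gram_row in gram]
        norm = sum(x * x for x in s) ** 0.5
        s = [x / norm for x in s]
    return s

def separation(scores, labels):
    """Fraction of individuals whose PC1 sign matches their population.
    The sign of a PC is arbitrary, so take the better of the two labelings."""
    hits = sum((s > 0) == (lab == 0) for s, lab in zip(scores, labels))
    frac = hits / len(labels)
    return max(frac, 1 - frac)

rows, labels, rng = simulate()
sep_all = separation(pc1_scores(rows, range(300), rng), labels)
sep_without_big = separation(pc1_scores(rows, range(5, 300), rng), labels)
print(f"all 300 loci:           {sep_all:.2f}")
print(f"large-gap loci removed: {sep_without_big:.2f}")
```

In this toy setup the clusters survive the removal because the many small differences carry most of the signal. Whether the real data behave the same way is exactly what the experiment would test.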


I couldn't read the full article. It appears that North Africans are grouped with all Africans rather than with Middle Easterners. Is this true? Any more specifics on the dots would be appreciated as well. For example, are the East Asians who are scattered among the CS Asians actually Southeast Asians, close to greater India?

So the Mozabites are the brown dot farthest from the cluster? You said North Africans aren't Africans, and the link you just gave me agrees, but the Wikipedia link at the top says Mozabites are with the Africans.

The "supplemental materials" at the original link Razib gave has a spreadsheet showing the 51 groups and how they were categorized. Dunno which dot is "Mozabite," but it's categorized as "Middle Eastern," according to that spreadsheet.


So what? This is just collapsing variation due to successive bottlenecks. That's hardly interesting in terms of the actual "race" part of it - it just means genetics has a historical trajectory. This is true of boring, neutral variation by itself.