Genes predict the ancestry of African-Americans

I'm in the middle of a longer post on a recent paper on the effects of genetics on gene expression differences in African-Americans, which has also been well-covered by p-ter at Gene Expression. I wanted to post this section separately to avoid detracting from the issues in that post.

This figure will not provide any big surprises for those who have been following developments in human genetics over the last five years - but it still provides a compelling illustration of the power of genetics to predict individual ancestry:


The figure shows the results obtained when the European, Nigerian and East Asian samples from HapMap and 100 African-American samples are clustered using principal component analysis based on data from ~600,000 genetic markers.

The European, Nigerian and East Asian samples form tight clusters that are extraordinarily well-separated, demonstrating how easy it is to distinguish between members of these groups with sufficiently large numbers of markers. However, it should be emphasised that adding populations from other parts of the world would fill most of the gaps between these clusters, since human geographic variation is largely continuous rather than discrete.
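To make the clustering step concrete, here is a minimal sketch on simulated data. It assumes NumPy is available, and everything in it is illustrative rather than taken from the paper: the three toy populations, their allele frequencies, the marker count and the function names are all hypothetical, and a plain SVD-based PCA stands in for whatever procedure the authors actually used.

```python
import numpy as np


def simulate_genotypes(freqs, n_samples, rng):
    """Draw diploid genotypes (0, 1 or 2 allele copies) for each marker."""
    return rng.binomial(2, freqs, size=(n_samples, freqs.size))


def pca_scores(genotypes, n_components=2):
    """Project samples onto the top principal components using an SVD."""
    x = genotypes.astype(float)
    x -= x.mean(axis=0)  # centre each marker across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def separation_ratio(scores, labels):
    """Smallest centroid-to-centroid distance over the largest cluster radius."""
    groups = np.unique(labels)
    centroids = np.array([scores[labels == g].mean(axis=0) for g in groups])
    max_within = max(
        np.linalg.norm(scores[labels == g] - c, axis=1).max()
        for g, c in zip(groups, centroids)
    )
    min_between = min(
        np.linalg.norm(centroids[i] - centroids[j])
        for i in range(len(groups))
        for j in range(i + 1, len(groups))
    )
    return min_between / max_within


rng = np.random.default_rng(0)
n_markers, n_per_pop = 2000, 30

# Hypothetical allele frequencies: three toy populations that have each
# drifted independently away from a shared ancestral frequency.
ancestral = rng.uniform(0.1, 0.9, n_markers)
pop_freqs = [
    np.clip(ancestral + rng.normal(0, 0.15, n_markers), 0.01, 0.99)
    for _ in range(3)
]

genotypes = np.vstack([simulate_genotypes(f, n_per_pop, rng) for f in pop_freqs])
labels = np.repeat(np.arange(3), n_per_pop)
scores = pca_scores(genotypes)

print(f"separation ratio: {separation_ratio(scores, labels):.1f}")
```

A ratio well above 1 means the gaps between clusters are much larger than the clusters themselves. Each marker carries only a little information, but thousands of them together separate the toy populations cleanly.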

The most interesting aspect of the figure is the distribution of the African-Americans, who almost all fall on a remarkably clean line stretching between the European and Nigerian clusters, with some individuals being pulled towards the East Asian cluster (presumably due to admixture from some population not well-captured by HapMap, e.g. Native Americans). Using these data, each individual's relative level of European and African ancestry can be estimated with high precision. After removing four outliers (the three samples falling furthest from the European-African line, and one individual falling very close to the European cluster), the European admixture levels of individual African-Americans fall between 1 and 62%, with an average of ~21% - a figure consistent with previous studies.
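One simple way to turn a position on that line into an ancestry estimate is sketched below. This is an assumption for illustration, not necessarily how the paper's authors computed their figures. The idea is that an admixed individual's expected genotype is a weighted mix of the two source populations, so their expected position in the PCA plot is the same weighted mix of the two cluster centroids. The function names and coordinates are hypothetical.

```python
from math import hypot


def european_fraction(point, african_centroid, european_centroid):
    """Estimate European ancestry as the position of `point` along the line
    from the African centroid (0.0) to the European centroid (1.0)."""
    ax, ay = african_centroid
    dx, dy = european_centroid[0] - ax, european_centroid[1] - ay
    t = ((point[0] - ax) * dx + (point[1] - ay) * dy) / (dx * dx + dy * dy)
    return min(1.0, max(0.0, t))  # clamp to a valid proportion


def distance_from_line(point, african_centroid, european_centroid):
    """Perpendicular distance from the African-European line; large values
    flag individuals pulled towards a third source population."""
    ax, ay = african_centroid
    dx, dy = european_centroid[0] - ax, european_centroid[1] - ay
    cross = dx * (point[1] - ay) - dy * (point[0] - ax)
    return abs(cross) / hypot(dx, dy)


# Made-up PCA coordinates, purely for illustration.
african = (0.0, 0.0)
european = (10.0, 0.0)
individual = (2.0, 3.0)

print(european_fraction(individual, african, european))
print(distance_from_line(individual, african, european))
```

The second function gives a way to pick out outliers like those removed above: individuals with a large perpendicular distance sit off the two-population line, so a two-way ancestry estimate would be misleading for them.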


Alkes L. Price, Nick Patterson, Dustin C. Hancks, Simon Myers, David Reich, Vivian G. Cheung, Richard S. Spielman (2008). Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans. PLoS Genetics, 4(12). DOI: 10.1371/journal.pgen.1000294


I hate these contextualized cluster maps. They lead to such utterly sloppy thinking as we are starting to see on various parts of the internet. You would get the same exact clusters if you simply plotted the latitude against the longitude of the hometown of each sample. Which would tell us that Nigeria is not in Asia and Europe is not in Nigeria.

But when we plot alleles (or phenotypes or whatever), individuals steeped in race-based thinking cannot help but point out how this proves the validity of racial categories.

Hi Dan,
You raised a good point in your comment on p-ter's post at Gene Expression (I tried leaving this reply there, but it didn't post for some reason). The authors try to address this point by looking at whether known cis and trans eQTLs differ in their Fst distribution. For example, if trans eQTLs tended to have higher Fst, this would lead to more variation being explained by trans ancestry. They find no strong difference, although this does not rule out the possibility that some as yet undetected, highly differentiated pleiotropic trans eQTLs are causing this result.


Or just as easily some European ancestry not well captured by the CEU, e.g. southern Europe.

Mike, razib, G,

Yep, I didn't think that through at all - fixed.


You would get the same exact clusters if you simply plotted the latitude against the longitude of the hometown of each sample.

Sure, that's the whole point: that you can reconstruct geographic ancestry to a remarkable extent using genetic data. That message is unambiguously clear from the literature.

I do agree, though, that the use of distinct populations from each region is unfortunate from the point of view of encouraging the notion of discrete, non-overlapping races - but I hope that the point in the text about other populations filling in the gaps will help to clarify that this notion is largely false (i.e. that most variation is clinal).

I'm confused. Why do you use the word "predict" here? A person's ancestry isn't something that's going to happen in the future. Do you mean "indicate"?

Hi Martin,

Predict as in "use information from one source to make an estimate of a currently unmeasured variable". Scientific predictions don't have to be about future events; I understand even archaeologists make them. ;-)