Genetic Future

I’m in the middle of a longer post on a recent paper on the effects of genetics on gene expression differences in African-Americans, which has also been well-covered by p-ter at Gene Expression. I wanted to post this section separately to avoid detracting from the issues in that post.

This figure will not provide any big surprises for those who have been following developments in human genetics over the last five years – but it still provides a compelling illustration of the power of genetics to predict individual ancestry:


The figure shows the results obtained when the European, Nigerian and East Asian samples from HapMap and 100 African-American samples are clustered using principal component analysis based on data from ~600,000 genetic markers.
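For readers who want a feel for how this kind of plot is produced, here is a minimal sketch in Python. It is not the pipeline used in the paper: the genotypes are simulated, the three populations are hypothetical, and the marker count, sample sizes and drift parameter are assumptions chosen only to keep the example small. The principal components are obtained from a plain singular value decomposition of the centred genotype matrix.

```python
import numpy as np


def simulate_genotypes(allele_freqs, n_individuals, rng):
    """Draw diploid genotypes (0, 1 or 2 allele copies) for each marker."""
    return rng.binomial(2, allele_freqs, size=(n_individuals, allele_freqs.size))


def pca_scores(genotypes, n_components=2):
    """Return each individual's coordinates on the top principal components."""
    centred = genotypes - genotypes.mean(axis=0)  # centre every marker
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
n_markers, n_per_pop = 2000, 30  # assumed sizes, far smaller than the real data

# Hypothetical populations: shared ancestral frequencies plus independent drift.
ancestral = rng.uniform(0.1, 0.9, n_markers)
pop_freqs = [
    np.clip(ancestral + rng.normal(0.0, 0.15, n_markers), 0.01, 0.99)
    for _ in range(3)
]

genotypes = np.vstack([simulate_genotypes(f, n_per_pop, rng) for f in pop_freqs])
labels = np.repeat(np.arange(3), n_per_pop)

# One row per individual, one column per principal component.
scores = pca_scores(genotypes)

# Mean position of each simulated population in PC space.
centroids = np.array([scores[labels == k].mean(axis=0) for k in range(3)])
```

Plotting the two columns of `scores` against each other, coloured by `labels`, gives a scatter of the same general kind as the figure, with one tight cluster per simulated population.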

The European, Nigerian and East Asian samples form strong clusters that are extraordinarily well-separated, demonstrating how easy it is to distinguish between members of these groups with sufficiently large numbers of markers. However, it should be emphasised that adding additional populations from other parts of the world would fill most of the gaps between these clusters, since human geographic variation is largely continuous rather than discrete.

The most interesting aspect of the figure is the distribution of the African-Americans, who almost all fall on a remarkably clean line stretching between the European and Nigerian clusters, with some individuals being pulled towards the East Asian cluster (presumably due to admixture from some population not well-captured by HapMap, e.g. Native Americans). Using these data, each individual’s relative level of European and African ancestry can be estimated with high precision. After removing four outliers (the three samples falling furthest from the European-African line, and one individual falling very close to the European cluster), the European admixture levels of individual African-Americans fall between 1 and 62%, with an average of ~21%, a figure consistent with previous studies.
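One simple way to turn a position on that line into an ancestry estimate is to measure how far along the segment between the two reference clusters each individual sits. The sketch below illustrates that idea only; it is not necessarily the method used in the paper, and the coordinates, the function names `ancestry_fraction` and `distance_from_line`, and the sample values are all invented for the example.

```python
import numpy as np


def ancestry_fraction(points, centroid_a, centroid_b):
    """Fraction of the way from centroid_a to centroid_b for each point."""
    axis = centroid_b - centroid_a
    t = (points - centroid_a) @ axis / (axis @ axis)
    return np.clip(t, 0.0, 1.0)  # keep estimates within [0, 1]


def distance_from_line(points, centroid_a, centroid_b):
    """Perpendicular distance of each point from the line through both centroids."""
    unit = (centroid_b - centroid_a) / np.linalg.norm(centroid_b - centroid_a)
    rel = points - centroid_a
    along = np.outer(rel @ unit, unit)  # component lying on the line
    return np.linalg.norm(rel - along, axis=1)


# Made-up principal component coordinates, for illustration only.
african_centroid = np.array([0.0, 0.0])
european_centroid = np.array([10.0, 0.0])
samples = np.array([
    [2.0, 0.0],  # on the line, one fifth of the way to the European cluster
    [5.0, 0.0],  # on the line, halfway
    [1.0, 3.0],  # pulled off the line towards some third population
])

european_fraction = ancestry_fraction(samples, african_centroid, european_centroid)
off_line = distance_from_line(samples, african_centroid, european_centroid)
```

Here `european_fraction` comes out as 0.2, 0.5 and 0.1, while `off_line` is 0, 0 and 3; a large off-line distance is the sort of signal that could be used to flag outliers like those removed above.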


Alkes L. Price, Nick Patterson, Dustin C. Hancks, Simon Myers, David Reich, Vivian G. Cheung, Richard S. Spielman (2008). Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans. PLoS Genetics, 4 (12). DOI: 10.1371/journal.pgen.1000294


  1. #1 Greg Laden
    December 6, 2008

    I hate these contextualized cluster maps. They lead to such utterly sloppy thinking as we are starting to see on various parts of the internet. You would get the same exact clusters if you simply plotted the latitude against the longitude of the hometown of each sample. Which would tell us that Nigeria is not in Asia and Europe is not in Nigeria.

    But when we plot alleles (or phenotypes or whatever), individuals steeped in race-based thinking cannot help but point out how this proves the validity of racial categories.

  2. #2 G
    December 6, 2008

    Hi Dan,
    You raised a good point in your comment on p-ter’s post at Gene Expression (I tried leaving this reply there, but it didn’t post for some reason). The authors try to address this point by looking at whether known cis and trans eQTLs differ in their Fst distribution. For example, if trans eQTLs tended to have higher Fst, this would lead to more variation being explained by trans ancestry. They find no strong difference, although this does not rule out the possibility that there are some as yet undetected, highly differentiated pleiotropic trans eQTLs causing this result.


  3. #3 Mike Keesey
    December 6, 2008

    Mightn’t the East Asian pull be due to Native American admixture?

  4. #4 razib
    December 6, 2008

    my thought was the same as mike’s.

  5. #5 G
    December 7, 2008

    Or just as easily some European ancestry not well captured by the CEU, e.g. southern Europe.

  6. #6 Daniel MacArthur
    December 7, 2008

    Mike, razib, G,

    Yep, I didn’t think that through at all – fixed.


    “You would get the same exact clusters if you simply plotted the latitude against the longitude of the hometown of each sample.”

    Sure, that’s the whole point: that you can reconstruct geographic ancestry to a remarkable extent using genetic data. That message is unambiguously clear from the literature.

    I do agree, though, that the use of distinct populations from each region is unfortunate from the point of view of encouraging the notion of discrete, non-overlapping races – but I hope that the point in the text about other populations filling in the gaps will help to clarify that this notion is largely false (i.e. that most variation is clinal).

  7. #7 Martin R
    December 7, 2008

    I’m confused. Why do you use the word “predict” here? A person’s ancestry isn’t something that’s going to happen in the future. Do you mean “indicate”?

  8. #8 Daniel MacArthur
    December 7, 2008

    Hi Martin,

    Predict as in “use information from one source to make an estimate of a currently unmeasured variable”. Scientific predictions don’t have to be about future events; I understand even archaeologists make them. 😉

  9. #9 Mike Keesey
    December 8, 2008

    Heh, scientific postdictions?
