Genes predict the ancestry of African-Americans

By dgmacarthur on December 6, 2008.

I'm in the middle of writing a longer post about a recent paper on the effects of genetic ancestry on gene expression differences in African-Americans, which has also been well covered by p-ter at Gene Expression. I wanted to post this section separately to avoid detracting from the issues in that post.

This figure will not provide any big surprises for those who have been following developments in human genetics over the last five years - but it still provides a compelling illustration of the power of genetics to predict individual ancestry:

The figure shows the results obtained when the European, Nigerian and East Asian samples from HapMap and 100 African-American samples are clustered using principal component analysis based on data from ~600,000 genetic markers.
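For readers who want to see the general idea in code, here is a minimal sketch of principal component analysis on a genotype matrix. It is not the pipeline used in the paper: the data are simulated rather than taken from HapMap, the three "populations" and their allele frequencies are invented, and the names `pca_project` and `simulate_genotypes` are hypothetical helpers written for this illustration.

```python
import numpy as np


def pca_project(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_markers) array of allele counts coded 0, 1 or 2.
    Returns an (n_samples, n_components) array of PC coordinates.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)                    # centre each marker
    keep = g.std(axis=0) > 0                  # drop markers with no variation
    g = g[:, keep] / g[:, keep].std(axis=0)   # scale each marker to unit variance
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def simulate_genotypes(freqs, n_samples, rng):
    """Draw diploid genotypes (0/1/2) given per-marker allele frequencies."""
    return rng.binomial(2, freqs, size=(n_samples, freqs.size))


# Simulated data only: three invented populations whose allele
# frequencies drift independently from a shared baseline.
rng = np.random.default_rng(0)
n_markers = 2000
base = rng.uniform(0.1, 0.9, n_markers)
pops = [np.clip(base + rng.normal(0, 0.15, n_markers), 0.01, 0.99)
        for _ in range(3)]

labels = np.repeat([0, 1, 2], 30)
genotypes = np.vstack([simulate_genotypes(p, 30, rng) for p in pops])
pcs = pca_project(genotypes)   # one (PC1, PC2) pair per sample
```

Plotting `pcs[:, 0]` against `pcs[:, 1]`, coloured by `labels`, should give three tight clusters in this toy setting. Real analyses typically use dedicated tools and many more markers.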

The European, Nigerian and East Asian samples form strong clusters that are extraordinarily well-separated, demonstrating how easy it is to distinguish between members of these groups with sufficiently large numbers of markers. However, it should be emphasised that adding populations from other parts of the world would fill most of the gaps between these clusters, since human geographic variation is largely continuous rather than discrete.

The most interesting aspect of the figure is the distribution of the African-Americans, almost all of whom fall on a remarkably clean line stretching between the European and Nigerian clusters, with some individuals being pulled towards the East Asian cluster (presumably due to admixture from ~~South or East Asia~~ some population not well-captured by HapMap, e.g. Native Americans). Using these data, each individual's relative level of European and African ancestry can be estimated with high precision. After removing four outliers (the three samples falling furthest from the European-African line, and one individual falling very close to the European cluster), the European admixture levels of individual African-Americans fall between 1% and 62%, with an average of ~21% - a figure consistent with previous studies.
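One simple way to turn a position on that line into an ancestry estimate is to project each sample onto the axis joining the two cluster centres. The sketch below illustrates that geometric idea only; I am not claiming it is the method the authors used, the function names are hypothetical, and the coordinates are invented for the example rather than read off the figure.

```python
import numpy as np


def european_fraction(sample_pcs, african_centroid, european_centroid):
    """Fractional position of each sample along the African-European axis.

    0.0 means the sample sits at the African centroid, 1.0 at the
    European centroid. Values are clipped to the [0, 1] range.
    """
    a = np.asarray(african_centroid, dtype=float)
    e = np.asarray(european_centroid, dtype=float)
    axis = e - a
    t = (np.asarray(sample_pcs, dtype=float) - a) @ axis / (axis @ axis)
    return np.clip(t, 0.0, 1.0)


def distance_from_axis(sample_pcs, african_centroid, european_centroid):
    """Perpendicular distance of each sample from the African-European line.

    Large values flag samples pulled off the line, e.g. by ancestry
    from a third population.
    """
    a = np.asarray(african_centroid, dtype=float)
    e = np.asarray(european_centroid, dtype=float)
    axis = e - a
    rel = np.asarray(sample_pcs, dtype=float) - a
    t = rel @ axis / (axis @ axis)            # unclipped projection
    return np.linalg.norm(rel - np.outer(t, axis), axis=1)


# Invented PC coordinates, for illustration only.
african = [0.0, 0.0]
european = [10.0, 0.0]
samples = [[2.1, 0.0], [0.1, 0.3], [6.2, -0.1]]

fractions = european_fraction(samples, african, european)
offsets = distance_from_axis(samples, african, european)
```

Samples with a large `offsets` value are the ones that would be candidates for removal as outliers before averaging `fractions`.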


Alkes L. Price, Nick Patterson, Dustin C. Hancks, Simon Myers, David Reich, Vivian G. Cheung, Richard S. Spielman (2008). Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans. PLoS Genetics, 4(12). DOI: 10.1371/journal.pgen.1000294


I hate these contextualized cluster maps. They lead to such utterly sloppy thinking as we are starting to see on various parts of the internet. You would get the same exact clusters if you simply plotted the latitude against the longitude of the hometown of each sample. Which would tell us that Nigeria is not in Asia and Europe is not in Nigeria.

But when we plot alleles (or phenotypes or whatever), individuals steeped in race-based thinking cannot help but point out how this proves the validity of racial categories.

Hi Dan,
You raised a good point in your comment on p-ter's post at Gene Expression (I tried leaving this reply there, but it didn't post for some reason). The authors try to address this point by looking at whether known cis and trans eQTLs differ in their Fst distribution. For example, if trans eQTLs tended to have higher Fst, this would lead to more variation being explained by trans ancestry. They find no strong difference, although this does not rule out the possibility that some as yet undetected, highly differentiated pleiotropic trans eQTLs are causing this result.

Mightn't the East Asian pull be due to Native American admixture?

my thought was the same as mike's.

Or just as easily some European ancestry not well captured by the CEU, e.g. southern Europe.

Mike, razib, G,

Yep, I didn't think that through at all - fixed.

Greg,

"You would get the same exact clusters if you simply plotted the latitude against the longitude of the hometown of each sample."

Sure, that's the whole point: that you can reconstruct geographic ancestry to a remarkable extent using genetic data. That message is unambiguously clear from the literature.

I do agree, though, that the use of distinct populations from each region is unfortunate from the point of view of encouraging the notion of discrete, non-overlapping races - but I hope that the point in the text about other populations filling in the gaps will help to clarify that this notion is largely false (i.e. that most variation is clinal).

I'm confused. Why do you use the word "predict" here? A person's ancestry isn't something that's going to happen in the future. Do you mean "indicate"?

Hi Martin,

Predict as in "use information from one source to make an estimate of a currently unmeasured variable". Scientific predictions don't have to be about future events; I understand even archaeologists make them. ;-)

Heh, scientific postdictions?

