93 ancestrally informative markers to categorize them all

By razib on July 30, 2009.

An ancestry informative marker set for determining continental origin: validation
and extension using human genome diversity panels:

Results
In this study, genotypes from Human Genome Diversity Panel populations were used to further evaluate a 93 SNP AIM panel, a subset of the 128 AIMS set, for distinguishing continental origins. Using both model-based and relatively model-independent methods, we here confirm the ability of this AIM set to distinguish diverse population groups that were not previously evaluated. This study included multiple population groups from Oceana, South Asia, East Asia, Sub-Saharan Africa, North and South America, and Europe. In addition, the 93 AIM set provides population substructure information that can, for example, distinguish Arab and Ashkenazi from Northern European population groups and Pygmy from other Sub-Saharan African population groups.

Conclusion
These data provide additional support for using the 93 AIM set to efficiently identify continental subject groups for genetic studies, to identify study population outliers, and to control for admixture in association studies.

AIM = ancestrally informative markers. You are probably aware that most of the variance on any given gene is found within populations, not between them. Hence the chestnut of conventional wisdom that 85% of variance is within races and 15% is between races. But not all genes are created equal. For example, on SLC24A5 almost all the variance among Europeans and Africans is between the races; if you know the state of SLC24A5, you can establish with a high degree of certainty whether a person is African or European in origin, if these are your only two options (Asians and Africans cluster on SLC24A5, though if you find the "European" variant you can be assured that an individual's provenance is, at least partially, from North Africa or Western Eurasia).

The logic, then, is that a small number of highly population informative markers (i.e., markers which are good at distinguishing between populations) can allow one to discern population stratification within medical studies. If, for example, you are looking for disease susceptibility alleles and different populations have different disease susceptibilities, then naturally those alleles which are correlated with particular populations will show up in an association study (though the "causal" connection is population identity, which tracks both the disease and the allele). This is why Ashkenazi Jewish genetics are of more than genealogical interest: if Jews have a unique suite of genetic diseases (this is true), then it might be best to exclude them from studies using other Europeans. Sniffing out this sort of "cryptic" structure isn't that hard; in the early 2000s Neil Risch et al. pointed out that as few as 20 AIMs may be sufficient to distinguish continental populations.
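To make the "not all genes are created equal" point concrete, here is a minimal sketch of one common way to score how much of a marker's variance lies between two populations: a simple Fst, computed as (H_T - H_S) / H_T from expected heterozygosities. The function name and the allele frequencies are invented for illustration; they are not values from the paper, and the paper may well have ranked markers by a different statistic.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Simple Fst for one biallelic marker in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Average expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # allele fixed or absent everywhere: nothing to apportion
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
typical_marker = fst_two_pops(0.40, 0.60)
near_fixed_marker = fst_two_pops(0.02, 0.98)

print(f"typical marker Fst:    {typical_marker:.3f}")
print(f"near-fixed marker Fst: {near_fixed_marker:.3f}")
```

With these made-up inputs the first marker comes out at 0.04 (almost all variance within populations) and the second at about 0.92 (almost all variance between them). An AIM panel is, roughly speaking, a collection of markers of the second kind.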

This study uses 93 markers to distinguish HGDP groups, along with a few supplemental populations which were not well represented in the HGDP sample. For example, since the government of India was rather restrictive of genetic research when the HGDP population samples were being collected, the "South Asians" are generally from Pakistan. A study which surveyed Indian Americans (that is, Americans whose families are of Indian origin) provided the data to "plug" that hole. Clusters were displayed through two primary methods: Structure and principal component analysis charts.

(I've reformatted this figure a bit)

These figures aren't too special; you've seen better. But instead of tens of thousands of SNPs, these use just 93 markers, so the bang-for-the-buck is rather big. Both the Structure and PC charts aligned with intuition and previous findings. In fact, they compared their results to those of 3,500 random SNPs (remember, randomly selected markers will show a lot less between population variance, so less bang-for-the-buck).
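As a rough illustration of why a handful of strongly differentiated markers can be enough for a PC chart, here is a small sketch on simulated data. Everything in it is an assumption made for the example: the two populations, the allele frequencies (0.1 versus 0.9 at every marker), the sample sizes, and the function names. The first principal component is found by plain power iteration to keep the code dependency-free; this stands in for, and is not, the software the authors used.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate_genotypes(freqs, n_people):
    """Draw diploid genotypes: 0, 1, or 2 copies of an allele per marker."""
    return [
        [sum(random.random() < p for _ in range(2)) for p in freqs]
        for _ in range(n_people)
    ]


def first_pc_scores(rows, iterations=200):
    """Project individuals onto the first principal component.

    Uses power iteration on X^T X, where X is the column-centered
    genotype matrix (individuals in rows, markers in columns).
    """
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]

    direction = [1.0] * m  # arbitrary non-zero starting vector
    for _ in range(iterations):
        scores = [sum(c[j] * direction[j] for j in range(m)) for c in centered]
        updated = [
            sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)
        ]
        norm = sum(x * x for x in updated) ** 0.5
        direction = [x / norm for x in updated]

    return [sum(c[j] * direction[j] for j in range(m)) for c in centered]


# Hypothetical setup: 20 markers, strongly differentiated between populations.
n_markers = 20
pop_a = simulate_genotypes([0.1] * n_markers, 30)
pop_b = simulate_genotypes([0.9] * n_markers, 30)

pc1 = first_pc_scores(pop_a + pop_b)
pc1_a, pc1_b = pc1[:30], pc1[30:]

print("population A PC1 range:", round(min(pc1_a), 2), "to", round(max(pc1_a), 2))
print("population B PC1 range:", round(min(pc1_b), 2), "to", round(max(pc1_b), 2))
```

The two printed ranges should not overlap: with markers this differentiated, twenty of them are plenty to pull the simulated groups apart on the first axis. Real continental groups are messier than this toy setup, which is why the paper checks its 93 markers against a much larger random set.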

The r² is the square of the correlation coefficient, and describes the proportion of the variation in Y that can be explained by variation in X. As you can see, the 93 AIMs don't do too badly when judged against 3,500 random markers. This check is necessary because local adaptation can give a distorted impression of total genome content if selection is driving allele frequencies toward convergence among disparate populations. Consider what would happen if LCT, the locus which controls lactase persistence, were used. It seems as if there is a fair amount of ecologically driven variance at odds with the rest of the genome on some genes which exhibit a lot of between population variance, so one must be aware of this problem.
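For readers who want the r² definition spelled out, here is a minimal sketch that computes it as the squared Pearson correlation between two sets of per-individual ancestry estimates. The numbers are invented for illustration (imagine the same five people scored once with a small AIM panel and once with a large random SNP set); they are not taken from the paper.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return (cov * cov) / (var_x * var_y)


# Hypothetical ancestry proportions for the same five individuals.
aim_panel_estimates = [0.05, 0.20, 0.50, 0.80, 0.95]
random_snp_estimates = [0.10, 0.15, 0.55, 0.75, 0.90]

r2 = r_squared(aim_panel_estimates, random_snp_estimates)
print(f"r^2 = {r2:.3f}")
```

An r² near 1 means the small panel ranks and spaces individuals almost exactly as the large random set does; an r² well below 1 for a particular group would flag the kind of disagreement discussed next.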

But there's a big exception: the 93 AIMs don't seem too good at classifying the South Asians as a distinct group when set against more markers. In other words, these 93 AIMs don't seem too "ancestrally informative" when it comes to brown folk. At K = 5 (five putative ancestral populations) the admixture of South Asians is rather evident in the Structure chart. They note that using an 85% cut-off for ancestry within a "South Asian" cluster results in only 25% of South Asians falling in their own category (at K = 6). Dropping the cut-off to 50% increases the proportion to 60%. South Asians should of course be excluded from studies which are majority European because there are clear genetic differences. I just assume that part of the issue is that these ancestrally informative markers were selected in the context of a much larger literature which relied on Europeans, African Americans and East Asians (combined with the fact that South Asians are closer to Europeans & Middle Easterners than other population clusters, but still distinctive).
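The cut-off arithmetic above is easy to sketch. Given each individual's estimated membership in a cluster (the kind of per-person proportion a Structure run reports), you count how many clear the threshold. The membership values and the function name below are made up for illustration and are not the paper's data.

```python
def fraction_assigned(memberships, cutoff):
    """Share of individuals whose cluster membership is at least `cutoff`."""
    return sum(1 for q in memberships if q >= cutoff) / len(memberships)


# Hypothetical membership proportions in a "South Asian" cluster, one per person.
south_asian_q = [0.92, 0.88, 0.71, 0.66, 0.58, 0.55, 0.47, 0.40, 0.33, 0.21]

strict = fraction_assigned(south_asian_q, 0.85)
lenient = fraction_assigned(south_asian_q, 0.50)

print(f"assigned at 85% cut-off: {strict:.0%}")
print(f"assigned at 50% cut-off: {lenient:.0%}")
```

In this toy sample 2 of 10 people clear the 85% bar and 6 of 10 clear the 50% bar. The pattern has the same shape as the paper's result: when memberships are spread out rather than piled up near 100%, the choice of cut-off changes the answer a lot.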

Cite: BMC Genetics 2009, 10:39 doi:10.1186/1471-2156-10-39

Related: Genetic maps.

Plants can be bar coded. People too I guess.
http://www.redorbit.com/news/science/1728802/botanists_agree_on_plant_d…

Whatever you do, don't tell Greg Laden about this.
