Gene Expression

Dienekes points me to a new paper, European Population Genetic Substructure: Further Definition of Ancestry Informative Markers for Distinguishing Among Diverse European Ethnic Groups. You’ve seen this song & dance before:

Population substructure in Japan
Population substructure of Mexican Mestizos
European population substructure
Genetic Map of East Asia
The genetics of Fenno-Scandinavia
Finns as European outliers
Uyghurs are hybrids
Genetic structure of Eastern European populations
Genetic map of Europe; genes vary as a function of distance
More genetic maps of Europe
Human population structure, part n
How Ashkenazi Jewish are you?

There’s more in the archives. The ostensible reasoning behind all this phylogeography is that in medical studies you want to control for population structure; in other words, throwing a bunch of people together in a big bag labeled “white” may not be optimal. One group of whites, Ashkenazi Jews, exhibits a whole lot of illnesses which are specific to that group. Labeling Jews as whites and treating studies of non-Jewish populations as informative for Jews can cause obvious issues. The problem likely extends to other groups (e.g., the Quebecois). This particular paper was funded by the NIH, so there has to be a rationale which leads to some sort of medical benefit to human well-being at the end of the tunnel, at least on paper.

Of course the other reason that these sorts of papers are of interest is that people are fixated upon human genetic relationships. A friend of mine was at a bar the other night and a woman purported to correct him about the nature of the Out-of-Africa movement (from what I can gather she was confused, but that’s another issue). These sorts of studies drill down to a further level, and so naturally arouse curiosity. Some of the results and interpretations can be fraught because of nationalism; there has been a great deal of controversy, with real political import, over whether or not modern Greeks are simply Hellenized Slavs. Genetic data can help resolve some of these issues, though perhaps not everyone is looking for a resolution.

Obviously these sorts of papers have returned most of the more important low-hanging fruit, but since people have this deep interest in population relationships I assume that there’ll continue to be publications fleshing out the empty spaces on the map, as well as minutiae on the margins. In the discussion the authors observe something which is important:

Furthermore, the inclusion or exclusion of particular ethnic groups (i.e. Ashkenazi Jewish, and Sardinian for southern European, and Orcadian for Northern European) shifted the relationships in PCA when southern or northern Europeans were examined separately. Similarly, the inclusion of South Asian populations…changes the relationships of the population groups with the Ashkenazi Jewish population appearing in the center of a presumed southern European cline. These findings are consistent with our previous observations…and show that PCA results are highly dependent on which population groups are included in the analysis. Thus, there should be some caution in interpreting these results and other results from similar analytic methods with respect to ascribing origins of particular ethnic groups.

PCA, principal component analysis, whose results can be easily displayed in the form of charts, is an abstraction. Often very useful, but still an abstraction whose black-box use can sometimes offer a false sense of certainty. This is obvious when you look at the “methods” sections of these sorts of papers, where the authors take great care to filter the data which goes into the analysis appropriately so as not to generate junk.
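To make the authors’ caution concrete, here is a toy sketch in Python of how the first principal component can change depending on which groups go into the analysis. Everything in it is an assumption made for illustration: the data are simulated, not taken from the paper; the population names (“north,” “central,” “south,” “isolate”), the marker counts, and the allele-frequency offsets are made up; and the PCA is a plain SVD on mean-centered genotypes rather than whatever pipeline the authors actually used.

```python
import numpy as np

def simulate(freqs, n_per_pop, rng):
    """Draw diploid genotypes (0, 1 or 2 copies) for each population."""
    return np.vstack(
        [rng.binomial(2, f, size=(n_per_pop, f.size)) for f in freqs]
    )

def pc1_scores(genotypes):
    """Scores on the first principal component (SVD of centered data)."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def clip(freq):
    """Keep allele frequencies away from 0 and 1."""
    return np.clip(freq, 0.05, 0.95)

rng = np.random.default_rng(0)
n_markers, n_per_pop = 1000, 40
half = n_markers // 2

# Hypothetical allele frequencies. The cline acts on the first half of
# the markers; the isolate has drifted on the second half only.
base = rng.uniform(0.3, 0.7, n_markers)
cline = np.zeros(n_markers)
cline[:half] = rng.normal(0.0, 0.10, half)
drift = np.zeros(n_markers)
drift[half:] = rng.normal(0.0, 0.25, n_markers - half)

north, central, south = clip(base - cline), clip(base), clip(base + cline)
isolate = clip(base + drift)

# The same cline individuals are used in both analyses.
cline_genotypes = simulate([north, central, south], n_per_pop, rng)
isolate_genotypes = simulate([isolate], n_per_pop, rng)
position = np.repeat([-1.0, 0.0, 1.0], n_per_pop)  # north, central, south

# Analysis 1: cline populations only.
scores_without = pc1_scores(cline_genotypes)
r_without = np.corrcoef(scores_without, position)[0, 1]

# Analysis 2: add the isolate, then look at the same cline individuals.
scores_with = pc1_scores(np.vstack([cline_genotypes, isolate_genotypes]))
r_with = np.corrcoef(scores_with[: 3 * n_per_pop], position)[0, 1]

print("PC1 vs. cline position, isolate excluded:", round(abs(r_without), 2))
print("PC1 vs. cline position, isolate included:", round(abs(r_with), 2))
```

In this setup PC1 tracks the north-south cline when only the three cline groups are analyzed, but once the drifted isolate is added PC1 is mostly spent separating the isolate from everyone else, and the cline gets pushed down to a later component. The individuals haven’t changed; only the company they keep in the analysis has.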

I’ve labelled Figure 1 panel A for clarity. I assume some of the outliers are due to immigrants, whom I didn’t see noted as being removed, as is the case in many of the other studies (someone who read more closely than I did, or who knows the original data sets used, can correct me). In this case the light yellow-green dots are European Americans. European Americans span a broader range on this chart, some of it being due to the selection of only a small number of European, Middle Eastern and South Asian populations, but also possibly because of “novel mixes” (e.g., Swedish + Basque) which are relatively rare in Europe. I changed “Adygei” to the name most of you might be more familiar with historically, “Circassian.”


As an example of how the populations selected might affect how you view the relationships, here are the “Southern European” (which includes Middle Eastern groups) panels; both exclude Sardinians (islands are genetically weird), and the second one also excludes Jews:


Cite: European Population Genetic Substructure: Further Definition of Ancestry Informative Markers for Distinguishing Among Diverse European Ethnic Groups, Molecular Medicine, 2009 August 24, doi: 10.2119/molmed.2009.00094