More genetic maps of Europe

Another paper on European phylogeography, Investigation of the fine structure of European populations with applications to disease association studies:

An investigation into fine-scale European population structure was carried out using high-density genetic variation on nearly 6000 individuals originating from across Europe. The individuals were collected as control samples and were genotyped with more than 300 000 SNPs in genome-wide association studies using the Illumina Infinium platform. A major East-West gradient from Russian (Moscow) samples to Spanish samples was identified as the first principal component (PC) of the genetic diversity. The second PC identified a North-South gradient from Norway and Sweden to Romania and Spain...The next 18 PCs also accounted for a significant proportion of genetic diversity observed in the sample. We present a method to predict the ethnic origin of samples by comparing the sample genotypes with those from a reference set of samples of known origin. These predictions can be performed using just summary information on the known samples, and individual genotype data are not required. We discuss issues raised by these data and analyses for association studies including the matching of case-only cohorts to appropriate pre-collected control samples for genome-wide association studies.
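The abstract says origin can be predicted from summary information on the reference samples alone, without their individual genotypes. One simple way such a prediction could work is sketched below. This is a minimal illustration under stated assumptions, not the paper's actual procedure: it stores only per-population allele frequencies and scores a new genotype by its likelihood under each population, assuming Hardy-Weinberg proportions and independent SNPs. The population names, frequencies and function names are hypothetical.

```python
import math

# Hypothetical reference panel: frequency of the counted allele at each SNP,
# per population. Only these summary numbers are needed, not individual genotypes.
REFERENCE_FREQS = {
    "Population A": [0.10, 0.80, 0.35, 0.60],
    "Population B": [0.45, 0.30, 0.70, 0.20],
    "Population C": [0.90, 0.55, 0.15, 0.40],
}


def genotype_log_likelihood(genotypes, freqs, eps=1e-3):
    """Log-likelihood of a genotype vector given per-SNP allele frequencies.

    genotypes: count of the counted allele at each SNP (0, 1 or 2).
    Assumes Hardy-Weinberg proportions and independent SNPs.
    eps keeps frequencies away from 0 and 1 so log() stays finite.
    """
    total = 0.0
    for g, p in zip(genotypes, freqs):
        p = min(max(p, eps), 1.0 - eps)
        q = 1.0 - p
        if g == 2:
            total += 2 * math.log(p)      # homozygous for the counted allele
        elif g == 1:
            total += math.log(2 * p * q)  # heterozygous
        elif g == 0:
            total += 2 * math.log(q)      # homozygous for the other allele
        else:
            raise ValueError(f"genotype must be 0, 1 or 2, got {g}")
    return total


def predict_origin(genotypes, reference=REFERENCE_FREQS):
    """Return the best-scoring population and the log-likelihood for each one."""
    scores = {pop: genotype_log_likelihood(genotypes, f) for pop, f in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = predict_origin([0, 2, 0, 2])
    print(best, scores)
```

Because only the frequency table is consulted, it could in principle be shared without releasing any individual's data. With real SNP panels, linkage between nearby markers violates the independence assumption, so scores like these would tend to look more certain than they are unless markers are thinned.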

Below the fold is a PC map where I've added clarifying labels.
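A PC map places each sample by its scores on the leading principal components of the genotype data. To show roughly what such a score is, here is a minimal pure-Python sketch, not the authors' pipeline (studies with hundreds of thousands of SNPs use dedicated software): it standardises a small genotype matrix and finds the first component by power iteration. The function name and toy genotypes are hypothetical.

```python
import math
import random


def first_pc_scores(genotypes, iterations=200, seed=0):
    """Return each individual's score on the first principal component, up to scale.

    genotypes: non-empty list of individuals, each a list of allele counts
    (0, 1 or 2) at the same SNPs in the same order.
    The sign of the result is arbitrary, as it is for any PCA.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Standardise each SNP: subtract the mean count and divide by sqrt(p(1 - p)),
    # where p is the sample allele frequency. Monomorphic SNPs become all zeros.
    x = [[0.0] * m for _ in range(n)]
    for j in range(m):
        mean = sum(row[j] for row in genotypes) / n
        p = mean / 2.0
        sd = math.sqrt(p * (1.0 - p)) or 1.0
        for i in range(n):
            x[i][j] = (genotypes[i][j] - mean) / sd
    # n x n matrix X X^T; its leading eigenvector is proportional to the PC1 scores.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    # Power iteration from a random start vector converges to that eigenvector.
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return [0.0] * n  # no variation at all
        v = [t / norm for t in w]
    return v


if __name__ == "__main__":
    toy = [[2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],
           [0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 1]]
    print(first_pc_scores(toy))
```

In the toy data the first three and last three individuals receive scores of opposite sign. In a real sample PC1 captures whichever axis carries the most variance, which in this paper was the East-West gradient.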


But there's another fascinating map which I think needs to be highlighted.


As you can see, there is population substructure within Germany. We need to map these things out now, before EU mobility erases many of the historical signals. These data will be a great boon to historians trying to understand the patterns of ethnogenesis in pre-modern Europe. There is textual support for the contention that the medieval German drive to the east resulted in the Germanization of many West Slavic peoples; these sorts of data can test such hypotheses.

Finally, I'll leave you with this comment from Dienekes:

Once again, it is clear that members of particular nations can mostly be mistaken for members of their closest neighbors. Almost all Spaniards are guessed as French; French mostly as Belgians but with sizable Spanish and UK minorities; UK as Belgians but with sizable French minorities; Norwegians mostly as Swedes but with some UK; Swedes mainly as Germans but many as Norwegians; most Poles as Russians, but some Slovaks or Czechs, and so on.

The importance of these results can't be overestimated. While it can be argued that some ethnic groups are spuriously distinctive only due to insufficient sampling of the geographical continuum, it is more difficult to do this for others. For example, it is now possible to identify particular ethnic groups, e.g., Norwegians, with great accuracy from DNA.

More markers and more populations will doubtless enhance our ability to distinguish European nations using DNA. But perfect accuracy is unlikely; in most European nations there are probably minorities which, for historical reasons, allied themselves with one country or political entity even though they were ultimately of a different genetic background than the majority population of that entity.

Nonetheless, at a time when, due to a sort of mental hysteresis, proclamations that "races are social constructs" are still routinely made, the discovery that not only races, but even closely related ethnic groups (e.g. Norwegians and Swedes) can be distinguished with greater than 90% accuracy serves to illustrate the scientific irrelevance of the ethnic nihilists and the affirmation that nations are, at least in part, genetic entities.

If you read the paper you can see that the differences between European nationalities are very small, especially in a worldwide context. But that does not mean that they are trivial when it comes to giving us information about population structure.

Related: "The Genetic Map of Europe", "Genetic map of Europe; genes vary as a function of distance" and "Finns as European outliers".



This is fascinating stuff.
I find it strange that people can read Lewontin's data that 85% of human genetic variation is within human groups and somehow think this means we should ignore the remaining 10-15% genetic differences as being inconsequential. It probably is inconsequential for most of the things they worry about (IQ, ability, etc.) but it's certainly not inconsequential for historical or medical studies.
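The within-group versus between-group split in this comment can be made concrete with a heterozygosity-based F_ST calculation at a single SNP. This is a simplified sketch, assuming equal-sized groups and one biallelic locus; it is not necessarily the statistic Lewontin used (his 1972 analysis relied on a Shannon information-based diversity measure), and genome-wide figures average over many SNPs. The function names and example frequencies are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def apportion_diversity(freqs):
    """Split diversity at one biallelic SNP into within- and between-group shares.

    freqs: frequency of the same allele in each group; groups are weighted equally.
    Returns (within_share, between_share), where between_share is
    F_ST = (H_T - H_S) / H_T and within_share = H_S / H_T.
    """
    if not freqs:
        raise ValueError("need at least one group")
    # H_S: average expected heterozygosity inside the groups.
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    # H_T: expected heterozygosity if all groups were pooled together.
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        raise ValueError("SNP is monomorphic in the pooled sample; nothing to apportion")
    between = (h_t - h_s) / h_t
    return 1.0 - between, between


if __name__ == "__main__":
    print(apportion_diversity([0.3, 0.5]))
```

For two groups with allele frequencies 0.3 and 0.5, H_S = 0.46 and H_T = 0.48, so roughly 96% of the diversity at that SNP lies within groups and about 4% between them. A small between-group share per SNP can still add up to reliable population assignment when hundreds of thousands of SNPs are combined.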

That's cool! I would love to find out where I am 'from' in Europe. I'm supposedly 80% English with maybe about 20% Dutch (16th-17th century immigration to East Anglia), but easily mistaken for a Spaniard. Go figure... I admit I'm quite surprised, with European history being what it is, that national types have proved so persistent.

What happened to the Italians, by the way?

"...variation is within human groups and somehow think this means we should ignore the remaining 10-15% genetic differences as being inconsequential."

this is a very small part of the less than 10-15%. but it just goes to show how much structure there is in genetic variation when you look at a lot of genes. the impact of geographic distance is pretty clear.

"What happened to the Italians, by the way?"

this study didn't have that group. but click the "related" and you'll see them.

and there's a lot of variation in what british people look like. almost certainly you are what you are reputed to be, but your family is just on the dark side of the normal range.

Arrgh. Geneticists are such teases. When are we going to get a study like this which includes Finns and a representative sample of other northeastern Europeans? The Swedes floating off to the right look like they could be part-Finns, though...

The blue blob near Dresden could be the Sorbian population. Wikipedia says that the name Dresden is from Upper Sorbian, a Slavic language. The Lower Sorbian population is historically around Brandenburg; I wonder if the same genetic markers are found around there.

in the year 1000 everything east of the elbe was slavic speaking, specifically the wendish tribes, of whom the sorbs are a relict. the west slavic and baltic peoples were not exterminated; rather, the historical data suggest a concomitant process of christianization and germanization, whereby the christian religion and the german ethnic affiliation had a perfect identity in the wendish mind. this explains their extreme resistance to christianization: unlike for the more distant poles, christianity was not a way to integrate themselves into wider european civilization, but the door into full germanization. wends referred to the christian god as the 'german god.' long story short, the defeat of the wends in the 13th century ushered in the great german 'drive to the east.' almost certainly the germans genetically absorbed the slavic and prussian populations (prussian here meaning the baltic tribe, not the people who inhabited early modern prussia).

as it is likely that sorbs are a rump population with minimal gene flow into them because of their cultural isolation, i predict their genetic character will to a large extent simply be a subset of that of northeast germans. this is because the northeast germans are to a large extent ancestrally slavic, if not preponderantly so. i wouldn't be surprised if the Y chromosomal barrier we have seen between poles and germans isn't discernible on the autosomal level; there's a long history of ethnic colonists from a dominant group (e.g., greeks, carthaginians, etc.) simply absorbing the local substrate by marrying indigenous women.

^The reason for the Y-DNA gap between modern Germany and Poland is the resettlement of a whole segment of the population that made up part of the west/east gradient along the North European plain.

The Pomeranians, Silesians, Masurians and others were the ones bridging that gap before WWII, but with them gone, we were seemingly left with a hole. Their descendants are still around, mind you, usually deep in Germany, but they don't form distinct populations, and so won't be sampled and put on a plot.

Also, Y-DNA gives us a limited view of things, so when we look at more genetic material, we find that this smooth gradient is still actually there. And there will be more overlapping once populations from Poznan, Kujawy and west Pomerania are sampled. Historical reasons for that smooth transition include the spread of the Corded Ware folk from the east and the later Slavic expansion west.


"...and the affirmation that nations are, at least in part, genetic entities."

Sorry, come again?

I read these graphs as saying the exact opposite. Genetic variation in Europe is entirely clinal, as opposed to following national / linguistic borders. Hell, the second graph screams that in the reader's face! As for the amount of structure ("clumpiness") within these graphs, it would be nice to have an idea of how these data points were sampled (representative of the country's population, or taken from discrete locations separated by hundreds of kilometers, as Dresden and Munich are?). All they say in the paper is that they deliberately excluded "outliers", but maybe I just missed something.

As for the high correct identification rate, it's only impressive when you look at the successful cases. Predictably, the French are the most obvious counter-example: a correct identification rate of about .5, which is still rather impressive (and indeed quite suspicious), until you realise that the data doesn't seem to include Italian samples!

Idle hypothesis of the day: include Italians in the data, take a representative sample of the French population, and the French mis-identification rate will increase beyond .5.

"I read these graphs as saying the exact opposite. Genetic variation in Europe is entirely clinal, as opposed to following national / linguistic borders."

we'll know in a few years when they sample more evenly, i.e., whether there is a discontinuity at the walloon-flemish ethno-linguistic border.

"I read these graphs as saying the exact opposite. Genetic variation in Europe is entirely clinal, as opposed to following national / linguistic borders."

Dienekes responds to a similar comment at his blog:

"Geography is not sufficient. Poland may be geographically closer to Germany than to Moscow, but it is much closer to the latter genetically. Spain is closer to Romania than to any of the intervening countries (save Romance-speaking France and Belgium)."

By Jason Malloy on 22 Nov 2008

The comments about the northeastern Germans remind me of the saying that a Prussian is a Pole who has forgotten who his grandparents were.

Dienekes doesn't think straight sometimes, largely due to his lack of knowledge of North/Central European history.

There's no reason why the Celto-Germanic Munich Germans would overlap with Poles. And if half the German score is based on them, then that solves the mystery of why Germans don't overlap with Poles more than we think.

In a more precise study focusing on that issue, several samples would have to be taken from East Germany, and from western Poland. So far the most western Polish population sampled has been from Lodz (in this study, along with Warsaw), which means we're basically comparing Central/Eastern Poland with an East German city (Dresden) that has seen settlement from western Europe en masse over the past 800 years.

Dienekes has never taken any of that into account. He just reads these graphs like traffic signals most of the time. But eventually we will get a really good picture of the situation, with specific studies on this very issue. So hang tight.

The correlation between genetic similarity and geographic location seems to be too good to be true (as indicated in the first picture). It would be nice if the authors could publish the data behind these pictures.