European genes mirror European geography

Within a drop of blood, you can find all the information you need to reasonably guess where a person came from, without ever having to look at their face, name or passport. Small variations in our DNA are enough for the task. They can be used to pinpoint someone's place of origin to a remarkable degree of accuracy, often to within a few hundred kilometres.

The new discovery comes from a team of Swiss and American researchers led by John Novembre at UCLA, who wanted to understand how the human genome varies on a continental scale. To that end, they looked at the genomes of over 1,300 people sampled from almost three dozen countries across Europe. The sample was originally collected by GlaxoSmithKline to hunt out genetic variations that influence the effectiveness of drugs and their side effects, but Novembre's team put it to use in understanding the links between genes and geography instead.

They analysed single-letter differences in DNA ("single nucleotide polymorphisms" or SNPs) at about 200,000 places in each of the genomes. They compared this data to each person's country of origin, as well as that of their grandparents where possible.

To work with this massive collection of information, Novembre applied a mathematical technique called principal component analysis (PCA) to transform the unwieldy set of data into a more manageable form. The technique looked for underlying patterns in the massive collection of SNPs and boiled them all down to just two variables, known as principal components. The upshot is that each person could be plotted as a point on a simple two-dimensional graph, whose axes correspond to the two principal components. It collapsed a complicated cloud of data into a simple sheet.
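For readers who want to see the idea in code, here is a minimal sketch of PCA applied to a genotype matrix. It is not the team's actual pipeline: the function name `genotype_pca`, the toy data and the choice to scale each SNP by its standard deviation are all assumptions made for illustration. Each row is a person, each column is a SNP, and each entry counts the copies of one allele that person carries (0, 1 or 2).

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto their top principal components.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1 or 2).
    Returns an (n_individuals, n_components) array of PC scores.
    """
    g = np.asarray(genotypes, dtype=float)
    # Centre each SNP column so PCA captures differences between people.
    centred = g - g.mean(axis=0)
    # Scale each SNP by its standard deviation; drop SNPs that never vary.
    std = centred.std(axis=0)
    keep = std > 0
    scaled = centred[:, keep] / std[keep]
    # SVD of the scaled matrix: the columns of u * s are the PC scores,
    # ordered from the largest to the smallest share of variance.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data (invented, not the study's): allele frequencies drift smoothly
# with each person's position on a made-up two-dimensional landscape.
rng = np.random.default_rng(0)
n_people, n_snps = 200, 500
positions = rng.uniform(-1, 1, size=(n_people, 2))
slopes = rng.normal(0, 0.1, size=(2, n_snps))
freqs = np.clip(0.5 + positions @ slopes, 0.05, 0.95)
genotypes = rng.binomial(2, freqs)  # two allele copies per person per SNP

coords = genotype_pca(genotypes)
print(coords.shape)  # one (PC1, PC2) point per person
```

Plotting the two columns of `coords` against each other, one point per person, gives the kind of two-dimensional map described here.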

The result was startling - the genetic and geopolitical maps of Europe overlap to a remarkable degree. On the two-dimensional genetic map, you can make out Italy's boot and the Iberian peninsula where Spain and Portugal sit. The Scandinavian countries appear in the right order, and in the south-east, Cyprus sits distinctly off the "coast" of Greece.


Zoom in closer, and the map even reveals distinct genetic clusters within Switzerland based on the language people speak. German-speaking Swiss cluster to the east, Italian speakers to the south and Francophones to the west. Even so, the clusters overlap and in general, the data reveals a genetic continuum between Europeans, where the borders of the genetic map are fuzzier than those of its geographical counterpart. As far as genes are concerned, the closer together two people live, the more similar their DNA is.

There were a few exceptions to the genetic map's accuracy, with a few countries appearing in odd positions. Slovakia, for example, turns up in the middle of Italy rather than next to the Czech Republic where it belongs. Russia too is further west than its actual position and appears to be hugging Poland (which I find ironically unsettling in the light of recent political events). But Novembre says that both exceptions are probably due to small sample sizes - "Russia" in this case was only represented by six people, and just one poor individual was waving the flag for Slovakia.

Exceptions aside, the overlay between the two maps is startlingly accurate. Using only genetic information, Novembre's team can place over 90% of people within 700km of their place of origin, and over 50% of people within 310km. The graph below shows the different degrees of accuracy for different countries. 


The results have implications for a lot of biomedical research. Many scientists are scanning entire genomes on a hunt for SNPs that affect a person's risk of diseases like cancer or their reaction to drugs. Novembre says that researchers who are running these "whole-genome studies" need to bear in mind where their sample has come from. Even if a study looks at small and seemingly related parts of Europe, it would have to adjust for any geographical influences in the genetic variations it uncovers.

This study is just the beginning. At the moment, the analysis is too crude to detect rare genetic variants that are the result of new mutations. These tend to cluster around the place where the mutation first sprang into being, and as such, they can give us more information about the structure of populations on an even finer scale. As more and more genomes are sequenced and statistical methods improve, the genetic map will become clearer and clearer.

Reference: John Novembre, Toby Johnson, Katarzyna Bryc, Zoltán Kutalik, Adam R. Boyko, Adam Auton, Amit Indap, Karen S. King, Sven Bergmann, Matthew R. Nelson, Matthew Stephens, Carlos D. Bustamante (2008). Genes mirror geography within Europe. Nature. DOI: 10.1038/nature07331




It's amazing how similar the genetic and geopolitical maps are for Europe. Can't help but wonder how well this kind of study would hold up for other continents/regions.

After reading some European history (Tony Judt's "Postwar"), I would guess that this correlation is the work of the ethnic cleansing during (and after) WWII. I doubt that such good correlation would have been seen earlier than 1941.

Try S. Oppenheimer's book on the Origins of the British. These are not recent effects, they go all the way back to the first settlement after the ice. The ethnic cleansing comment is puerile.

By Chris Phillips on 02 Sep 2008

Within a drop of blood, you can find all the information you need to reasonably guess where a person came from, without ever having to look at their face, name or passport.

I highly doubt that would work for most of the so-called new world melting pots. I myself trace my family tree back through Hungarian and Danish ancestry and was born in Brazil. I don't have any data to back up my hunch but I suspect that the mixing of genes in such countries would encompass many people of African origins, native Brazilian Indians, Europeans of all stripes, Asians and so on and on.

By Fernando Magyar on 02 Sep 2008

Omer, I take your point about so-called "ethnic cleansing" (or genocide as I prefer to call it). That might affect some parts of Eastern Europe quite markedly. However the proportions of the population involved in Western Europe by and large would have been too small to make much difference; moreover many of those countries have had considerably more immigration since the war than before. Take the UK, for instance. Immigration only really picked up when air travel became reasonably cheap.

Pedantic note:

Genocide = attempting to kill all of an ethnic group
Ethnic cleansing = attempting to remove an ethnic group from an area

All genocide is also ethnic cleansing, not all ethnic cleansing is genocide, and the terms are not synonymous, Tony.

By Woobegone on 03 Sep 2008

I think a topographic map, instead of a geopolitical one, might also have been interesting. The Alps and the Pyrenees, at least, seem to have had a definite isolating effect as regards the Spanish and Italian clusters, and waterways like the Channel the opposite effect.

Ethnic cleansing might also play into it, however, especially regarding PL and DE - I bet you would have found much more genetic overlap there before WW II.

Another interesting overlay might be a map of historically predominant religions, come to think of it.

By Phillip IV on 03 Sep 2008

Woobegone, the methods of removal usually, if not always, amount to genocide. Moreover I refuse to use the term "cleansing" in this context because, obviously, it's an utterly revolting image. We should not condone dehumanization or let it enter into our language.

This homogeneity has more to do with the simple human behavior of marrying people close by and of the same social status than with violent nationalism. If ethnic slaughter were the cause, then places like England and Spain would show greater heterogeneity. The truth of the matter is that European communities have perennially been rather conservative marriage-wise and European society, at least since the fall of the Roman empire in the West, rigidly segmented.

It would be interesting to see such a study in the Americas.

Hmm, "breeding with" would probably be more accurate than "marrying" in that first sentence. A case of my internal censor getting the best of me and putting forward a euphemism, I would say.

LoL on that correction, Julian.

What the heck are the Slovaks doing so far from the rest of the Slavs and hanging out with the Italians?

Oops, reread the article. Small sample sizes.