This is a profoundly impressive paper – a study of the patterns of genetic variation in 2,400 individuals from 113 African populations, by far the most comprehensive analysis of African genetic diversity ever performed.
I had heard that Sarah Tishkoff’s group had assembled a large collection of African DNA samples, a daunting achievement in itself given the logistical, ethical and social challenges involved (New Scientist notes that “the researchers often had to use their vehicle batteries to power centrifuges used to separate out white blood cells from their samples”) – but the breadth of the analysis in this paper has blown me away.
The number of markers used in the study is modest by current standards – just 1,328 variants, compared with the hundreds of thousands used in modern genome-wide association studies – but by subjecting these markers to an extremely detailed and careful analysis, Tishkoff and her co-authors have done justice to the sheer scale of the genetic diversity within the African continent.
Here’s an image that provides just a glimpse of that diversity:
This map was generated by first using the program STRUCTURE to infer 14 ancestral populations that best define worldwide human genetic diversity; each of these clusters has been assigned a colour, and the pie graphs above show the proportions of each of these clusters contributing to each of the African populations in the study.
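For readers wondering how individual-level STRUCTURE results become a single pie per population, here is a minimal sketch. It assumes each person has a membership vector of K cluster proportions (STRUCTURE's per-individual "Q" estimates) and that a population's pie is simply the average of its members' vectors. I don't know the exact pipeline the authors used, and the function name and toy numbers below are hypothetical, not data from the paper.

```python
from collections import defaultdict


def population_pie_proportions(q_matrix, populations):
    """Average per-individual cluster memberships within each population.

    Hypothetical helper: q_matrix is a list of rows, one per individual,
    each holding K cluster proportions; populations gives each
    individual's population label in the same order.
    Returns {population label: [mean proportion for each of the K clusters]}.
    """
    if len(q_matrix) != len(populations):
        raise ValueError("need exactly one population label per individual")

    # Group each individual's membership vector under its population label.
    grouped = defaultdict(list)
    for row, pop in zip(q_matrix, populations):
        grouped[pop].append(row)

    # Average each cluster's proportion across the population's members.
    pies = {}
    for pop, rows in grouped.items():
        k = len(rows[0])
        pies[pop] = [sum(r[j] for r in rows) / len(rows) for j in range(k)]
    return pies


# Toy example with K = 3 clusters (made-up numbers, not from the study).
q = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
pops = ["PopA", "PopA", "PopB"]
pies = population_pie_proportions(q, pops)
print(pies)
```

Averaging keeps each pie summing to one, which is what lets the slices be read as the share of each inferred ancestral cluster in that population.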
By contrast, under this colour scheme virtually the whole of East Asia is an undifferentiated sea of pink, Europe a block of blue, and even the diversity of India is reduced to a mix of just two colours. The reason for this is simple: our species evolved in Africa, and all of us non-Africans represent just a paltry sub-sample of the genetic variation that arose there.
The best bit about this study is that it is just the beginning. Tishkoff et al. lay out the future of research in the area:
Given the extensive amount of ethnic diversity in Africa, additional sampling, particularly from under-represented regions such as North and Central Africa, is important. Because of the extensive levels of substructure in Africa, ethnically and geographically diverse African populations need to be included in re-sequencing, genome wide association (GWAS) and pharmacogenetic studies, to identify population or regional-specific functional variants associated with disease or drug response (1). The high levels of mixed ancestry from genetically divergent ancestral population clusters in African populations could also be useful for mapping by admixture disequilibrium (MALD). Future large scale re-sequencing and genotyping of Africans will be informative for reconstructing human evolutionary history, for understanding human adaptations, and for identifying genetic risk factors, and potential treatments, for disease in Africa.
Increasing the number of markers used – and ultimately sequencing entire genomes – from Tishkoff’s already impressive collection of samples will no doubt be the first step; collecting more samples from African populations (especially from disease patients) is also a major priority for a number of research groups internationally. The sheer diversity in the African genome provides substantial power for genome-wide association studies looking to narrow down the regions associated with specific diseases, so work in this area will benefit us genetically impoverished non-Africans – as well, I hope, as the Africans themselves.
It will take me a long time to fully digest this paper – the supplementary information alone (warning: large PDF) contains 33 figures and 9 tables! – but while I flounder around looking for more superlatives to describe this work, you should check out Dienekes, Razib, GenomeWeb Daily News and the Spittoon for coverage of the major messages.