HapMap phase 3 data available for browsing


This will probably only be of interest to population genetics aficionados, but I just noticed that the HapMap project has made its phase 3 data available through its browser (the data were previously available for download, but are much more accessible - especially to non-bioinformaticians - through the browser interface).

The HapMap project is a massive international collaboration collecting information on common sites of genetic variation (called single nucleotide polymorphisms, or SNPs) in anonymised individuals from a variety of human populations. Phase 3 has data on about 1.5 million genetic markers for 1,115 individuals from 11 populations. That's substantially fewer markers than in earlier phases of the HapMap project, but on a hugely expanded set of samples (the original HapMap data-set contained information on just 270 individuals from 4 populations). Of particular interest are three additional populations with African ancestry (Luhya and Maasai from Kenya, and African-Americans sampled in the southwestern USA), given the exceptionally high level of genetic diversity in African groups relative to other human populations.

This is still very much a rough draft of the catalogue of human genetic diversity: it samples just a tiny fraction of our species' populations and is restricted to common genetic variants. Extending the catalogue to include rare variants will require whole-genome sequencing of much larger samples - work that is currently being kick-started by the ambitious 1000 Genomes Project.

The breakdown of the analysed samples in the phase 3 HapMap data-set is shown below.

| Number | Code | Population |
|--------|------|------------|
| 71 | ASW | African ancestry in Southwest USA |
| 162 | CEU | Utah residents with European ancestry |
| 82 | CHB | Han Chinese in Beijing, China |
| 70 | CHD | Chinese in Metropolitan Denver, Colorado |
| 83 | GIH | Gujarati Indians in Houston, Texas |
| 82 | JPT | Japanese in Tokyo, Japan |
| 83 | LWK | Luhya in Webuye, Kenya |
| 71 | MEX | Mexican ancestry in Los Angeles, California |
| 171 | MKK | Maasai in Kinyawa, Kenya |
| 77 | TSI | Toscani in Italia |
| 163 | YRI | Yoruba in Ibadan, Nigeria |
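
For anyone who wants to check the headline figures against the breakdown, here is a minimal Python sketch that tallies the per-population counts listed above. The dictionary and function names are my own illustrative choices, not anything provided by the HapMap project; the numbers are copied straight from the table.

```python
# Sample counts per population code, copied from the breakdown above.
# The names SAMPLES, NEW_AFRICAN_ANCESTRY and summarise are hypothetical,
# chosen for illustration only.
SAMPLES = {
    "ASW": 71,
    "CEU": 162,
    "CHB": 82,
    "CHD": 70,
    "GIH": 83,
    "JPT": 82,
    "LWK": 83,
    "MEX": 71,
    "MKK": 171,
    "TSI": 77,
    "YRI": 163,
}

# The three additional African-ancestry populations named in the post.
NEW_AFRICAN_ANCESTRY = ("ASW", "LWK", "MKK")


def summarise(samples):
    """Return (number of populations, total number of individuals)."""
    return len(samples), sum(samples.values())


if __name__ == "__main__":
    n_populations, n_individuals = summarise(SAMPLES)
    print(f"{n_individuals} individuals from {n_populations} populations")

    n_new_african = sum(SAMPLES[code] for code in NEW_AFRICAN_ANCESTRY)
    print(f"{n_new_african} individuals in the three additional African-ancestry groups")
```

The totals come out to 1,115 individuals across 11 populations, matching the figures quoted earlier in the post.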
