Last week I posted on the publication of three papers in Nature describing whole-genome sequencing using next-generation technology: one African genome, one Asian genome, and two genomes from a female cancer patient (one from her cancer cells and one from healthy skin tissue). At the end of that post I noted that the era of the single-genome publication is drawing to a close as the age of population genomics commences.
Today GenomeWeb News reports from the American Society of Human Genetics (ASHG) meeting on the biggest current foray into the field of population genomics: the 1000 Genomes Project. The project aims to sequence somewhere between 1,200 and 1,500 whole genomes at low coverage (sequencing each base an average of 2 to 4 times), providing a powerful catalogue of human genetic variation that extends down to variants with frequencies below 1%.
At ASHG, David Altshuler announced the near-completion of three pilot projects for the 1KG project and the generation thus far of 3.8 terabases (that’s 3.8 million million bases) of sequence data. Over the last two months, according to GenomeWeb News, “the team deposited as much data each week as was present in GenBank when the effort began”. That’s a mind-boggling amount of data, and an indicator of the staggering volumes still to come: in 2009 the project is expected to generate about 250 times that volume of sequence.
The hard part of human genomics – linking sequence variation to disease risk and other traits – is still to come, but the 1KG project (as Brendan Maher calls it) will pave the way for these difficult experiments, both by creating a map of human genetic variation and by driving the development of the tools required for large-scale human genome sequencing. Exciting times…