The latest issue of Nature is just as it should be: nearly wall-to-wall human genomics, with a special focus on personal genomics (more on that later).
The main event is a potential historical milestone: quite possibly the last two papers ever to be published in a major journal describing the sequencing of single human genomes from healthy individuals [1]. The papers, which both appear to be open access (kudos to Nature for that decision), describe the analysis of the first Asian genome by researchers at the Beijing Genomics Institute, and the sequencing of the first African genome by a cast of thousands centred around next-gen sequencing company Illumina.
Both genomes were sequenced using next-generation sequencing technology from Illumina, which generates sequence information in the form of very short (35-50 base pair) reads. Although each read is extremely short and relatively error-prone compared to reads from old-fashioned sequencing methods, the sheer number of reads generated by the Illumina technology makes whole-genome sequencing feasible: both studies stitched together in excess of 3 billion of these reads to assemble their genomes. That means that each base in the genome was covered, on average, by over 30 independent reads (as opposed to an average of around 7 reads for the Watson and Venter genomes) – more than enough to compensate for the increased error rate of the Illumina platform.
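For anyone who wants to check the coverage arithmetic, here is a rough back-of-envelope sketch in Python. The genome size of 3 billion bases is my round-number assumption, as is the simplification that every read maps uniquely and contributes its full length; in practice some reads are discarded or fail to map, so real coverage will sit below these raw figures.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base: (reads x read length) / genome size."""
    return num_reads * read_length / genome_size


# Assumed round numbers, not figures taken from either paper.
GENOME_SIZE = 3_000_000_000   # roughly 3 billion bases in a haploid human genome
NUM_READS = 3_000_000_000     # "in excess of 3 billion" reads

for read_length in (35, 50):  # the short-read range quoted above
    depth = mean_coverage(NUM_READS, read_length, GENOME_SIZE)
    print(f"{read_length} bp reads: about {depth:.0f}x raw coverage")
```

Even at the short end of the read-length range, the raw depth comfortably clears the 30-fold mark, which is where the "over 30 independent reads" figure comes from.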
These papers are both important technical achievements, the first of many publications that will emerge over the next few years taking advantage of these short-read technologies to characterise entire human genomes (Watson’s genome was also sequenced using next-generation technology, but on a platform that generated much longer reads than the Illumina system at a correspondingly lower throughput). The Illumina platform allows the assembly of an individual genome sequence far more quickly and cheaply than old-school Sanger chemistry, or the 454 platform used for Watson’s genome, paving the way for affordable personal genome sequences.
However, many technical challenges still remain. Short read technology struggles to map large insertion/deletion polymorphisms (so-called structural variation), and is almost completely unable to generate accurate sequence data for the 10-15% of the genome that lies in highly repetitive regions. In addition, such platforms are largely unable to determine whether two heterozygous variants are found together on the same copy of a chromosome, or on separate copies (a problem known as phasing). Generating a complete genome sequence in the strictest definition, including accurate phasing, awaits the development of ultra-long read single molecule sequencing platforms.
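To make the phasing problem concrete, here is a toy Python sketch (the function name and the example alleles are purely illustrative, not from either paper). Given a list of heterozygous sites, it enumerates every pair of haplotypes consistent with the unphased genotypes. Unless a single read or read pair spans two sites, the sequence data cannot tell these configurations apart.

```python
from itertools import product


def possible_phasings(het_sites):
    """Enumerate haplotype pairs consistent with unphased heterozygous sites.

    het_sites is a list of (allele_a, allele_b) tuples, one per site.
    The first site is held fixed so that mirror-image pairs are not counted twice.
    """
    if not het_sites:
        return []
    first, rest = het_sites[0], het_sites[1:]
    phasings = []
    for flips in product((False, True), repeat=len(rest)):
        hap1, hap2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:  # swap which chromosome copy carries which allele
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        phasings.append(("".join(hap1), "".join(hap2)))
    return phasings


# Two heterozygous sites: A/G at the first, C/T at the second.
for pair in possible_phasings([("A", "G"), ("C", "T")]):
    print(pair)
```

With n heterozygous sites there are 2^(n-1) possible configurations, so the ambiguity grows very quickly along a chromosome; this is why reads long enough to span multiple variant sites matter so much.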
The papers also illustrate the challenges that lie ahead for personal genomics: an analysis of the Asian genome for potential disease-causing mutations revealed one heterozygous (i.e. single-copy) mutation known to cause deafness, and a possible increase in genetic risk for tobacco addiction and Alzheimer’s, but little in the way of convincing, medically actionable results. As I’ve said before, the technology here is moving much faster than our understanding of the underlying biology – you and I will be able to afford our genome sequences long before we have much idea what they mean.
The other important message from these papers is that we can no longer learn very much in terms of biology from individual genome sequences, at least from healthy people. Each additional genome sequence does contribute a list of new genetic variants, but these returns are rapidly diminishing: in both studies only ~25% of the single-base variants are novel. This proportion is admittedly substantially higher for insertion/deletion and larger structural variants (for which detection approaches are still immature), but that too will diminish with each new genome added to the database and as sequencing technology improves. By the time the 1000 Genomes Project has dumped its last petabyte of data on the web there will be relatively few polymorphisms (variants with a frequency of greater than 1%) left to discover, at least in the European, East Asian and West African populations.
So attention has already well and truly turned to converting sequence into biological meaning – and that’s a job that will ultimately require many hundreds of thousands of genome sequences, each attached to information about biological traits and disease status. That means the end of the brief era of high-profile “single human genome” papers, which started in a sense with the anonymised, pooled and fragmented human reference sequences published in 2001, peaked with the celebrity genomes of Venter and Watson in 2007/2008, and now ends (I suspect) with two anonymous non-European genomes.
Of course, we will still see a number of papers describing whole genomes of diseased individuals, particularly cancer samples – indeed, there is one such paper in the same issue of Nature, which you can read about at PolITiGenomics from David Dooling, one of the authors on the paper. [Added in edit following prompting in comments: This paper has its own set of firsts: first female genome published (Leiden University’s “first female genome” got a lot of media attention, but is yet to emerge in print); first disease genome sequenced; first paper publishing multiple genome sequences (one of the cancer and one of a healthy skin sample from the same patient); and probably other firsts I haven’t thought of. If it were in any other issue of Nature I’d be all over it, but I’ve been completely distracted by the other cool stuff in this issue!]
But nonetheless, the age of the one-genome paper is fast drawing to a close. Human genetics now moves into a phase of new challenges and rewards – the era of population genomics.
[1] Update: John Hawks spoils my argument by noting that we will still see fossil and archaeological single genome sequence papers in major journals. Drat! Like any good scientist, I have revised my hypothesis in the light of opposing evidence: it now states that we will see no further single genome papers in major journals using DNA from healthy modern humans. Any other exceptions I missed?
Wang et al. (2008). The diploid genome sequence of an Asian individual. Nature, 456 (7218), 60-65. DOI: 10.1038/nature07484
Bentley et al. (2008). Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 456 (7218), 53-59. DOI: 10.1038/nature07517
Images of average East Asian and African faces from Face Research.