A paper just released in the Lancet describes a thorough and integrated approach to squeezing as much clinically relevant information as possible out of a genome sequence. However, despite a state-of-the-art clinical interpretation pipeline, the major message from the paper is just how far we still have to go before we can make full use of our genetic information.
The paper is based on the genome of Stephen Quake, which was sequenced using the single-molecule platform developed by Helicos (I wrote about Quake’s genome publication at the time). This is a rather curious choice: of all of the genome sequences currently available for analysis, Quake’s is one of the least complete and accurate due to the very short reads and high error rates of the Heliscope. It’s also interesting to note that at least one of the other authors on the paper – George Church – has a substantially better-quality sequence of his own genome (generated by Complete Genomics) in the public domain.
Nonetheless, Quake’s genome it is. The authors throw everything they can at the sequence, bringing in information from databases of both common and rare disease-associated variants and variants affecting drug metabolism, as well as family history and various clinical tests.
There are some genuinely intriguing results: three independent rare mutations in genes associated with sudden cardiac death (although one of these is later shown to be probably benign), and – integrated across the full available set of common risk markers – high lifetime risks for three actionable conditions: myocardial infarction, type 2 diabetes and obesity. Based on Quake’s risk predictions, his physicians decided to recommend a lipid-lowering drug (to which, incidentally, he would be predicted to respond positively based on variants in drug-metabolising genes).
But more importantly, there are the variants that simply can’t be interpreted. These include virtually everything seen outside protein-coding regions, and the majority of even those variants found inside coding regions. We simply don’t understand the biology of most genes well enough yet to be able to predict with confidence whether a novel variant will have a major impact on how that gene operates; and we have an even less complete picture of how genes work together to affect the risk of disease.
That means that the real benefit of whole-genome sequencing over other assays – the uncovering of truly novel or rare genetic variants – has much less of an impact than it should, because in most cases it’s impossible to assign function to such variants. Indeed, it’s striking in this study that the really compelling, actionable findings – the increased risk of myocardial infarction and metabolic diseases, and the drug metabolism effects – come largely from common variants, most of which would be captured by chip-based assays such as that used by 23andMe. (The two rare variants potentially linked to sudden cardiac death are intriguing, and warrant extra surveillance, but don’t yet appear to be compelling risk factors.)
The authors should be commended for their efforts in bringing a wealth of functional annotation and clinical interpretation together for this study, but it’s clear we have a lot further to go before we can extract everything of value from a genome sequence.