Lupski, J.R., et al. (2010). Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. New England Journal of Medicine, advance online publication. doi:10.1056/nejmoa0908094
Roach, J.C., et al. (2010). Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. doi:10.1126/science.1186802
Two new papers out today – the first studies ever to employ whole-genome sequencing for disease gene discovery – neatly illustrate both the promise and the challenges that lie ahead for clinical and personal genomics.
The first paper presents the final – and successful – outcome of geneticist James Lupski’s attempt to track down the genetic basis of his own disease. Lupski suffers from a syndrome called Charcot-Marie-Tooth (CMT) disease, a neurological condition that results in muscle weakness and wasting. The paper describes the process of sifting through the thousands of potentially functional variants to eventually pin down the mutations responsible, which turn out to be in a gene that has been previously associated with CMT.
This study is a clear illustration of the power of whole-genome sequencing to cast light on a long-standing personal mystery (Lupski has been searching for his disease mutation for decades). However, Lupski was fortunate that his mutation fell within a gene that had already been demonstrated to be linked to CMT; as the second study shows, researchers hunting for entirely novel disease-causing genes face a more serious challenge.
In the second paper, which sequenced a family quartet affected by Miller syndrome, the outcome is less unambiguously cheerful: the study illustrates that even with complete genomes it can still be hard to pick apart the genetic origins of disease.
Despite having entire genome sequences from four individuals, the researchers could only narrow down the list of candidate disease-causing genes to a shortlist of four – and it was only with the addition of large-scale sequence data from an additional two unrelated patients that the most likely gene could be identified (this result was published in a separate paper in November last year).
The basic problem here is that we’re still extremely bad at differentiating between mutations causing serious disease and perfectly benign polymorphisms – each of us has a genome littered with genetic variants that look like nasty mutations but have little or no effect on health. In fact, Lupski’s genome illustrates this nicely: one of the mutations causing his disease is a premature stop codon that disrupts the function of a gene – but his genome also contains an additional 120 stop codons disrupting other genes, presumably without severe health effects.
So all of us are walking around with hundreds of gene-disrupting variants, and finding the single causative gene amongst all that noise is seriously challenging. In the case of the Miller syndrome study, adding more genomes from other family members helped a lot, but it wasn’t quite enough to nail down the gene responsible.
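To make the logic of narrowing down candidates concrete, here is a toy sketch in Python. It is not the method used in either paper: the function name `shortlist_genes`, the gene labels and the gene lists are all made up for illustration, and real analyses work on variants and inheritance patterns rather than simple gene sets. The idea is only that a gene stays on the shortlist if it is disrupted in every affected individual considered.

```python
from collections import Counter


def shortlist_genes(affected, min_shared=None):
    """Return genes disrupted in at least `min_shared` affected individuals.

    `affected` is a list of gene collections, one per patient.
    By default a gene must be disrupted in every patient.
    """
    gene_sets = [set(genes) for genes in affected]
    if min_shared is None:
        min_shared = len(gene_sets)
    counts = Counter(gene for genes in gene_sets for gene in genes)
    return sorted(gene for gene, n in counts.items() if n >= min_shared)


# Hypothetical data: two affected siblings share four disrupted genes.
siblings = [
    {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"},
    {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_F"},
]
# Hypothetical data: two unrelated patients with the same disease.
unrelated = [
    {"GENE_B", "GENE_X"},
    {"GENE_B", "GENE_Y"},
]

print(shortlist_genes(siblings))              # family alone: four candidates
print(shortlist_genes(siblings + unrelated))  # with unrelated patients: one
```

With these invented inputs, the family alone leaves four candidates, and only the extra unrelated patients cut the list down to one, which mirrors the shape of the Miller syndrome result without reproducing its data.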
There are some ominous implications here for personal genomics as we move into the whole-genome sequencing era. If it’s hard to find a severe disease mutation using four complete genomes, how much more difficult will it be to interpret variants with much more subtle effects on health using only one genome (i.e. your own)? What will we do with rare, potentially serious-looking variants found in an individual’s genome but nowhere else?
Predicting the functional effects of such variants – particularly if they happen to fall in one of the thousands of human genes without any confident functional annotation – is notoriously difficult. Yet each of them represents a potentially actionable piece of data, a variant that may portend some serious but preventable condition looming in our future or the future of our children – if only we had the knowledge we needed to interpret it.
In fact, the Lupski paper reminds us that even the functional annotation that does exist is far from universally reliable: the team found that Lupski was homozygous for 5 other mutations marked in the HGMD database as causing severe diseases that he does not actually have. It’s likely that these represent errors either in the database or the primary literature (many alleged Mendelian mutations in the literature are in fact benign variants spuriously recorded as disease-causing).
The key message here is that sequencing technology is still moving far faster than our ability to interpret the resulting data. Squeezing more value out of personal genome sequencing will require improved databases of variants (cue the 1000 Genomes Project) and vastly improved tools for inferring the functional effects of novel variants – a task that will require combining evolutionary data with large-scale functional experiments.
These two papers represent the first foray into the brave new world of clinical genetics: gene discovery and diagnosis using complete genome sequencing. The projections I’ve seen suggest that hundreds of severe disease patients will have complete genomes sequenced this year, and thousands more will have all of their protein-coding genes (i.e. their exomes) sequenced. We’ll be learning a lot about the complexities of genetic variation in the process; and hopefully, the end result will be vastly improved tools with utility for personal genomics in general.