Can't find your disease gene? Just sequence them all...

By dgmacarthur on April 21, 2009.

A paper just published online in Nature Genetics describes a brute-force approach to finding the genes underlying serious diseases in cases where traditional methods fall flat. While somewhat successful, the study also illustrates the paradoxical challenge of working with large-scale sequencing data: there are often too many possible disease variants, and it can be extremely difficult to work out which are actually causing the disease in question.

The authors looked at 208 families where multiple members suffered from mental retardation and where the family history was consistent with the underlying gene being carried on the X chromosome. In most cases the families weren't large enough to use linkage analysis to narrow down the location of the gene - in other words, the disease-causing mutation could be almost anywhere among the more than 800 genes scattered along this chromosome.

In these cases the traditional approaches of genetics break down - apart from screening the known genes involved in mental retardation and hoping for a lucky break, there's little that can be done to find the gene responsible. The researchers thus took advantage of automated large-scale DNA sequencing to simply analyse the protein-coding regions of nearly every gene on the X chromosome.

That's a total of one million DNA bases per patient - a particularly impressive figure given it was generated using traditional Sanger sequencing rather than one of the massively high-throughput second-generation sequencing platforms now available.

The researchers found many genetic variants that would be expected to disrupt gene function: almost 1000 changed the predicted protein encoded by a gene, 22 introduced premature "stop" signals, 15 changed the reading frame and 13 were found in strongly evolutionarily conserved regions associated with RNA processing.

Of the 42 variants most likely to cause disease (so-called "truncating" variants), 38 were found in only one family, and these tended to cluster together in specific genes - for instance, one gene contained 5 different rare, damaging mutations. However, many of these variants were found in both patients and their healthy male siblings, suggesting that they are not causative for mental retardation. These genes could represent subtle predisposing factors for mental retardation, but it's likely that most of them are simply genes that can be inactivated with little or no deleterious consequences for humans.
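The sibling check described above can be sketched as a simple filter. This is only an illustration, not the authors' pipeline: the `Variant` record, its field names, the gene and family labels and the example data are all made up for the purpose. The logic rests on the fact that males carry a single X chromosome, so a healthy brother who carries a truncating variant is effectively living without a working copy of that gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one X-chromosome variant seen in one family."""
    gene: str                 # made-up gene label
    family: str               # made-up family ID
    kind: str                 # "nonsense", "frameshift", "splice" or "missense"
    in_affected: bool         # carried by an affected male in the family
    in_unaffected_male: bool  # carried by a healthy male relative


# Variant classes treated here as "truncating" (an assumption for this sketch).
TRUNCATING = {"nonsense", "frameshift", "splice"}


def candidate_variants(variants):
    """Keep truncating variants that are not carried by any healthy male."""
    return [
        v for v in variants
        if v.kind in TRUNCATING
        and v.in_affected
        and not v.in_unaffected_male
    ]


# Invented example data, for illustration only.
variants = [
    Variant("GENE_A", "F1", "nonsense", True, False),
    Variant("GENE_B", "F2", "frameshift", True, True),   # healthy brother carries it
    Variant("GENE_C", "F3", "missense", True, False),    # not truncating
]

candidates = candidate_variants(variants)
print([v.gene for v in candidates])
```

A filter like this only removes variants that are clearly tolerated; anything that survives it is still just a candidate, which is why the follow-up sequencing in additional patients and controls matters.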

Overall, only nine genes showed strong evidence for disease-causing mutations. The researchers went on to sequence these genes in a further 914 mental retardation patients and over a thousand controls, but found only a handful of likely disease-causing mutations in these genes in other patients.

Although the technical achievement is impressive, the picture from this survey is somewhat depressing (if not really surprising) for researchers interested in using large-scale sequencing to discover disease-causing variants. It's a clear demonstration that even examining the majority of protein-coding sequence will be insufficient to capture most of our nasty genetic secrets - many of these lurk deep in non-coding DNA, while a fair chunk of the remainder simply hide in the biological noise resulting from all of the other non-disease-causing variants in the genome. In this study it's likely that the researchers have actually uncovered a fair number of disease-causing mutations (for instance, among the almost 1000 protein-altering variants) but are currently simply unable to distinguish them from benign polymorphisms.

What's the solution? More sequencing, for a start - digging deep into the non-coding portions of the genome, and also ensuring very accurate coverage of the protein-coding portions (in this study an average of just 75% of the targeted regions was actually successfully sequenced in any given individual). This is already entirely feasible due to the emergence of second-generation sequencing, and will rapidly become more affordable as sequencing costs drop. Already there are research groups around the world planning massive sequencing studies to identify rare mutations underlying severe diseases.

But sequencing won't be enough: we need much better methods for sifting out the truly function-altering genetic variants from the biological noise. This is already difficult enough for protein-coding regions (as this study demonstrates); we currently have virtually no way of picking out disease-causing variants in the remaining 98% of the genome. There's a clear need for developing highly accurate and comprehensive maps of the functional importance of each and every base in the human genome, using all of the tools at our disposal - something that will keep us geneticists busy long after we've run out of genomes to sequence.



"A program developed by Cornell researchers deduced the natural laws without a shred of knowledge about physics or geometry. The research is being heralded as a potential breakthrough for science in the Petabyte Age, where computers try to find regularities in massive datasets that are too big and complex for the human mind."
http://blog.wired.com/wiredscience/2009/04/newtonai.html

Programs like these are in their infancy but they are enjoying some successes. It isn't just genetics that is drowning in data.

Did you read about the Incidentalome?
http://jama.ama-assn.org/cgi/content/extract/296/2/212

Noise is the big problem here, and it will persist for decades.

-Steve
