First ever association study using whole genome sequences

By dgmacarthur on November 3, 2009.

New-technology DNA sequencing provider Complete Genomics will provide near-complete genome sequences of 100 individuals to the Institute for Systems Biology, driving the first ever association study for a complex trait using whole-genome sequencing. Here's the press release, and GenomeWeb has some additional information.

This is pretty exciting stuff:

The Institute for Systems Biology (ISB) and Complete Genomics Inc. announced today that they are embarking on a large-scale human genome sequencing study of Huntington's disease (HD). ISB has engaged Complete Genomics to sequence 100 genomes, the majority of which will be used to investigate this disease, with samples from affected individuals, family members, and matched controls to study modifiers of disease presentation and progression.

The goal of this project is not to identify the mutations that cause Huntington's (the genetic basis of this disease is already extremely well-characterised), but rather to look for novel variants that alter the progression of the disease - usually called "disease modifiers". In other words, the goal here is to uncover genetic variants that explain variation between Huntington's patients in things such as age of onset or the speed with which the disease progresses.

The major novelty of this study is that the target trait is complex (i.e. is likely determined by multiple genes), whereas the small number of WGS disease studies reported to date have focused on much more tractable Mendelian diseases (those in which disease status is conferred by the presence of a single, disastrous mutation).

You can expect to see plenty of similar announcements over the next twelve months as the cost of sequencing drops to the point that WGS on moderately large cohorts becomes feasible (Complete Genomics is currently offering the service for around $20,000 per genome).

This project is somewhat unusual in its focus on disease-modifying variants rather than disease-causing variants; it's likely that most of the early WGS studies will actually aim to identify new, rare large-effect risk factors for complex diseases such as type 1 diabetes.

At the American Society of Human Genetics meeting we started to get a sense of how early WGS projects in complex diseases will look:

Individuals selected from the extremes of the distribution (e.g. particularly early-onset or severe manifestations of disease);
A focus on individuals with a strong family history of disease;
Sequencing of both patients and unaffected family members;
In some cases, experimental designs employing low-coverage sequencing of many individuals rather than high-quality sequencing of a smaller cohort.

The first two features will enrich the target population for the types of rare, large-effect variants that WGS is uniquely capable of detecting, while the addition of unaffected family members will make it easier to differentiate between disease risk variants and the benign polymorphisms that litter all of our genomes. The final feature - low-coverage rather than high-quality sequence - is still controversial, but was strongly advocated by Richard Durbin and Gonçalo Abecasis at the meeting; this is the approach currently being taken by the 1000 Genomes Project. I plan to write more about this strategy soon.
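To give a rough feel for what is at stake in the low-coverage versus high-coverage debate, here is a minimal sketch using a simple Poisson model of read depth. Everything in it is an assumption for illustration rather than anything presented at the meeting: the depths of 4x and 30x, the requirement of at least two reads carrying the non-reference allele, and the idea that reads sample both alleles equally with no sequencing error.

```python
import math


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def het_detection_prob(depth: float, min_alt_reads: int = 2) -> float:
    """Chance that a heterozygous site shows at least `min_alt_reads` reads
    of the non-reference allele in one individual.

    Assumes total depth is Poisson with mean `depth` and that each read
    comes from either allele with probability 1/2, so the alt-allele
    read count is Poisson with mean depth / 2.
    """
    return prob_at_least(min_alt_reads, depth / 2.0)


# Illustrative depths only; neither value comes from the post.
for depth in (4, 30):
    print(f"{depth}x: {het_detection_prob(depth):.3f}")
```

Under these assumptions a single low-coverage genome misses a sizeable fraction of its heterozygous sites, which is why low-coverage designs depend on sequencing many individuals and pooling information across them rather than calling each genome in isolation.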

Anyway, here we are: the technology has finally arrived that makes WGS-based studies feasible for complex traits. Now the real challenge - coming up with ways of handling the massive volumes of data generated by these technologies, and of finding true causal variants amongst the noise of sequencing artefacts and benign polymorphisms - starts to bite.


Any sense of what the price point is for these 100 genomes? Earlier in the year it seemed that the CG proof-of-concept genomes were going for around $20,000. Presumably the cost has come down from there, if only due to a volume discount, but the question is, how far?

Hey Dan,

I don't know what ISB paid, but I understand the going price is still hovering around or just marginally below the $20K/genome level - when I spoke to Clifford Reid a while back he suggested that volume discounts might drop this down towards $5K/genome soon, but only for customers looking to purchase around 1000 sequences.

Thank you for this fascinating post, Daniel. Five years ago I wrote an article for Genome News Network about the search for modifier genes in this disease. They had a few candidate genes at the time. As you report, technology has come a long way. I hope the investigators have great success.

"Delaying Huntington's"
http://www.genomenewsnetwork.org/articles/2004/03/19/huntingtons.php

I agree that this should be a really interesting study, but it seems like it is likely to be pretty underpowered, no? Assuming that they are focusing on individuals from extremes of the age of onset distribution or affected family members with similar CAG repeat length but widely disparate age of onset, perhaps they may be a bit more likely to find a variant with large effect size, but this should probably be considered a pilot that will most likely require much larger sample sizes to be adequately powered, don't you think?
-Matt Mealiffe
http://www.dnaandyou.org

Hi Matt,

It depends what your prior is regarding the effect sizes they're likely to observe - but yes, assuming that Huntington's progression-related traits have similar genetic architecture to other well-studied complex traits the study is woefully under-powered.

For the study design I mentioned in the post (in which the sample is enriched for rare large-effect variants by only including individuals with extreme phenotypes and strong family history) the power with even small sample sizes should be better. Even so, you're right that a sample size of 100 should definitely be regarded as a pilot project to establish feasibility rather than a full-scale gene discovery operation.
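To put rough numbers on the power question, here is a minimal sketch of a standard approximation for a single-variant test on a quantitative trait such as age of onset. It assumes the test statistic is chi-square with one degree of freedom and non-centrality n * q / (1 - q), where q is the fraction of trait variance the variant explains. The function name, the values of q, and the 5e-8 significance threshold (borrowed from GWAS convention) are all illustrative assumptions, not details of the ISB study.

```python
from statistics import NormalDist


def association_power(n: int, var_explained: float, alpha: float = 5e-8) -> float:
    """Approximate power of a 1-df association test for a quantitative trait.

    A 1-df non-central chi-square is the square of a shifted standard
    normal, so power = P(|Z + shift| > z_crit), with
    shift = sqrt(n * q / (1 - q)) and q = var_explained.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = (n * var_explained / (1.0 - var_explained)) ** 0.5
    return (1.0 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


# Hypothetical effect sizes for a cohort of 100 genomes.
for q in (0.01, 0.10, 0.30):
    power = association_power(100, q)
    print(f"n=100, variance explained={q:.2f}: power={power:.3g}")
```

With 100 samples, this approximation gives essentially no power for a variant explaining 1% of trait variance, and reasonable power only for a variant explaining something like 30%. That is the sense in which an extreme-phenotype design helps: it bets on the existence of rare variants with very large effects.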
