Broad Institute to use Complete Genomics to sequence genomes of cancer patients

By dgmacarthur on March 3, 2009.

I discussed the second-generation sequencing company Complete Genomics a couple of weeks ago (see here and here). These guys are unique in that they offer their technology only as a service, rather than the usual business model of selling platforms to genomics facilities, and a highly restricted service at that: Complete has stated fairly categorically that it will only be sequencing human genomes (no plants, algae, or even chimpanzees!).

Whether this business model will prove a commercial success remains to be seen, but the company seems to have impressed the genomics community with its early-release data. Today the company officially announced a partnership with the Broad Institute to sequence five complete genomes. This isn't exactly breaking news (the collaboration was openly discussed at AGBT a few weeks ago), but these details are new:

Complete Genomics will use its proprietary DNA sequencing technology to sequence five genomes from samples provided by the Broad Institute. The first genome sequenced will be a test case that has already been studied extensively by the scientific community. The other four genomes are tumor and matched-pair normals; one pair will be used to study glioblastoma and the other melanoma.

Presumably the first sample will be one of the anonymous HapMap DNA samples that are currently being sequenced as part of the 1000 Genomes Project,
giving the researchers at the Broad a solid baseline to determine the
accuracy of the Complete Genomics platform. The other four genomes will
then be from two cancer patients (one sample from the tumour and one
from normal tissue in each case) to study the genomic anarchy that underlies cancer
formation, as part of a vastly larger series of studies of this type.

The price isn't stated in the press release, but Complete confirmed to me
in an email a while back that these early pilot projects will cost
$100,000 for five genomes, or $20,000 per genome. That price will drop
rapidly as Complete scales up its sequencing facility: at their commercial
launch in June 2009, the company plans to release a formal pricing scheme
that "will support a $5,000 genome sequencing price", according to the email.

Update on error rates
In my previous post
I speculated about the possible number of errors that might pop up over
a whole genome sequence using Complete's platform. The issue here is
that the genome is very large, so even quite a low error rate can
result in a high number of "noise" variants, potentially obscuring the
signal from real genetic variations.
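To put rough numbers on that argument, here is a minimal sketch of the arithmetic. The genome size of about 3 billion bases is a round figure, the per-base error rates are hypothetical values picked only for illustration (they are not Complete's numbers), and the calculation assumes errors are independent and spread evenly across the genome, which real sequencing errors are not.

```python
# Rough size of the haploid human genome in bases (an assumed round figure).
GENOME_SIZE_BASES = 3_000_000_000


def expected_false_calls(per_base_error_rate: float,
                         genome_size: int = GENOME_SIZE_BASES) -> float:
    """Expected number of wrong base calls, assuming independent, uniform errors."""
    return per_base_error_rate * genome_size


if __name__ == "__main__":
    # Hypothetical per-base error rates, chosen only to show the scaling.
    for rate in (1e-4, 1e-5, 1e-6):
        calls = expected_false_calls(rate)
        print(f"error rate {rate:.0e}: about {calls:,.0f} false calls")
```

Even an error rate of one in 100,000 bases would, under these assumptions, leave tens of thousands of spurious calls scattered across a single genome.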

I recently spoke over the
phone to Geoff Nilsen, a senior bioinformatician at Complete, about the
company's own estimates of the number of errors they expect to see in a
complete genome sequence. Nilsen emphasised that these estimates are
still very rough, being based on experimental validation of a
relatively small number of variable sites from their pilot genome, and that the company has
further software and process development underway to characterise and
reduce their error rates.

Still, based on preliminary quality control data, Nilsen very cautiously estimated that they had somewhere in the vicinity of 80,000-100,000 false positive calls, and perhaps around 1,000 false negatives,
for single nucleotide polymorphisms in their pilot genome sequence. I
emphasise again that these are estimates with very large error bars -
for instance, the 95% confidence interval is 78,000 +/- 236,000 for false
positive variants!

The data are even more preliminary for insertion
and deletion variants, which lack a clean reference data-set (for the
single base variants above Complete was able to rely on data from the
HapMap project). At this stage Complete has validated a set of 57
homozygous insertion/deletion variants, all of which were called
accurately by their platform, but this is far too small a data-set to
be extrapolating to genome-wide error rates.
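One way to see why 57 correct calls out of 57 says little about the genome-wide rate is to ask how high the true error rate could be while still producing zero errors in a sample of that size. The sketch below computes the exact one-sided upper bound for the zero-failure case, alongside the common "rule of three" approximation. It assumes the validated sites are an independent, random sample of the platform's indel calls, which the post does not establish (the validated set contained only homozygous variants); the function names are my own.

```python
def zero_failure_upper_bound(n_validated: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the error rate when 0 of n validated calls were wrong.

    Solves (1 - p) ** n = 1 - confidence for p, i.e. the largest error rate
    that would still give zero errors in n independent trials with
    probability 1 - confidence.
    """
    if n_validated <= 0:
        raise ValueError("n_validated must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_validated)


def rule_of_three(n_validated: int) -> float:
    """Quick approximation to the 95% upper bound for the zero-failure case."""
    return 3.0 / n_validated


if __name__ == "__main__":
    n = 57  # validated homozygous indels, all called correctly
    print(f"exact 95% upper bound: {zero_failure_upper_bound(n):.3f}")
    print(f"rule of three:         {rule_of_three(n):.3f}")
```

Under these assumptions, a perfect score on 57 sites is still compatible with an error rate of roughly 5% among called indels, which is why a much larger validation set is needed before extrapolating.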

Controlling both
false positive and false negative error rates will of course be
absolutely crucial for many applications of whole-genome sequencing -
for instance, clinicians interested in finding the single mutation that
causes a severe congenital disease will not want to sift through huge
numbers of false variants, or to find that their one target mutation
was missed by the base-calling algorithm.

Nilsen told me that at this stage they would not expect clinicians to apply their sequencing service in this way to individual patients;
they expect the first clinical applications to come from sequencing
studies of much larger numbers of patients and appropriately matched
controls. This is perfectly reasonable caution - none of the current
whole-genome sequencing technologies is "clinic-ready" in the sense of
providing a highly accurate single-patient diagnostic test, and of
course the "noise" from normal genetic variation between individuals
will also make it important to use large numbers of patients to hunt
down disease-related mutations even in the absence of sequencing error.

For non-clinical applications (e.g. population genetics) these
types of error rates seem more than acceptable; researchers are used to
dealing with much noisier data than this. If Complete can indeed offer
a full genome sequence at this quality for $5,000, I suspect they will be
receiving plenty of interest from geneticists interested in normal
human variation as well as disease.

How Complete's estimated
error rates hold up in the real world - particularly for
customer-submitted samples of varying quality - remains to be seen, but
we should soon get a better idea from the results of collaborations
between Complete and large genomics facilities like the Broad.


How do they calculate false positives/negatives for a SNP with two common alleles? Isn't some kind of reproducibility stat more appropriate?

I wish I was born 20 years later. It's the most exciting time in the whole history of biology!

well you are still alive to enjoy the exciting biology!

I thought Complete was only going to do human genomes. I'm wondering how different the tumor genomes are and what problems this might give for their platform.

How stochastic are the errors across runs? Could you follow up a possible clinical false positive with another several runs and expect to exponentially reduce errors?

I've been waiting for this. These guys are incredible!!! Just the fact that they have been able to sequence multiple genomes of the highest quality this year is mind-boggling!

In response to the question from "Anonymous": We used the genotype information from the HapMap project to calculate the false positive/false negative rates, and used our Sanger sequencing results for that as well. The HapMap genotype data was gathered with a different platform for this same individual, allowing us to do the comparison. In those cases where we differed from the HapMap data, we did a small set of follow-up studies with a third platform (good ol' Sanger sequencing) to calculate the error rates.
