Genetic Future

Complete Genomics is finally back on the road towards fulfilling its promises of $5,000 human genome sequences, after delays in obtaining funding for a massive new facility pushed back its plans by six months. The $45 million in funding it announced this week will be sufficient to build the new Silicon Valley facility, which the company claims will have the capacity to sequence a staggering 10,000 genomes over the course of 2010.
Complete Genomics is an unusual creature in the second-generation sequencing menagerie: instead of aiming to generate revenue by selling machines to researchers and biotech companies, Complete has an entirely service-driven business model. Basically, it aims to create a series of extremely automated sequence factories with a single input (human DNA) and a single output (an accurate and comprehensive list of all of the variants present in that sample’s genome), operating on a massive scale. All of the steps in between will be performed using in-house sequencing technology and analytical software.
It’s an audacious plan – but clearly it’s a plan with sufficient plausibility to convince investors to cough up $45 million in the middle of a major venture capital drought. If Complete can meet its ambitious goals for 2010, it stands to gain pole position in a field where the pay-offs are potentially massive, as the preferred provider of the raw genomic material for the new discipline of personalised medicine.
So, will it work? I spoke on the phone to Complete Genomics CEO Clifford Reid and company director Alex Barkas about their plans over the next 18 months.


The Complete package
When I last spoke to Reid in February he was rather coy about the precise price and specifications of the genomes that Complete would be offering following its commercial launch. The exact nature of the product is now clear: Reid told me unambiguously that for $5,000 the company plans to offer “120 gigabases [i.e. 120 billion bases] of mapped reads covering >98% of the genome with an error rate of 10⁻⁵ [i.e. 1 error every 100,000 bases]”. If Complete can meet those specifications it will indeed be competitive with the sequence quality generated by existing platforms, for a substantially lower price.
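To put those numbers in perspective, here is a back-of-the-envelope check. It assumes a haploid human genome of roughly 3.1 billion bases (my assumption, not part of Reid’s specification) and treats the quoted 10⁻⁵ error rate as applying to every covered base, which may not be exactly how Complete defines it.

```python
# Rough sanity check of the quoted $5,000 specification.
# ASSUMPTION: a haploid human genome length of ~3.1 billion bases;
# this figure is not part of Reid's quote.
GENOME_SIZE_BASES = 3.1e9

mapped_bases = 120e9        # "120 gigabases of mapped reads"
fraction_covered = 0.98     # ">98% of the genome" (treated as a lower bound)
error_rate = 1e-5           # "1 error every 100,000 bases"

# Average number of reads overlapping each base of the genome.
mean_depth = mapped_bases / GENOME_SIZE_BASES

# Bases actually called, and the errors expected among them if the
# error rate applies uniformly to every called base (an assumption).
callable_bases = fraction_covered * GENOME_SIZE_BASES
expected_errors = error_rate * callable_bases

print(f"mean read depth: ~{mean_depth:.0f}x")
print(f"bases covered:   ~{callable_bases / 1e9:.2f} billion")
print(f"expected errors: ~{expected_errors:,.0f}")
```

On those assumptions, 120 gigabases works out to an average depth of roughly 39x, and even a 10⁻⁵ error rate still leaves roughly 30,000 erroneous calls per genome.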

It’s worth noting that these specifications are considerably better than those that Complete demonstrated on its first genome sequence, for which only 92% of the genome could be covered with an error rate at least three times higher than Reid’s proposed value – so it will be very interesting to see how these numbers have improved with the genomes generated for Complete’s pre-commercial customers.

If the specifications can indeed be reached, this is a level of genome quality I’d be prepared to live with for my own genome, albeit at a lower price: to be specific, if a sequence of this quality were available for under $2,000 retail, I’d buy my own genome right now.

I won’t be sending my genome direct to Complete, though – they’re planning to only offer their service through retail genomics providers such as Knome and 23andMe, who will presumably add a non-trivial mark-up to the final price in exchange for dealing with sample handling, DNA extraction and – crucially – interpretation of the resulting sequence.

Time-frame for rolling out large-scale sequencing
Complete already has a large “pre-commercial” facility that it has been using to provide genome sequences for several customers (e.g. the Broad Institute, announced in March) at a price of around $20,000 each, more than its projected $5,000 commercial price but still cheaper than its nearest current competitor.

Reid told me that the facility was currently working on samples from around a dozen customers, and aimed to have 100 genomes returned to these customers by the end of 2009. We can apparently expect to hear announcements regarding these projects over the next couple of weeks.

Following the company’s commercial launch, Complete’s time-frame is aggressive: Reid projects the completion of 1,000 genomes by mid-2010, and then a further 9,000 genomes in the second half of the year. That implies a roughly ninefold increase in output between the first and second halves of 2010, and the development of a truly monstrous amount of sequencing capacity by the end of the year, which Reid told me would come from minor improvements to the sequencing platform along with iterative tweaks to the pipeline as a whole.
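A rough sketch of what that ramp-up implies, assuming (my assumptions, not Reid’s) that the 1,000 genomes are all produced in the first 26 weeks of 2010, and that each genome corresponds to the 120 gigabases of mapped reads in the $5,000 specification:

```python
# ASSUMPTIONS: each half of 2010 is treated as 26 weeks, and each genome as
# the 120 Gb of mapped reads quoted in Complete's $5,000 specification.
WEEKS_PER_HALF_YEAR = 26
GB_PER_GENOME = 120

first_half_genomes = 1_000
second_half_genomes = 9_000

rate_h1 = first_half_genomes / WEEKS_PER_HALF_YEAR    # genomes per week, Jan-Jun
rate_h2 = second_half_genomes / WEEKS_PER_HALF_YEAR   # genomes per week, Jul-Dec
scale_up = rate_h2 / rate_h1                          # throughput multiplier

# Total mapped sequence implied for the year, in terabases (1 Tb = 1,000 Gb).
total_tb = (first_half_genomes + second_half_genomes) * GB_PER_GENOME / 1_000

print(f"H1: ~{rate_h1:.0f} genomes/week, H2: ~{rate_h2:.0f} genomes/week")
print(f"required scale-up: {scale_up:.0f}x")
print(f"mapped sequence for the year: ~{total_tb:,.0f} Tb")
```

Under these assumptions Complete would need to go from about 38 to about 346 genomes a week within a single year, and generate around 1,200 terabases (1.2 petabases) of mapped reads in total.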
Where will the demand come from?
I asked Reid where he saw the bulk of the demand coming from for the 10,000 genomes Complete is promising for next year. He argued, quite plausibly to my mind, that some of the substantial money currently being spent on genome-wide association studies will begin to be redirected towards whole-genome sequencing once a $5,000 price point becomes feasible.
Assuming that Complete can meet its quality specifications, I don’t think there’s much doubt that it will find customers for its 10,000 genomes; the benefits of accurate whole-genome sequencing for research consortia studying the genetic basis of both rare and common diseases are clear. The largest benefits will probably flow to researchers studying rare monogenic diseases, who will be able to find their target genes with whole-genome sequences from a relative handful of patients, but researchers looking for pharmacogenetic variants, or for low-frequency variants underlying severe cases of complex diseases, will no doubt also be keen.
The biggest demand at first, of course, will come from well-funded cancer researchers, who are already investing heavily in whole-genome sequencing and would no doubt be happy to be able to do it cheaper. However, I’ll be interested to see if the quality of structural variation calls from the Complete platform will be adequate for cancer genomics, where large-scale rearrangements are pervasive and functionally important.

The competition
The most obvious commercial competitor to Complete is second-generation sequencing company Illumina, which announced the launch of a genome sequencing service back in June. Illumina’s sequencing technology is currently the most widely-adopted platform on the market, and the company has proven itself an agile combatant in the past (for instance, through its demolition of market leader Affymetrix in the genotyping chip arena). Illumina also has exclusive access to technology from Oxford Nanopore, one of the most promising companies in the embryonic third-generation sequencing market. The prospect of Illumina sequencing facilities stocked with high-throughput nanopore machines must be enough to induce hyperventilation in Complete’s larger investors.
Reid, however, told me that Illumina’s foray into the service market had “essentially no effect” on Complete’s prospects:

There is currently no industry-trusted solution for sequencing 1000 complete genomes and getting it right. We matched Illumina’s quality at a much lower price in February – they’re cheaper now, but so are we.

I do agree that the single-minded focus on complete human genomes and the high level of automation will make Complete a serious competitor, but the competition is far fiercer than Reid implies. In addition to Illumina and its Oxford Nanopore connection, Pacific Biosciences is now flush with cash after a $68 million funding round, and is predicting a commercial launch in late 2010 for its single-molecule long-read platform.
Reid argued that timing was crucial: “By the time PacBio has sequenced their first human genome, we’ll probably have sequenced our 100,000th.” As the price of sequencing tumbles, Reid argued, the basis of competition will shift from reagent costs to “industry-trusted systems” – and by getting there first, Complete will have positioned itself to gain that trust. 
Barkas explained what he saw as Complete’s key advantage: “it’s not that our instrument is better, it’s that our process is better.” Essentially, Barkas argued, Complete has already overcome the technological barriers standing between it and cheap sequencing; from here on, it’s simply a matter of engineering and logistics. By focusing on that single-input, single-output model, Complete plans to build a system that is inflexible but supremely efficient at doing one thing: sequencing human genomes. Meanwhile, Barkas argued, its competitors are designing systems that are flexible but will never match Complete’s platform when it comes to assembling human genomes.
Will Complete succeed?
Complete’s notion of applying automation and economies of scale to create a stream-lined factory for human genome sequences is quite compelling, but I’m still not convinced. Although Reid told me that Complete’s technology is “uniquely suited to large-scale human genome sequencing”, anyone working on the application of short-read sequencing data to the human genome can testify to the major challenges short-read technologies face in the repetitive regions that litter our chromosomes.
There’s a pretty widely-held belief among my sequencing colleagues that the future is in long-read sequencing, which will allow the accurate reconstruction of repetitive regions as well as de novo assembly of complete human genomes without the need for alignment to a reference sequence. Complete’s technology doesn’t lend itself easily to increases in read length* (unlike, say, Illumina, which started with 36-base reads and is now routinely generating 108-base reads). In addition, Reid was insistent that the company would not consider incorporating alternative technologies into its pipeline, and indeed this might prove extremely difficult for such a tightly integrated system.
That means that if any of the third-generation (long read) platforms is able to bring a product to market in the medium term, Complete may find itself trapped with sub-optimal technology against a more powerful competitor before it can establish itself as the “industry-trusted” system of choice.
The probability of that scenario is currently difficult to gauge, but given the existence of two credible contenders (Illumina+Nanopore and PacBio) it’s clearly non-zero; and with $45 million at stake that must be a fairly sobering thought for investors.
* Note, though, that Complete has proposed ways to get around some of the limitations of read length (described in this post), but these are by no means complete solutions.
A trivial note

Compare that to the number of genomes sequenced in the world today — maybe a dozen? 10,000 is still an industry changer.

Reid is being rather misleading here. Although fewer than a dozen complete human genomes have been published, many more have already been sequenced, both for research projects that are yet to be published and by commercial enterprises (e.g. Knome); many more are in progress, and the total will be well into the hundreds by the start of 2010. Sequencing 10,000 genomes in a year will be an impressive feat, but it’s not quite the three-orders-of-magnitude leap that Reid implies here.
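The gap between those two framings is easy to quantify. This sketch compares 10,000 genomes against Reid’s dozen and against an illustrative figure of 300 genomes; the 300 is my own assumption standing in for “well into the hundreds”, not a count from any source.

```python
import math

# Orders-of-magnitude comparison for Reid's "10,000 vs a dozen" claim.
# ASSUMPTION: 300 is an illustrative stand-in for "well into the hundreds";
# it is not a count taken from any source.
target = 10_000
published = 12              # Reid's "maybe a dozen"
sequenced_estimate = 300    # hypothetical figure for genomes actually sequenced

def orders_of_magnitude(new, old):
    """Base-10 logarithm of the ratio new/old."""
    return math.log10(new / old)

print(f"vs a dozen:      {orders_of_magnitude(target, published):.1f} orders")
print(f"vs ~300 genomes: {orders_of_magnitude(target, sequenced_estimate):.1f} orders")
```

Against a dozen genomes the leap is close to three orders of magnitude (about 2.9); against a few hundred it shrinks to around one and a half.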


Further reading on Complete Genomics
Complete emerged from stealth mode in October last year, following a long period spent developing a novel short-read sequencing technology (see the cartoon summary of its approach here). In February this year it made a splash at the Advances in Genome Biology and Technology meeting in Florida, which I covered in some detail at the time. I also met with Complete CEO Clifford Reid and CSO Rade Drmanac at the AGBT meeting and wrote a lengthy follow-up post detailing their responses to questions about the company’s technology and business plan.
For more analysis of Complete’s most recent announcement it’s hard to go past Kevin Davies’ thorough dissection. If you have an In Sequence subscription, Julia Karow’s piece is also well worth a read.

Comments

  1. #1 Dan Vorhaus
    August 26, 2009

    Daniel -

    Thanks for the detailed write-up and analysis. Is it your understanding that Complete’s focus on sequencing only human genomes would prohibit it from other human-based sequencing, e.g., a proteome or microbiome? If so, do you think that is a potential long-term competitive disadvantage for Complete when compared with more versatile platforms that, presumably, would be able to handle these other -omes?

    Thanks.

    - Dan

  2. #2 Clive Brown
    August 27, 2009

    “Compare that to the number of genomes sequenced in the world today — maybe a dozen? 10,000 is still an industry changer.”

    I’d like to add some colour to Cliff’s comment quoted from the Bio-IT World article.

    If you look at the Sanger web site, for example, you will see a real-time graph of how much usable (map-able) data is being produced per week on their Illumina systems. Currently that stands at around 400 Gigabases per week and I think the total to date (also shown) is at several Terabases. Therefore, in one week, without making any fuss, they are generating about 7 human-genomes-worth of data (at full coverage), undoubtedly much of which is being analysed and submitted to the 1000 Genomes web site(s) such as at the NCBI.

    This is just one centre, owning about 5% of all the Illumina systems out there. This implies that the world capacity for human genome sequencing on this system alone (I don’t have figures for SOLiD or 454) is about 140 Human Genomes per week. Altogether it’s maybe at ≤300 per week, or 10,000-15,000 a year.

    If the current vendors meet their performance targets (9G per day), and sales continue at the current rate, this would reach over 200,000 human genomes a year by the end of 2010.

    It is no longer the case that every Human Genome that gets sequenced is published in a flashy paper – nor should one assess the state-of-the-art by counting such publications – many of which are over a year old when they appear in print anyway.

  3. #3 Daniel MacArthur
    August 27, 2009

    Hi Dan,

    Yes, Reid has told me quite firmly on several occasions that the company will focus exclusively on human genomes; and yes, I think this may prove to be an issue further down the line. For the next few years, however, there should be plenty of demand for sequencing human genomes, and by the time microbiome sequencing comes of age it’s entirely possible that Complete’s technology will have moved on.

  4. #4 Daniel MacArthur
    August 27, 2009

    Hi Clive,

    Very nicely said.

    For the curious, the statistics that Clive quotes on Sanger sequence production are available here.
