Will the Archon X Prize for genome sequencing be won in 2010?

By dgmacarthur on January 4, 2010.

The Gene Sherpa predicts that Complete Genomics will win the Archon X Prize in Genomics in 2010. In the comments, Keith Robison is wisely skeptical. I agree with Keith - it's unlikely that the X Prize will be won this year, and if it is, the winner is unlikely to be Complete Genomics.

For those who don't know the prize, here's the brief summary: the X Prize Foundation will give US$10 million to the first team to satisfy the following conditions:

sequence 100 human genomes in 10 days or less, with
an accuracy of no more than one error in every 100,000 bases sequenced [note that the stated error rate on this page is mistakenly quoted as one in 10,000 bases], with
sequences accurately covering at least 98% of the genome, and
at a recurring cost of no more than $10,000 per genome.
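The four conditions can be written down as a simple checklist. The sketch below is only an illustration of the summary above, not the official rule text: the `Attempt` class, the `failed_conditions` function and all the example figures are hypothetical, chosen just to show how an attempt could pass on cost and throughput and still fail on accuracy and coverage.

```python
from dataclasses import dataclass

# Thresholds as summarised in this post (not the official competition rules).
MIN_GENOMES = 100
MAX_DAYS = 10
MIN_BASES_PER_ERROR = 100_000      # at most one error per 100,000 bases
MIN_COVERAGE = 0.98                # fraction of the genome accurately covered
MAX_COST_PER_GENOME_USD = 10_000


@dataclass
class Attempt:
    """A hypothetical prize attempt; every field is an illustrative assumption."""
    genomes: int
    days: float
    bases_per_error: int
    coverage: float
    cost_per_genome_usd: float


def failed_conditions(attempt):
    """Return the names of the conditions that the attempt does not meet."""
    failures = []
    if attempt.genomes < MIN_GENOMES or attempt.days > MAX_DAYS:
        failures.append("throughput")
    if attempt.bases_per_error < MIN_BASES_PER_ERROR:
        failures.append("accuracy")
    if attempt.coverage < MIN_COVERAGE:
        failures.append("coverage")
    if attempt.cost_per_genome_usd > MAX_COST_PER_GENOME_USD:
        failures.append("cost")
    return failures


# Invented figures for illustration only; they describe no real platform.
example = Attempt(
    genomes=100,
    days=10,
    bases_per_error=50_000,
    coverage=0.95,
    cost_per_genome_usd=8_000,
)
print(failed_conditions(example))
```

The point of the checklist form is that the conditions are conjunctive: being comfortably inside the cost limit does nothing to offset a shortfall in coverage or accuracy.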

Sequencing technology is developing fast, but it seems unlikely that these conditions will be successfully met in 2010. Here's why:

Firstly, accurate coverage of 98% of the genome is still a challenge for the current crop of short-read sequencers (such as Complete Genomics' platform). Much of the human genome is highly repetitive, making it extremely difficult to place a short sequence read uniquely in its correct location in the genome. In its most recent publication Complete Genomics quoted simulation data suggesting the maximum possible coverage with its current technology is 98%, but in practice it achieved coverage of between 86 and 95% of the genome. That will certainly improve as the technology moves forward, but it will be seriously challenging to reach 98% - and that's not even counting the non-trivial fraction of the genome that is too repetitive to be included in the reference genome.

Secondly, and more importantly, the required error rates are much too stringent to be achieved by current short-read technologies. Complete Genomics can just about meet the one in 100,000 requirement when it comes to single base variants (SNPs), but the error rate is substantially higher than this both for small insertions and deletions and (crucially) for the large-scale rearrangements known as structural variants, which involve the insertion or deletion of over 1,000 bases of material.

Having seen first-hand the challenges of calling insertion/deletion and structural variants from short-read sequence data, I'm pretty skeptical that the error rate for these variants can be reduced to one in every 100,000 bases. Even long-read technologies such as the Pacific Biosciences platform will struggle to call these accurately enough to meet the Prize's requirements.
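To put the one-in-100,000 threshold in concrete terms, here is a rough back-of-the-envelope calculation of how many errors a single genome is allowed to contain. The genome size is an assumption (roughly 3 billion bases for a haploid human genome, so about 6 billion for a diploid one), and the `error_budget` function is purely illustrative.

```python
# Assumption: approximate size of a haploid human genome, in bases.
HAPLOID_GENOME_BASES = 3_000_000_000
# Prize threshold as summarised in this post: one error per 100,000 bases.
BASES_PER_ALLOWED_ERROR = 100_000


def error_budget(bases, bases_per_error=BASES_PER_ALLOWED_ERROR):
    """Maximum number of errors permitted across `bases` sequenced bases."""
    return bases // bases_per_error


haploid_budget = error_budget(HAPLOID_GENOME_BASES)
diploid_budget = error_budget(2 * HAPLOID_GENOME_BASES)

print(f"haploid genome: at most {haploid_budget:,} errors")
print(f"diploid genome: at most {diploid_budget:,} errors")
```

Under these assumptions the budget is about 30,000 errors for a haploid genome, and that total has to cover every class of error at once: SNPs, small insertions and deletions, and structural variants.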

This is not to downplay Complete Genomics' achievements: I've been seriously impressed with what the company has achieved since it released its first human sequencing data last February. I've also watched the attitude of the genomics community shift from hostility, through curiosity, to genuine interest; I suspect we'll see some non-trivial outsourcing of genome sequencing to the company even by large genome facilities during 2010. The company could certainly meet the cost requirements of the Prize, especially once its new Californian facility is up and running smoothly (some time early this year); nonetheless, I think the other conditions are beyond the likely capabilities of the Complete system in 2010.

When will the Prize be won? Keith Robison predicts that a win is at least two years away; I think he might be right, although I'd give 50-50 odds of a successful attempt in 2011 (but reserve the right to modify that prediction based on technology developments this year!).

Anyway, we'll know very soon if an attempt is likely in 2010: to qualify for the prize in a given year, a team must have registered for the prize by January 15th of that year at the latest. At this stage the list of registrants includes only quite unlikely candidates (e.g. 454, whose technology is too low-throughput, and the totally unproven ZS Genetics), so unless Illumina, Life Technologies, Complete Genomics, Pacific Biosciences or Oxford Nanopore registers within the next 11 days it's extraordinarily unlikely that we'll see a win this year.

I'll keep you posted.


Daniel,
In addition to net coverage and accuracy, perhaps the most onerous X PRIZE competition stipulation is the sequencing of DIPLOID genomes (6 billion bp) and "complete genotyping of each chromosome."
That puts a premium on read length and haplotyping -- Complete may get there with its "long fragment read" approach -- but unfortunately suggests The Prize won't be claimed for some time. Damn!
Kevin

Hi Kevin,

Ah - I wasn't sure if that wording necessarily required that the chromosomes be completely phased, or whether it would be sufficient to just have an accurate (diploid) genotype call at every individual base along the genome. I now see the competition guidelines state: "A rearrangement or haplotype error counts as one error" - so it would seem you're right that a completely phased diploid genome is required.

In that case you're absolutely right: there's no technology around that could meet these requirements, and I'm pretty dubious we'll see one even close by 2011. Complete's LFR approach is elegant, but it won't provide sufficiently good haplotyping to produce <1 switch error per 100,000 bases.

Kevin's right, I'm afraid. But maybe we should start a pool?

If we ignore the cost and time limit, is current technology able to deliver the other requirements? If so, at what cost and time frame?

No - as described in detail in the post, current technology cannot meet the coverage and accuracy requirements.
