Regular readers will know that I’m at the Advances in Genome Biology and Technology (AGBT) meeting this week, one of the most eagerly awaited meetings on the genomics calendar.
There’s a huge amount of fascinating data being presented (anyone interested in a blow-by-blow account should follow Anthony Fejes’ live-blogging), but there’s definitely an overarching theme: the evolving battle between the new-technology sequencing companies. This is a competition that most researchers in genomics are watching with great interest, because it promises to bring about very rapid advances in the speed, quality and affordability of large-scale sequencing above and beyond the mind-boggling progress of the last two years.
The week started with bold claims from Illumina, who provide the most widely-used of the three “second-generation” sequencing platforms (the Genome Analyzer), about the improvements that will be made to their platform in 2009. Dan Koboldt has a good overview of the details, but the main message is this: by the end of the year, Illumina claims that it will be able to routinely generate 95 Gb (that’s 95 billion bases, the equivalent of 30 human genomes) of DNA sequence per run. This increased yield will come with a boost to read length, which will aid genome assembly and the detection of large-scale insertions and deletions.
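For anyone wanting to check the "30 human genomes" equivalence, here is a back-of-the-envelope sketch. The ~3.1 Gb haploid genome size is my own assumption rather than a figure from Illumina's announcement, and the function name is purely illustrative.

```python
# Assumed haploid human genome size in gigabases (not a figure from the talk).
HUMAN_GENOME_GB = 3.1

def genome_equivalents(yield_gb, genome_gb=HUMAN_GENOME_GB):
    """Return how many genome-lengths of raw sequence a given yield represents."""
    return yield_gb / genome_gb

# Illumina's claimed end-of-2009 yield per run, in gigabases.
print(f"{genome_equivalents(95):.1f} genome equivalents")
```

Under that assumption the answer comes out at a little over 30. Note that this counts raw bases only: it is the same as saying one genome at roughly 30-fold redundancy, not 30 finished genomes.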
Most people I spoke to seemed to feel that Illumina’s claims were quite realistic – and they’d better be, because competition is coming from relative newcomers to the field. The first of these to present was Pacific Biosciences, who have been making big promises for quite a while now, but are still (by their own admission) at least a year away from a commercial release. Their presentation included some impressive new data, suggesting high accuracy and very long, continuous reads (up to 3,200 bases, which is massively longer than any other platform on the market). However, there was some uncertainty about the level of throughput that their platform will be able to achieve when it (finally) reaches the market.
In any case, however impressive their data, PacBio’s presentation was blown out of the water in terms of sheer drama by the talk by Clifford Reid, the CEO of the newest entrant to the new-technology sequencing market – Complete Genomics. Complete has been creating a buzz in the genomics community ever since it emerged from stealth mode in October last year promising to deliver complete human genome sequences at a cost of just $5,000 by mid-2009, and to sequence one million human genomes within the next five years.
Reid’s presentation was self-assured and quite persuasive. He presented data from the sequencing of the “complete” (see below) genome from a European sample from the HapMap project: although the data had a high error rate – only 40% of the reads could actually be mapped to the genome! – the sheer amount of data generated by the Complete platform (currently ~70 Gb over an 8-day run) allowed them to generate a consensus sequence and call single-base variants (SNPs) with high accuracy.
I was convinced by the SNP data, but I will be very interested to see how the system performs in terms of calling large-scale structural variants. Certainly the system has problems dealing with repetitive regions (as expected with short reads) – Reid noted that around 8% of the genome couldn’t be assembled due to these elements. These are major problems for very short read technologies that can’t be solved by simply increasing coverage; Reid’s presentation included a brief mention of a technology called “long fragment reads” that might help to address such problems, but the details weren’t clear. Large-scale structural variants play an important role in human variation and disease, so Complete will need to deal with these areas effectively if it is to generate genome sequences that can realistically be called “complete”.
Update 06/02/09: Here’s a relevant statement from an article in Bio-IT World:
Complete identified some 400,000 short indels [insertions/deletions] using its own proprietary software, but Reid admits there is room for improvement. “The assembly software does not today call large structural variations,” he acknowledged. “That’s one of our next high priority projects — to tease out of the datasets major structural rearrangements, inversions, translocations etc.” Reid calls it “a strategic commitment to write the assembly software that spans the spectrum of variance detection from SNPs to assembling a cancer genome.”
Anyone interested in the details of Complete’s data is in luck, as the company has released its raw sequence data for public consumption – it will apparently shortly be available through NCBI. Various summary statistics are also available on the company’s website.
The other interesting aspect of Complete is its unique business strategy – the company plans to offer its platform only within its own self-contained service centres, rather than selling it to genome facilities. I’m still not totally clear on why Complete has adopted this model, but it’s likely a combination of the complexity of their data (their sequence is generated as a series of 10 base pair reads which then have to be stitched back together) and the economy of scale; Reid noted that computing, labour and overhead costs per base all drop as the size of a facility increases.
One final point of interest is that Complete’s services will be completely restricted to the sequencing of human genomes – it will not accept projects involving non-human samples (a point that Reid made emphatically clear during question time). Reid presented this as meaning that Complete is not in competition with genome research facilities; there was an implicit suggestion that Complete would now take care of all whole human genome sequencing research, while genomics facilities could look after algae and such! I’m surprised by how dogmatic Reid was in declaring this, as it seems like this seriously constrains the market for the Complete service – but there are also considerable advantages to specialisation, and the human genome sequencing market is likely to grow very rapidly over the next few years.
Overall it was hard not to be impressed by the sheer audacity of Complete’s goals, and by the speed with which they appear to be moving towards those goals. There are still some non-trivial questions in my mind about both the technical and financial facets of the company’s strategy – and I will be putting these to company representatives over the next two days – but I think there was little doubt in the audience’s mind that this is a serious new contender in the DNA sequencing field.