Still quite a way, based on this survey of second-generation sequencing users (subscription only, I think) conducted by the industry publication In Sequence.
Along with a range of other questions, the survey asked users about the cost to generate one billion base pairs (one gigabase, or Gb) on their platform at the end of 2008, which is about as current as we’re likely to get. I’ve estimated below the total cost to sequence a complete* human genome, assuming an overall depth of coverage** of 30x, for the three most widely-used second-generation platforms:
The fine print
Note that the number of respondents is pretty small for each platform, though it’s probably enough to give a fairly good idea of costs at the current time. I’d appreciate comments from any users out there who think the costs are inflated.
Here’s how In Sequence describes the survey question:
…users were asked to estimate the total cost for generating a gigabase of high-quality data. They were asked to include — and break down, if possible — all costs, such as labor, sample prep, sequencing consumables, instrument amortization, service contracts, data analysis, and data storage.
Bear in mind that costs are substantially discounted for the larger sequencing facilities, due to both economies of scale and special deals from technology providers; and, of course, you’d expect companies offering retail genome sequencing to add a hefty profit margin on top of this number.
Why, then, are even the lowest costs here still higher than the $100,000 sequence currently offered by retail sequencing company Knome? I don’t know, to be honest – perhaps Knome’s service has a lower coverage than 30x (which is a bit of a worry), or Knome may be offering the service below cost to help drum up customers.
Costs are clearly dropping fast, and new technology on the horizon will drop them even further in the near future. Still, it will be a real challenge to meet the predictions of a $1000 genome this year that commentators such as George Church have made.
* The term “complete” is used in a pretty loose sense here – these short-read platforms can’t sequence somewhere in the vicinity of 10-15% of the total human genome, because those regions are highly repetitive. It’s still uncertain what proportion of this will contain functional elements.
** 30x coverage means that each base in the genome is sequenced an average of 30 times, which is on the low side; you’d probably want to spend a little extra to boost your coverage even for recreational genomics, and for clinical applications you might be looking at coverage higher than 100x. You can scale the price accordingly.
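For anyone who wants to do that scaling themselves, here is a minimal sketch of the arithmetic in Python. It assumes a human genome of roughly 3 Gb (my round figure, not a number from the survey), and the per-gigabase cost in the usage lines is a made-up placeholder rather than a survey result; the function names are just illustrative.

```python
# Assumption: the human genome is roughly 3 gigabases (Gb); a round figure
# used for illustration, not a value taken from the survey.
HUMAN_GENOME_GB = 3.0


def genome_cost(cost_per_gb, coverage=30, genome_gb=HUMAN_GENOME_GB):
    """Total cost of a genome: cost per Gb x genome size in Gb x coverage."""
    return cost_per_gb * genome_gb * coverage


def rescale_cost(cost, new_coverage, base_coverage=30):
    """Scale a cost estimated at base_coverage linearly to new_coverage."""
    return cost * new_coverage / base_coverage


# Placeholder per-Gb cost, purely hypothetical (NOT a survey figure).
example_cost_per_gb = 1000.0
cost_30x = genome_cost(example_cost_per_gb)   # 30x needs 90 Gb of sequence
cost_100x = rescale_cost(cost_30x, 100)       # same genome at 100x coverage
print(cost_30x, cost_100x)
```

The scaling is linear only because this sketch treats every cost as proportional to the amount of sequence generated; fixed costs such as instrument amortization or service contracts may not scale that way in practice.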