This little USB drive represents the current pinnacle of luxury personal genomics. It's the product of Knome (pronounced "know me"), a Cambridge, MA-based biotech start-up fronted by genomics pioneer George Church (recently profiled in Wired). In return for $350,000, Knome's customers receive a shiny 8 GB drive containing their entire genome sequence (or rather, a hefty fraction of it), along with specialised browser software for viewing it.
$350,000 is a hell of a lot of money to fork over for a few gigabytes of data. So, how much of a return will these customers be getting on their substantial investment?
What you get for your $350,000
A recent article in Technology Review outlines what a Knome customer can expect to receive for their generous outlay. Generating the genome sequence itself (which is done through a collaboration with the Beijing Genomics Institute) is just the beginning; the resulting data are then pored over by a team of informaticians and clinicians to mine out the useful details:
The first things the analysts look for are small variations that are found frequently in the broader population and have been linked to increased risk for myriad diseases. They then look for other types of genetic changes--including DNA deletions or duplications--linked to specific diseases. Lastly, they scan the genome for novel variations, changes that have not yet been spotted in the limited amount of human DNA that has been sequenced to date. The effects of such changes are uncertain, but the scientists try to predict them by considering the structure of the resulting proteins.
Customers receive their data at a "mini symposium dedicated to the recipient's genome, where scientists explain the results, the process behind them, and their limitations" - basically a genetics conference with themselves as the only subject.
It's hard to know how many uber-wealthy genetics geeks out there have both the cash and the desire to sample Knome's product, although Technology Review reports that the company handed over its first complete sequence to a customer last month, and aims to sequence at least 20 genomes this year.
A marginal return
Obtaining a complete genome sequence is certainly a massive technological leap beyond the offerings of the current crop of "budget" personal genomics companies, like 23andMe and deCODEme. These companies charge around $1000 for a chip-based analysis of up to a million sites of common variation scattered throughout your DNA, while Knome aims to provide the sequence of all three billion bases in your genome. In dollar-per-byte terms, Knome's customers are paying around 350 times more but receiving almost 3,000 times the information.
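For anyone who wants to check the arithmetic, here is a minimal sketch using only the round figures quoted above. Treating every chip site and every sequenced base as one equally valuable unit of "information" is a simplifying assumption, and the function and variable names are purely illustrative.

```python
# Round figures quoted in the post; counting "information" per site is a simplification.
CHIP_PRICE_USD = 1_000            # approximate price of a chip-based scan
CHIP_SITES = 1_000_000            # up to a million sites of common variation
GENOME_PRICE_USD = 350_000        # Knome's price
GENOME_BASES = 3_000_000_000      # all three billion bases


def cost_per_site(price_usd, sites):
    """Return the price paid per site (or base) examined."""
    return price_usd / sites


price_ratio = GENOME_PRICE_USD / CHIP_PRICE_USD
info_ratio = GENOME_BASES / CHIP_SITES
chip_cost = cost_per_site(CHIP_PRICE_USD, CHIP_SITES)
genome_cost = cost_per_site(GENOME_PRICE_USD, GENOME_BASES)

print(f"price ratio: {price_ratio:.0f}x")
print(f"site ratio: {info_ratio:.0f}x")
print(f"chip: ${chip_cost:.6f} per site")
print(f"genome: ${genome_cost:.6f} per base")
print(f"chip costs {chip_cost / genome_cost:.1f}x more per site")
```

On these numbers the chip works out several times more expensive per site examined, which is the sense in which the full sequence looks like the better deal.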
That starts to make Knome sound like a pretty good deal, at least for those of you with a spare $350,000. However, it's worth bearing in mind that the seemingly paltry one million or so sites targeted by deCODEme et al. are not a random sample of the genome, but have been selected to be as informative as possible about patterns of common variation at other sites. The chips used by deCODEme, for instance, directly examine around 0.03% of the genome but provide surprisingly accurate information about more than 80% of common variation, due to the magic of linkage disequilibrium - put simply, variants close to one another on a chromosome tend to be inherited together, so by examining one variant in a region you can indirectly obtain information about many other nearby sites.
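To make the tagging idea concrete, here is a minimal sketch of the standard r-squared measure of linkage disequilibrium between two biallelic sites, computed from phased haplotypes. The haplotype lists are made up for illustration, and the function name is hypothetical; this is not how any particular chip vendor selects its sites.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic sites.

    haplotypes: list of (a, b) tuples, one per chromosome, with each
    allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Made-up data: the two sites always travel together, so one perfectly tags the other.
tagged = [(1, 1)] * 4 + [(0, 0)] * 6
# Made-up data: every allele combination is equally common, so neither site predicts the other.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(tagged))
print(ld_r_squared(independent))
```

An r-squared near 1 means genotyping one site tells you almost everything about the other; a value near 0 means the second site would need to be assayed directly.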
In addition, the million or so variants targeted by the "budget" companies are the ones that researchers know the most about, since the chips these companies use were also the backbone technology for recent genome-wide association studies of common diseases. Virtually all of the over 200 genetic variants recently associated with diseases like diabetes and rheumatoid arthritis are already covered by 23andMe's chip; you don't need a complete genome sequence to see if you have them or not.
In fact, the major advantage of whole genome sequencing is that it provides information on rare variants - changes in your genome that you share with fewer than 1% of the general population - that are almost completely invisible to chip-based assays. Because natural selection tends to hold seriously nasty variants at low frequency, rare variants are enriched for changes that are likely to have a negative impact on health. That means that a complete genome sequence will find many interesting variants that would be missed by a 23andMe-style scan; the problem is that rare disease-causing variants are also very poorly characterised by current studies, both because they are poorly tagged by chips, and because their low frequency makes it hard to find enough carriers to study.
That means that right now obtaining a complete catalogue of the rare variants in your genome is not a great deal of use for most people: we could probably confidently assign disease-causing status to perhaps a dozen or so mutations (most of which would only cause disease in your children if you had the bad luck to mate with someone carrying mutations in the same gene), and tentatively tag a few hundred or so as likely candidates. The majority of the monsters in your genome, on the other hand, would remain hidden in the noise - especially those lurking in the vast non-coding areas of the genome in which function is currently almost impossible to decipher.
The race is on
Of course, this doesn't mean that whole genome sequencing will always be a waste of money. The cost of DNA sequencing has plummeted over the last few years due to the advent of "next-generation" sequencing technology, with three competing platforms currently battling to stake out a share of the lucrative medical sequencing market. At the same time, large-scale genetic association studies and advances in other fields of biomedical science are rapidly increasing researchers' ability to assign function to genetic variants, thus boosting the value of a personal genome sequence. The stylised graph below gives you a rough idea of the dynamics here:
Basically, we're currently at a stage where costs are dropping much faster than the rate at which the value of genetic information is increasing - in other words, you and I will be able to afford a genome sequence long before science will be able to tell us much about what it really means. However, the beauty of a finished genome sequence is that (unlike 23andMe-style chips, which are constantly being replaced by higher-resolution models) it never becomes obsolete. Once you have your entire sequence sitting on your hard drive you can simply sit back and wait for new associations and techniques for assigning function, which will soon be appearing at an exponentially increasing rate as technology improves.
It will be a while before any company will be able to offer an "entire" genome sequence in the literal sense - there are still many regions of the human genome (mainly large, repetitive areas close to the centres and ends of chromosomes) that can't be fully sequenced with any existing technology. For this reason, Knome's customers are probably only getting reliable sequence information for somewhere between 85 and 90% of their entire genome. In addition, the "short read" nature of current high-throughput sequencing methods means that they will likely be missing information on large structural variations (insertions, deletions and other rearrangements of large chunks of genetic material). We have a long way to go before a personal genome sequence can be regarded as truly "finished".
Nonetheless, the desire to have as much of your sequence as possible, ready to cross-check against new genetic discoveries, is presumably one of the rationales driving customers to Knome. Millionaire Dan Stoicescu, Knome's first publicly named customer, was described in an article in the NY Times back in March as intending to "check discoveries about genetic disease risk against his genome sequence daily, like a stock portfolio." (Mind you, given that he could likely buy the same service for a thousand dollars or so in a few years' time for the same informational return, this is a curious investment - I can only assume he followed different logic when making his millions).
Of course, there are other justifications for being an early adopter: for instance, Stoicescu also views his extravagant purchase as "a kind of sponsorship" helping to fund improvements in sequencing technology and genome analysis. There's some genuine truth to that. In the same way that wealthy early adopters created markets for later-commodified gadgets like laptops and mobile phones, the lessons learnt from Knome's first few affluent customers will pave the way for cheap personal genome sequences for the rest of us.
How much is a genome worth?
It depends on who you are, and why you want it. For someone suffering from a rare genetic disease with an unknown mutation, that sequence could make a huge difference - potentially allowing a more accurate prognosis, genetic counselling, and pre-natal diagnosis to prevent passing on the disease to children. For the rest of us the benefits are currently pretty small, and will likely remain so for the next few years; and by the time the average health benefits of having a genome sequence become non-trivial, the sequence itself will likely be cheaper than many standard diagnostic tests. At that point the decision is straightforward and personal genomics will become mundane.
For those of us with a strong interest in human genetics the critical point is likely to be reached substantially earlier. I certainly can't afford Stoicescu's lavish price, but I'd fork over a few thousand bucks to take a peek inside my DNA. I'm under no illusions about the likely meagre health outcomes, but if there are any obvious and unpleasant surprises in there I want to know. There are also the non-medical bonuses, like being able to put myself on the genetic map - even if I know roughly where I'm likely to end up, it would still be damn cool.
deCODEme seems like an inherently bad deal: they can basically only test for SNPs that are currently associated with a disorder but aren't covered by someone else's intellectual property rights. If their risk assessment or choice of SNPs is crap, then so is your analysis.
Knome sounds much more promising (if they do actually sequence your whole genome), but it gets into some legal gray areas if they do.
If you sequence someone's BRCA1 and BRCA2 alleles as part of a whole genome sequencing, analyze the results, and then give them a risk assessment for cancer, Myriad Genetics will shut you down for violating their patent.
So I'm certain that either Knome leaves out a lot of the most valuable analysis, or most of the $350,000 is going to royalties instead of sequencing.
But what if you sequence someone's genome and just hand them the raw data? If they subsequently use BLAST or another online DNA database tool to find out the results on their own, are you in the clear? Not many people know how to do that (or score the results), but an "independent" party could easily distribute a free application that acts as a front end to BLAST. People could just dump a sequence on the icon and out would pop their BRCA1/BRCA2, APOE, and other risk profiles.
What about companies that perform sequencing using whatever samples and primers they are given and then just mail out the raw data? Are they responsible for making certain they're not inadvertently performing an expensive medical test?
Do we know how complete the sequence is that Knome provides? What sequencing technology and assembly algorithms do they use? Can users be reasonably confident that they are getting the vast majority of their genome, or will they miss some CNVs etc.? Obviously it provides a lot more info than the SNP chips, but is it really a finished product or a work in progress?
A fiver? I'd pay that.
That is just bad! Like people need to know which diseases they might maybe possibly develop. But, if they can afford it, I say let them!
Following up on hibob's thoughts: there already is a free application (at least for personal use) that takes anyone's SNP data as provided by the SNP-companies and runs it against the SNPedia website of medically verified SNP data to generate a personal report that is totally independent of whoever produced the (SNP) data. The program is called 'Promethease' and is available at www.promethease.com/.
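The general idea of checking raw genotype data against an independent annotation table can be sketched in a few lines. This is not Promethease's actual code or file format: the tab-separated input layout, the rsIDs, the notes, and the function names below are all hypothetical placeholders.

```python
# Hypothetical annotation table keyed by (rsid, genotype). Not real SNPedia data.
ANNOTATIONS = {
    ("rs_example_1", "AG"): "placeholder note: one copy of a hypothetical risk allele",
    ("rs_example_2", "TT"): "placeholder note: two copies of a hypothetical protective allele",
}


def parse_genotypes(text):
    """Parse 'rsid<TAB>genotype' lines, skipping blank lines and '#' comments."""
    genotypes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, genotype = line.split("\t")[:2]
        # Sort the two alleles so that "GA" and "AG" are treated as the same genotype.
        genotypes[rsid] = "".join(sorted(genotype.upper()))
    return genotypes


def build_report(genotypes, annotations):
    """Return (rsid, genotype, note) for every genotype that has an annotation."""
    report = []
    for rsid, genotype in sorted(genotypes.items()):
        note = annotations.get((rsid, genotype))
        if note is not None:
            report.append((rsid, genotype, note))
    return report


# Made-up raw data in the assumed layout.
sample = "# rsid\tgenotype\nrs_example_1\tGA\nrs_example_2\tCT\nrs_example_3\tCC\n"
report = build_report(parse_genotypes(sample), ANNOTATIONS)
for rsid, genotype, note in report:
    print(rsid, genotype, note)
```

The point of the design is that the report depends only on the raw data and the annotation table, not on whoever generated the genotypes.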
Welcome to science blogs. Your blog will be a good source of information for students in my class, "Genetics and society". My research focuses on the genetic future of plants.
Certainly we are likely to see a wave of freeware analytical tools pop up online as personal genomic data becomes more widely available. In addition to the Promethease tool mentioned above, Dienekes recently released a neat little application that infers within-Europe ancestry.
I think we can also expect lawsuits targeted at anyone distributing software that could be seen as violating patents (e.g. BRCA1/BRCA2 testing, which is vigorously enforced by Myriad). This isn't a problem with SNP chip data (which doesn't include any cancer-causing mutations in the BRCA genes), but it will certainly be an issue with whole-genome data.
I should have been clearer (and have edited my post to clarify): when I said "Once you have your entire sequence sitting on your hard drive", by "entire" I was referring to a substantially higher-quality sequence than the one that current Knome customers are likely to receive. In a few years' time there'll almost certainly be at least one commercial platform offering 10,000+ base pair high-quality reads from single DNA molecules - high coverage with that sort of technology would give you a genome sequence worthy of the epithet "entire".
As far as I can tell Knome hasn't announced which platform(s) they're using (their sequencing facility, the Beijing Genomics Institute, has access to all three major platforms and a couple of the experimental techs, so it could be anything); they certainly haven't publicly discussed any stats on their outcome measures. But regardless of which platform they're using they will certainly be missing large CNVs, and even with high coverage (and I'd be pretty peeved as a customer if they were providing less than 25x coverage) they'll also miss some heterozygous variants. With current short-read technologies there will be essentially no information on haplotype phase, and (as I mentioned in my post) they're probably almost entirely missing sequence for the 10-15% of the genome spanned by large segmental duplications and other repetitive regions. So I think it's safe to say that the genome sequences customers are receiving right now are not a finished product.
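To put a rough number on the heterozygous-variant point, here is a back-of-the-envelope sketch under an idealised model: read depth at a site is Poisson-distributed around the mean coverage, each read samples either allele with equal probability, and a variant is called only if at least some minimum number of reads carry it. The three-read threshold and the function name are assumptions for illustration; real pipelines also have to cope with sequencing errors, mapping bias, and uneven coverage, so actual miss rates will be higher.

```python
import math


def prob_miss_het(mean_coverage, min_alt_reads=3):
    """Probability that fewer than min_alt_reads reads carry the variant allele.

    Under the assumed model, the count of variant-carrying reads is
    Poisson-distributed with mean mean_coverage / 2.
    """
    lam = mean_coverage / 2.0
    return sum(
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(min_alt_reads)
    )


for coverage in (5, 10, 25, 40):
    print(f"{coverage}x coverage: P(miss) = {prob_miss_het(coverage):.2e}")
```

Even a small per-site miss probability adds up across the very large number of heterozygous sites in a genome, which is why some are lost even at high coverage.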
Actually, commenter cariaso above is the creator of Promethease, so he would be in a much better position than I am to address some of the legal issues raised by hibob.
Mike, any comments?
They're not going to get my money until they supply enough data and equipment that I can clone myself so I'll have, collectively, more time to read all these fricken science blogs. So there!
Warning: I'd be in a much better position to help hibob debug bioinformatics computer programs than offer legal advice.
The laws here are either antiquated or simply don't exist. Last year they were different. Next year they will be different. If necessary, there are other laws outside the USA. Today it is clearly legal for 23etAl to do testing in California. Which is to say the Feds could sweep in at any time. But the Federal law isn't clear anyway. It seems likely it would be illegal for 23etAl to operate in New York. New Jersey has explicit rules ensuring that data belongs to patients. It seems to be true that BRCA SNPs were intentionally left off all microarrays out of legal fears. But Celera's Genetic Risk Score is based on 5 SNPs, 4 of which are already on the chip used by deCODEme. If the 23andMe v2 tests all 5, have they just violated IP? What if they don't report it in your formal report, but they release the raw data? If you run Promethease and it tells you about those 5 SNPs together, who just violated the IP?
Me for writing the code?
You for running it?
Anonymous wiki contributors for writing the genoset?
SNPedia for hosting the content?