Genetic Future

Update: In the comments below, SNPedia co-founder Michael Cariaso notes that Duncan has already lost his crown to the anonymous European NA07022, recently sequenced by Complete Genomics, who weighs in with 5891 associations to Duncan’s 5321. Records don’t last long in the age of high-throughput genomics!

Author David Ewing Duncan now officially has the most annotated genome of any human being; but given that the majority of those annotations are wrong and most of the remainder only weakly predictive, he’s also a powerful illustration of how far we still have to go before the era of personal genomics comes to fruition.

Duncan isn’t the person with the largest portion of his genome sequenced (Craig Venter, Jim Watson and Seong-Jin Kim are the named individuals currently competing for that honour), but he has so far cast his net the most widely in examining the potential functional information within his DNA. His 5,321 current associations come courtesy of SNPedia, a public database of genetic associations maintained by Michael Cariaso.

The majority of those associations, unfortunately, are pure noise – the detritus left behind by the noxious wave of false positives that was the era of the candidate gene association study, prior to the advent of modern, robust genome-wide association studies. Of the relative minority that are likely to be genuine, most are common variants with very small effects on disease risk and thus extremely limited predictive value. Wading through the sheer mass of loosely annotated data in Duncan’s SNPedia report provides sharp insight into the challenges of navigating large-scale genetic data.

(Added in edit: In hindsight the paragraph above could be read as a criticism of SNPedia, which it isn’t – SNPedia simply provides a catalogue of the genetic associations in the literature along with links to the relevant papers, and it’s up to the user to decide what standards of evidence to apply in deciding whether or not an association is useful. So just to be clear – the emotive language in the paragraph is aimed towards the appallingly high levels of false positives in published genetic association studies and not towards SNPedia’s decision to list them.)

It’s clear that consumers will need tremendous guidance in that navigation – but it’s still unclear exactly who will be the best at providing that guidance. The medical establishment certainly wants you to think that they are the only qualified providers, but upstart private personal genomics companies like 23andMe are giving clinicians a run for their money, and crowd-sourced efforts such as SNPedia remain a wild card.

In the meantime, Duncan isn’t resting on his laurels – he says he plans to have his entire genome sequenced “soon” as part of a broader process of self-exploration. It’s worth keeping an eye on Duncan as the model of an extremely enthusiastic early adopter of personal genomic technologies – the obstacles to understanding his genetic information will soon be things that the rest of us need to wrestle with as well.


  1. #1 cariaso
    June 23, 2009

    I recently taught Promethease to assume default values for unreported genotypes from full genome sequencing. As a result, the anonymous Caucasian NA07022, sequenced on a Complete Genomics machine, holds the new title at 5891. The equally anonymous BGI Han Chinese YanHuang weighs in at a respectable 5480. David Ewing Duncan’s pooled results from 23andMe, deCODEme, and Navigenics give him 5321.

    These numbers grow daily. But I’d like to address your statement about “vast majority of those annotations are wrong and the remainder only weakly predictive”. There is a truth to that which I won’t deny. SNPedia is trying to capture everything, in order to distill out the ones which REALLY mean something. As a result we’ve recorded many which may never replicate. Others are only weakly predictive because they’re not causative, but they are in LD with a still unknown SNP which may not be on a microarray, but is causative. And there are quite a few which would fail to meet your standards, but still offer valuable insight.

    rs1800795 is a good example of this. It was first reported when in vitro observations made it possible to make a prediction in vivo. This was later confirmed, and since then 30+ studies have also found significant results about this SNP. It does something. But that something doesn’t fit neatly into any of our current disease classifications. Viewing medicine through a genomic lens means that many of the traditional classifications don’t fit well. A similar effect can be seen for rs1800629.

    If personal genomics isn’t yielding enough good answers, it’s partly because we’re not yet asking the right questions.

  2. #2 AMac
    June 23, 2009

    This post is a great snapshot of the state of personal genomics, and Michael Cariaso’s comment adds a lot to the picture. I wonder how the state of affairs will appear in a year or two?

  3. #3 Daniel MacArthur
    June 23, 2009

    Hi Michael,

    Thanks for the update – a good reminder that records never last long in the genomic era. 🙂

    Your comment made me realise that parts of my post could be interpreted as a criticism of SNPedia, which was not the intent (see clarifying statement in the post). I completely agree that providing a catalogue of genetic associations that is as complete as possible is a useful thing to do, even if that means including many, many entries that will later turn out to be false.

    Your rs1800795 example is a good one – there may well be something interesting going on there biologically, but it’s likely that the majority of the associations reported for that SNP are simply wrong, the result of testing the SNP in hundreds of different patient cohorts (using multiple different sub-classifications, genotype models and statistical approaches) and then publishing any results that breached the magical P=0.05 threshold. Your point about the limitations of the phenotypes tested so far is a good one, though – no doubt there are still many useful associations out there in search of a sufficiently well-phenotyped cohort (and in fact I know of a few examples currently in press).

  4. #4 Misha
    June 23, 2009

    I guess my 4820 are just so much chopped liver. Sigh.

  5. #5 Daniel MacArthur
    June 23, 2009

    You’re still in with a shot, Misha – just push George to sequence your genome before David can get his done. 🙂

  6. #6 Steven Murphy MD
    June 24, 2009

    Daniel, I am so happy you point these things out. SNPedia is not the “dark horse”; currently it is the best in class….. Mike just doesn’t have rubenstein pimping him on Oprah. Nor does he have Google pumping money into his efforts. This makes his movement pure with agnostic information.

    Once you step into the realm of interpretation, you do end up in the role of healthcare professional. SB 482 is about to go down in flames because the Spitters in Mountain View in constructing a bill have in essence said it is ok to perform bioinformatics in the same role as professional interpretation……

    Imagine this argument: to look at a blood smear, you need to regulate the microscope, but not the pathologist or lab tech…..

    Doesn’t make any sense…..

    Take a look at CPT code 99420, which is “Utilization of an algorithm/instrument to assess health risk”. That is precisely what using these SNPs is going to be. I say going to be, because they are not exactly clinically valid in most cases…..

    Mike will win, Money will get sued……


    BTW, David is still the self exploration KINGPIN!!!!!
