There's many a slip 'twixt spit and SNP: errors in personal genomics data

Peter Aldhous has a great piece of detective work in New Scientist revealing a bizarre and sporadic glitch in the online software that personal genomics company deCODEme provides to let customers view their genetic data.

The glitch appears to be restricted to the display of data from the mitochondrial genome (a piece of DNA with a special fascination for genetic genealogists, since it is inherited almost exclusively along the maternal line). On several separate occasions the deCODEme browser presented Aldhous with a mitochondrial profile that was spectacularly wrong, differing from the profile in his raw data at 44 out of 93 positions.
Aldhous was kind enough to email me the raw data and some screenshots to illustrate the problem. It's clear that the error wasn't the result of Aldhous being presented with someone else's data: the profile is unlike any ever seen in a human being (genetic genealogist and blogger Blaine Bettinger is quoted in Aldhous' article asking whether it was certain that the sequence was from Homo sapiens). Nor is it the result of inaccuracies in the raw data - Aldhous' profile from deCODEme competitor 23andMe agreed with his raw deCODEme data at every site called by both companies.
Instead, it appears that a problem in the code that translates a customer's raw data into the browser's viewable format was doing something very strange: calling Aldhous' genotype at each mitochondrial position seemingly at random. (If each call were effectively a coin-flip between two possible alleles, roughly half the sites would come out wrong, so 44 errors out of 93 sites is compatible with pure chance.) Even more bizarrely, whenever Aldhous saw the incorrect profile, it was always the same incorrect profile; on other occasions the browser presented his genotypes completely accurately.
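To see why 44 wrong calls out of 93 looks like chance, here is a quick binomial calculation in Python. It's a sketch under a stated assumption, not a description of what deCODEme's code actually did: it assumes each assayed site has two possible alleles and that a garbled call picks either one with probability 0.5. For contrast, it also computes the same tail if a garbled call were drawn from all four bases.

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

N_SITES = 93   # mitochondrial positions compared (from the article)
N_ERRORS = 44  # positions where the browser disagreed with the raw data

# Assumption: each site is biallelic and a "random" call picks either allele
# with probability 0.5, so the expected number of wrong calls is n * p.
p_wrong = 0.5
expected = N_SITES * p_wrong

# Two-sided p-value: Binomial(n, 0.5) is symmetric about its mean, so
# P(X <= 44) equals P(X >= 49) and we can simply double the lower tail.
lower_tail = binomial_cdf(N_ERRORS, N_SITES, p_wrong)
two_sided = min(1.0, 2 * lower_tail)

# Contrast (also an assumption): if calls were drawn from all four bases,
# three in four would be wrong, and 44 or fewer errors becomes very unlikely.
lower_tail_four_bases = binomial_cdf(N_ERRORS, N_SITES, 0.75)

print(f"expected errors under coin-flip calling: {expected}")
print(f"two-sided p-value for {N_ERRORS}/{N_SITES}: {two_sided:.2f}")
print(f"P(<= {N_ERRORS} errors) if any of four bases: {lower_tail_four_bases:.1e}")
```

Under the coin-flip assumption the two-sided p-value comes out well above 0.5, so 44 errors is entirely unremarkable; under the four-base assumption, so few errors would be vanishingly unlikely. That is why "compatible with pure chance" depends on what "random" means here.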
Aldhous says in his article that deCODE is "still investigating" the source of the bug, but I understand that since the article was published the company's programmers have tracked down the error and corrected it.
Lessons for personal genomics customers
Now, it's important to emphasise that this error is actually pretty benign: it's unlikely that most customers would ever have spotted it, and Aldhous takes great pains to point out that it didn't affect the risk profiles deCODEme generated for various common diseases. It's also worth keeping in mind that the genotyping methods used by personal genomics companies are generally extremely accurate: comparisons between 23andMe and deCODEme data on the same person, for instance, typically show discrepancies at fewer than one in 10,000 sites.
However, this incident serves as a canary in the personal genomics coal-mine - a warning of the challenges that lie ahead for companies in ensuring that massive, complex genetic data-sets are presented accurately to consumers.
It's also a useful reminder to personal genomics consumers not to take their results for granted. The process between spitting into a cup and viewing your genetic results online involves multiple steps where things can go wrong, ranging from errors in sample tracking (the most pernicious and difficult to correct), through genotyping problems (usually much easier to spot), to errors in data analysis and display.
In general the odds of a given genetic data-point being wrong are very low, but they're sufficiently far above zero to warrant caution in making too much out of any single result - mind you, given the extremely small effect sizes of most of the variants currently assayed by personal genomics companies, that's good advice anyway. Certainly it would be a good idea for customers to seek independent validation of any result if they intend to use it to guide serious health or lifestyle decisions.
But the most important piece of advice for personal genomics customers is to engage with your data. Aldhous only detected these anomalies because he was exploring his own genetic data in multiple ways, cross-checking it against both other data and his own (informed) expectations, and was persistent enough to follow up on the strange results he found. 
That's a good example for other personal genomics customers to follow: rather than being a passive recipient of genetic forecasts, dig into your data and see if it makes sense, and keep asking questions until it does. In addition to making it more likely that you'll pick up any errors in your results, you'll also develop a much deeper understanding both of the nature of genetics and of your own genome.
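For customers with raw data from two providers, the simplest version of that cross-check is to compare genotypes at every SNP both files report. The Python sketch below is illustrative only: it assumes the raw files have already been parsed into dictionaries mapping SNP ID to genotype string, that both providers report alleles on the same strand (sites reported on opposite strands would need flipping first), and that no-calls appear as one of the hypothetical markers in `NO_CALLS`. The function names are made up for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical no-call markers; check your provider's documentation.
NO_CALLS = {"--", "00", ""}

def normalise(genotype: str) -> Optional[str]:
    """Upper-case and sort alleles so 'GA' and 'AG' compare equal; None for no-calls."""
    g = genotype.strip().upper()
    if g in NO_CALLS:
        return None
    return "".join(sorted(g))

def compare(a: Dict[str, str], b: Dict[str, str]) -> Tuple[int, List[str]]:
    """Return (number of sites called by both, list of discordant SNP IDs)."""
    compared = 0
    discordant = []
    for snp_id in sorted(a.keys() & b.keys()):
        ga, gb = normalise(a[snp_id]), normalise(b[snp_id])
        if ga is None or gb is None:
            continue  # skip sites that either provider failed to call
        compared += 1
        if ga != gb:
            discordant.append(snp_id)
    return compared, discordant

# Toy data for illustration only (made-up IDs, not real genotypes).
company_a = {"rs0001": "AG", "rs0002": "CC", "rs0003": "TT", "rs0004": "--"}
company_b = {"rs0001": "GA", "rs0002": "CT", "rs0003": "TT", "rs0004": "AA", "rs0005": "GG"}

n, bad = compare(company_a, company_b)
print(f"{len(bad)} discordant out of {n} shared calls: {bad}")
# prints: 1 discordant out of 3 shared calls: ['rs0002']
```

Dividing the number of discordant sites by the number of shared calls gives a discrepancy rate to set against the "fewer than one in 10,000" figure; a handful of mismatches is expected, while a block of them in one region (like Aldhous' mitochondrial profile) is worth chasing up.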


Canary? Forme fruste? As more customers come online, more problems will happen. We need complete transparency here, and less manipulative marketing.


That's an amazing story, and the situation is going to get far more complex as gene-gene, gene-environment, gene-environment-gene (and so on) interactions are factored into algorithms which feed other algorithms, and then others, to come up with an interpretation of a person's genes and biodata. Tiny bugs in the algorithms or software could cause chaos. Have a look at: Penders et al., A question of style: method, integrity and the meaning of proper science. Endeavour. 2009 Aug 6. PMID: 19665231

Errors in data visualization? This isn't that big a deal. They found the bug and fixed it.

By anomalous (not verified) on 27 Aug 2009

Every complex piece of software will contain bugs, because it's written by people and people make mistakes, just as doctors can make mistakes. What matters is that these mistakes are admitted and acted upon to prevent them in the future. In this respect it's not only the DTC market that needs more transparency, but the whole medical world.