There's many a slip 'twixt spit and SNP: errors in personal genomics data

By dgmacarthur on August 27, 2009.

Peter Aldhous has a great piece of detective work in New Scientist, which has revealed a bizarre and sporadic glitch in the online software provided by personal genomics company deCODEme to allow customers to view their genetic data.

The glitch appears to be restricted to the display of data from the mitochondrial genome (a piece of DNA with a special fascination for genetic genealogists, since it is inherited almost exclusively along the maternal line). On several separate occasions the deCODEme browser presented Aldhous with a mitochondrial profile that was spectacularly wrong, differing from the profile in his raw data at 44 out of 93 positions.

Aldhous was kind enough to email me the raw data and some screenshots to illustrate the problem. It's clear that the error wasn't the result of Aldhous being presented with someone else's data: the profile is unlike any ever seen in a human being (genetic genealogist and blogger Blaine Bettinger is quoted in Aldhous' article asking whether it was certain that the sequence was from Homo sapiens). Nor is it the result of inaccuracies in the raw data - Aldhous' profile from deCODEme competitor 23andMe agreed with his raw deCODEme data at every site called by both companies.

Instead, it appears that some problem in the code that translates a customer's raw data into the viewable format of the browser was doing something very strange: calling Aldhous' genotype at each mitochondrial position seemingly at random (44 errors out of 93 sites is roughly what you'd expect if each call were a coin toss between two possible alleles, and so is compatible with pure chance). Even more bizarrely, whenever Aldhous saw the incorrect profile it was always the same incorrect profile, while on other occasions the browser presented his genotypes completely accurately.
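For readers who want to check the "compatible with pure chance" claim themselves, here is a minimal sketch. It assumes a deliberately simple model that is my own and not deCODE's: each of the 93 sites is called by a fair coin toss between two alleles, so each call has a 50% chance of being wrong. The function name `binomial_two_sided_p` is purely illustrative.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed count k, under n independent trials with success
    probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so that ties with the observed outcome are included.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Assumed model: 93 sites, each wrong with probability 0.5 (a coin toss).
errors, sites = 44, 93
p_value = binomial_two_sided_p(errors, sites)
print(f"{errors}/{sites} errors, two-sided p-value under coin-toss model: {p_value:.2f}")
```

A p-value well above the usual 0.05 threshold means 44 errors out of 93 is entirely unremarkable under the coin-toss model. That says nothing about what the bug actually was, only that the error count looks like random calling.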

Aldhous says in his article that deCODE is "still investigating" the source of the bug, but I understand that following the publication of his article the company's programmers have tracked down the source of the error and corrected it.

Lessons for personal genomics customers

Now, it's important to emphasise that this error is actually pretty benign: most customers would probably never have spotted it, and Aldhous takes great pains to point out that it didn't affect the risk profiles generated by deCODEme for various common diseases. It's also worth keeping in mind that the genotyping methods used by personal genomics companies are generally extremely accurate: comparisons between data on the same person generated by 23andMe and deCODEme, for instance, typically show discrepancies at fewer than one in 10,000 sites.

However, this incident serves as a canary in the personal genomics coal-mine - a warning of the challenges that lie ahead for companies in ensuring that massive, complex genetic data-sets are presented accurately to consumers.

It's also a useful reminder to personal genomics consumers to not take their results for granted. The process between spitting into a cup and viewing your genetic results online involves multiple steps where things can go wrong, ranging from errors in sample tracking (the most pernicious and difficult to correct), through genotyping problems (usually much easier to spot), to errors in data analysis and display.

In general the odds of a given genetic data-point being wrong are very low, but they're sufficiently far above zero to warrant caution in making too much out of any single result - mind you, given the extremely small effect sizes of most of the variants currently assayed by personal genomics companies, that's good advice anyway. Certainly it would be a good idea for customers to seek independent validation of any result if they intend to use it to guide serious health or lifestyle decisions.

But the most important piece of advice for personal genomics customers is to engage with your data. Aldhous only detected these anomalies because he was exploring his own genetic data in multiple ways, cross-checking it against both other data and his own (informed) expectations, and was persistent enough to follow up on the strange results he found.

That's a good example for other personal genomics customers to follow: rather than being a passive recipient of genetic forecasts, dig into your data and see if it makes sense, and keep asking questions until it does. In addition to making it more likely that you'll pick up any errors in your results, you'll also develop a much deeper understanding both of the nature of genetics and of your own genome.
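As a starting point for this kind of cross-checking, here is a minimal sketch of comparing genotype calls from two sources. Everything about the data layout is an assumption for illustration: each data-set is represented as a dictionary mapping a site identifier to a genotype string, `"--"` stands for a no-call, and the function name `compare_calls` is hypothetical. Real raw-data files would need to be parsed into this shape first, and this sketch ignores strand differences between platforms.

```python
def compare_calls(calls_a, calls_b, no_call="--"):
    """Compare two genotype data-sets at the sites called in both.

    Returns the number of shared, called sites and a sorted list of the
    sites where the genotypes disagree. Allele order is ignored, so
    "AG" and "GA" count as the same genotype.
    """
    shared = [
        site
        for site in calls_a.keys() & calls_b.keys()
        if calls_a[site] != no_call and calls_b[site] != no_call
    ]
    mismatches = sorted(
        site for site in shared if sorted(calls_a[site]) != sorted(calls_b[site])
    )
    return len(shared), mismatches


# Made-up example data, not real genotypes from any company.
company_a = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "--"}
company_b = {"rs1": "GA", "rs2": "CT", "rs3": "TT", "rs4": "AA", "rs5": "GG"}

n_shared, mismatches = compare_calls(company_a, company_b)
print(f"{len(mismatches)} mismatch(es) at {n_shared} shared sites: {mismatches}")
```

A handful of mismatches across hundreds of thousands of sites is normal; a large cluster of them, as in the 44 out of 93 mitochondrial positions described above, is the kind of result worth following up.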

Canary? Forme Fruste? As more customers come online, more problems will happen. We need complete transparency here and less manipulative marketing.

-Steve

That's an amazing story, and the situation is going to get far more complex as gene-gene, gene-environment, gene-env-gene, etc., interactions are factored into algorithms which feed other algorithms, and then others, to come up with an interpretation of a person's genes and biodata. Tiny bugs in the algorithms or software could cause chaos. Have a look at: Penders et al, A question of style: method, integrity and the meaning of proper science. Endeavour. 2009 Aug 6. PMID: 19665231

Errors in displaying data visualization? This isn't that big a deal. They found the bug and fixed it.

Every complex piece of software will contain bugs, because it's written by people and people make mistakes, just as doctors can make mistakes. What matters is that these mistakes are admitted and acted upon to prevent them in the future. It's not only the DTC market that needs more transparency, but the whole medical world in this respect.

Former software engineer at decode with good Java skills looking for a job.
Contact huldaeggerts@gmail.com.
