Genome Web's Daily Scan noted an interesting blog post today from John D. Halamka, one of the people who had his genome sequenced through the Personal Genome Project.
I was interested to see his post because Genome Web wrote that he was discussing data standards, and we have been writing quite a bit ourselves about data measurements for Next Gen sequencing (e.g., Next Gen-Omics) on our company blog, FinchTalk.
But Halamka didn't write about standards for data.
He wrote about standards for metadata, like family histories, and the things that are done with data after it's been collected.
All of those issues are important, but as you can see from my drawing, the regulations that Halamka describes only cover part of the picture. How will you know that your genome sequence data are correct? With several different platforms for Next Gen sequencing, all measuring information in different ways, it may be some time before the real data standards emerge.
That's because standards for how closely raw data match a gold standard--what is known as analytical validity--generally exist as part of the requirements for being a CLIA-licensed lab. If genomic data are eventually going to be used for explicit medical or diagnostic purposes, the labs producing them will have to be licensed through CLIA.
Standards for clinical validity--how well analytically valid data predict a particular outcome--are much softer and often change from condition to condition. The CDC has clinical validity reports on only five conditions, and those are generally limited to mutations known to be involved in familial forms of disease (but not in sporadic forms of, e.g., breast/ovarian cancer and colorectal cancer).
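
To make the difference between the two kinds of validity concrete, here is a minimal sketch in Python. It is only an illustration: the function names, the toy sequences, and the toy cohort are all made up for this example, and real CLIA validation or clinical validity assessment involves far more than these two ratios. Analytical validity is approximated here as the fraction of base calls that agree with a gold-standard reference, and clinical validity as the fraction of variant carriers who actually have the condition (a positive predictive value).

```python
def concordance(calls, reference):
    """Fraction of positions where the lab's base call matches the gold standard.

    A rough stand-in for analytical validity (hypothetical helper).
    """
    if len(calls) != len(reference) or len(reference) == 0:
        raise ValueError("calls and reference must be non-empty and equal in length")
    matches = sum(c == r for c, r in zip(calls, reference))
    return matches / len(reference)


def positive_predictive_value(has_variant, has_condition):
    """Fraction of variant carriers who have the condition.

    A rough stand-in for clinical validity (hypothetical helper).
    """
    carriers = [cond for var, cond in zip(has_variant, has_condition) if var]
    if not carriers:
        raise ValueError("no variant carriers in the cohort")
    return sum(carriers) / len(carriers)


# Made-up data: a 10-base gold standard and one lab's calls with a single mismatch.
reference = "ACGTACGTAC"
calls = "ACGTACGAAC"
analytical = concordance(calls, reference)

# Made-up cohort of six people: who carries the variant, and who has the condition.
has_variant = [True, True, True, True, False, False]
has_condition = [True, True, True, False, False, True]
clinical = positive_predictive_value(has_variant, has_condition)

print(f"concordance with gold standard: {analytical:.2f}")
print(f"positive predictive value:      {clinical:.2f}")
```

The point of keeping the two functions separate is that a lab can score very well on the first and still tell you little about the second: perfectly accurate base calls do not, by themselves, say how well a variant predicts disease.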