Why I'm releasing my genetic data online [Genetic Future]


Back in June I launched a new blog, Genomes Unzipped, together with a group of colleagues and friends with expertise in various areas of genetics. At the time I made a rather cryptic comment about "planning much bigger things for the site over the next few months".
Today I announced what I meant by that: from today, all 12 members of Genomes Unzipped - including my wife and me - will be releasing their own results from a variety of genetic tests, online, for anyone to access. Initially those results consist of data from one company (23andMe) for all 12 members; deCODEme for one member; and Counsyl for two of us (my wife and me). As the project proceeds, we plan to obtain and release the results from a far wider range of genetic tests, up to and including complete genome sequences.
In all, the group is currently releasing over 7 million pieces of genetic data mined from our own genomes. Anyone can download the data in raw form, or view it on a custom browser that two of the group assembled using the open-source JBrowse software. Already the data is being used: blogger Dienekes yesterday published an analysis of our ancestry using his own program, EURO-DNA-CALC.
We have plenty more planned over the next few weeks, including discussion of the ethical issues associated with releasing data publicly, especially given the potential impact on family members. We'll also be presenting analyses of our own data: many of us are active researchers in genetics, and relish the opportunity to apply our research tools to our own genomes. We'll be releasing software code allowing others to run the same analyses on their own data.
So, why on Earth are we doing this?

I summarised some of the key motivations for members of the group in my Unzipped announcement post:

  • we want to share the results of scientific analysis of our own genomes, and as proponents of open data access most of us believe that doing good science means releasing complete data for others to investigate;
     
  • we hope that releasing our data publicly will help to guide useful discussions about genetic privacy and the benefits, risks and limitations of genetic information in general;
     
  • many of us believe that the ideal resource for genetic research is large open-access, non-anonymous research databases such as the Personal Genome Project, and that sharing linked genetic and trait information openly with the wider community is a public good - and we hope that our own experiences will encourage others to participate in open research projects;
     
  • we all believe that many of the fears expressed about the dangers of genetic information are exaggerated, and see this project as an opportunity to have a constructive public discussion about the truth behind these fears;
     
  • given the ease with which a dedicated snoop could obtain genetic information surreptitiously (via shed skin, hair or saliva, for instance), some of us argue that the whole notion of genetic privacy is illusory anyway - while releasing our data online makes it easier for people to get hold of it, this is a difference of degree rather than kind.
I wanted to spend a bit of time here expanding on that third point, as this is probably my own primary motivation for engaging in the project.
Any researcher working in genetics or genomics will be all too familiar with the cumbersome bureaucratic obstacles associated with subject privacy and anonymity. Under the traditional research model, subject anonymity and data privacy must be protected fiercely, and that leads to substantial hurdles in two key areas: firstly, data sharing between researchers is hindered by the need to ensure that data privacy is maintained; and secondly, layers of protection around subject anonymity make it extremely difficult to return research results to participants, even when those results might have health implications.
This is not to say that huge advances in data access have not been made over the last decade, particularly in the field of genomics. Both individual researchers and funding bodies (notably the Wellcome Trust and NIH) have done a commendable job of ensuring that many large genomics data-sets are made available to other researchers through large databases and data access agreements. 
However, can we go further? Researchers such as George Church advocate a bold alternative model: recruit research participants who are willing to share their data completely openly with the world. Find large enough numbers of people willing to sacrifice their privacy for the public good, and you suddenly have an amazingly powerful resource: a data-set that can be analysed by any researcher in the world with access to the internet, including participants who can play an active role in the research process.
It can't be emphasised enough just how powerful such a resource would be. Right now, virtually all human genetic and medical data is effectively locked away behind tight consent agreements. That means a given data-set only has a certain number of eyes passing over it, with a restricted circle of expertise; one cohort's data might contain valuable insights into the mechanisms by which cholesterol affects heart disease, but if the researchers holding the keys are eye specialists those will probably never be uncovered.
Science moves fastest when people from diverse backgrounds are allowed access to rich data-sets. The closer we hew to the traditional model of tightly restricted access to human data, the slower we will uncover the associations we need to move into the era of personalised, evidence-based healthcare.
Are there enough people in the world willing to forgo their privacy in the name of science? That remains to be seen, but flagship studies like the Personal Genome Project - which seeks to recruit 100,000 volunteers willing to share their genomes and clinical data with the world - are already suggesting that this number is far higher than many would have expected. However, visceral opposition to the idea of releasing such information - based often on an exaggerated sense of the power of genetic data, or its potential for abuse - continues to hold sway over the vast majority of the public.
We're under no illusions here: the data from the 12 of us in Genomes Unzipped aren't in and of themselves of tremendous scientific value. However, if we can get people starting to think about the genuine public good that can be achieved by sharing their data with science, and to weigh that good against a realistic sense of the potential harms, then the project has been a success.
Edited 13/10/2010 to clarify that major progress has been made in data-sharing agreements over the last decade, especially in genomics - I apologise to anyone who interpreted my views as minimising the work that has been done in this area.