Why I'm releasing my genetic data online


Back in June I launched a new blog, Genomes Unzipped, together with a group of colleagues and friends with expertise in various areas of genetics. At the time I made a rather cryptic comment about "planning much bigger things for the site over the next few months".
Today I announced what I meant by that: from today, all 12 members of Genomes Unzipped - including my wife and me - will be releasing their own results from a variety of genetic tests, online, for anyone to access. Initially those results consist of data from one company (23andMe) for all 12 members; deCODEme for one member; and Counsyl for two of us (my wife and me). As the project proceeds, we plan to obtain and release the results from a far wider range of genetic tests, up to and including complete genome sequences.
In all, the group is currently releasing over 7 million pieces of genetic data mined from our own genomes. Anyone can download the data in raw form, or view it on a custom browser that two of the group assembled using the open-source JBrowse software. Already the data is being used: blogger Dienekes yesterday published an analysis of our ancestry using his own program, EURO-DNA-CALC.
We have plenty more planned over the next few weeks, including discussion of the ethical issues associated with releasing data publicly, especially given the potential impact on family members. We'll also be presenting analyses of our own data: many of us are active researchers in genetics, and relish the opportunity to apply our research tools to our own genomes. We'll be releasing software code allowing others to run the same analyses on their own data.
So, why on Earth are we doing this?

I summarised some of the key motivations for members of the group in my Unzipped announcement post:

  • we want to share the results of scientific analysis of our own genomes, and as proponents of open data access most of us believe that doing good science means releasing complete data for others to investigate;
  • we hope that releasing our data publicly will help to guide useful discussions about genetic privacy and the benefits, risks and limitations of genetic information in general;
  • many of us believe that the ideal resource for genetic research is large open-access, non-anonymous research databases such as the Personal Genome Project, and that sharing linked genetic and trait information openly with the wider community is a public good - and we hope that our own experiences will encourage others to participate in open research projects;
  • we all believe that many of the fears expressed about the dangers of genetic information are exaggerated, and see this project as an opportunity to have a constructive public discussion about the truth behind these fears;
  • given the ease with which a dedicated snoop could obtain genetic information surreptitiously (via shed skin, hair or saliva, for instance), some of us argue that the whole notion of genetic privacy is illusory anyway - while releasing our data online makes it easier for people to get hold of it, this is a difference of degree rather than kind.
I wanted to spend a bit of time here expanding on that third point, as this is probably my own primary motivation for engaging in the project.
Any researcher working in genetics or genomics will be all too familiar with the cumbersome bureaucratic obstacles associated with subject privacy and anonymity. Under the traditional research model, subject anonymity and data privacy must be protected fiercely, and that leads to substantial hurdles in two key areas: firstly, data sharing between researchers is hindered by the need to ensure that data privacy is maintained; and secondly, layers of protection on subject anonymity mean it is extremely difficult to return research results to participants, even when those results might have health implications.
This is not to say that huge advances in data access have not been made over the last decade, particularly in the field of genomics. Both individual researchers and funding bodies (notably the Wellcome Trust and NIH) have done a commendable job of ensuring that many large genomics data-sets are made available to other researchers through large databases and data access agreements. 
However, can we go further? Researchers such as George Church advocate a bold alternative model: recruit research participants who are willing to share their data completely openly with the world. Find large enough numbers of people willing to sacrifice their privacy for public good, and you suddenly have an amazingly powerful resource: a data-set that can be analysed by any researcher in the world with access to the internet, including participants who can play an active role in the research process.
It can't be emphasised enough just how powerful such a resource would be. Right now, virtually all human genetic and medical data is effectively locked away behind tight consent agreements. That means a given data-set only has a certain number of eyes passing over it, with a restricted circle of expertise; one cohort's data might contain valuable insights into the mechanisms by which cholesterol affects heart disease, but if the researchers holding the keys are eye specialists those will probably never be uncovered.
Science moves fastest when people from diverse backgrounds are allowed access to rich data-sets. The closer we hew to the traditional model of tightly restricted access to human data, the slower we will uncover the associations we need to move into the era of personalised, evidence-based healthcare.
Are there enough people in the world willing to forgo their privacy in the name of science? That remains to be seen, but flagship studies like the Personal Genome Project - which seeks to recruit 100,000 volunteers willing to share their genomes and clinical data with the world - are already suggesting that this number is far higher than many would have expected. However, visceral opposition to the idea of releasing such information - based often on an exaggerated sense of the power of genetic data, or its potential for abuse - continues to hold sway over the vast majority of the public.
We're under no illusions here: the data from the 12 of us in Genomes Unzipped aren't in and of themselves of tremendous scientific value. However, if we can get people starting to think about the genuine public good that can be achieved by sharing their data with science, and to weigh that good against a realistic sense of the potential harms, then the project will have been a success.
Edited 13/10/2010 to clarify that major progress has been made in data-sharing agreements over the last decade, especially in genomics - I apologise to anyone who interpreted my views as minimising the work that has been done in this area.

Let me join the chorus of support. Agree that crowd-sourcing genetic data has tons of promise.

On the other hand...a naysayer for the utility of current genomic tests (who could that be??) might argue that a detailed medical and family history with routine labs can tell as much or more about yourself than any SNP array.

Are you going to release "regular" medical data too, or do you think the utility of SNP data would be eclipsed?

Hi wei,

1. Yes, we do. Our costs are currently covered by a small grant from the PHG Foundation, but we're actively looking for additional funding to cover expenses as the project ramps up.

2. Good question: our long-term goal is to ensure that everything is mapped to the reference forward strand, but for the moment we should come up with some way of indicating strandedness. I'll look into it.

Hi Michael,

As you know, medical and family history (like current genetic data) can only tell you part of the story. Most individuals born with recessive Mendelian diseases have no family history of that disease, for instance. We see all of these sources of information as useful; even for many complex diseases (e.g. breast cancer), common variants from SNP scans are just as predictive by themselves as medical and family history combined.

The project will focus on genetic data initially, simply because that's what we know: we don't yet have a clinician on board (although we'll be looking for volunteers). However, we'll also be discussing the areas where genetic predictions have limited utility - this won't just be an exercise in hyping the value of SNP data.

What I find amazing is the power of the synergy between the globalised communications network and the broad goals of new research at the heart of this endeavour. To be able to amass a database of the genetic information of 100,000 people from across the world, available from anywhere, is as amazing as the content of that database.

On a separate note, I am not entirely surprised that support for such a project is larger than anticipated. If history has shown anything, it is that people will put themselves forward for almost anything.

My concern is about the impact on your children of publishing your genome and clinical information (and about their consent), even more so when both parents are publishing. I agree that the fears about publishing genomic information are exaggerated; it is just that we are somehow also publishing their information.

Great idea - another advantage is that seeing what the raw data looks like may encourage some of us to take the plunge and join a service like 23andMe. It is hard to judge from their website what a full dataset looks like and how easy it is to find specific information.

By Gerard Crotty (not verified) on 29 Oct 2010

Hi Daniel
I would like to include my 23andMe data if you want it. I already make it available to students and I recently offered access to my account to the audience at the World Congress of Psychiatric Genetics after listening to some eminent colleagues batting on about the dangers of direct-to-consumer genomics.
No one took me up on the offer.

By Simon Easteal (not verified) on 08 Nov 2010

I love this site; it's so interesting to see how the genetic world is developing in this area. My friend and I started a forum after he got his genes tested with 23andMe, after we had some long discussions about what he was getting into and how we were worried about the larger implications for the rest of society. It's not on the same scale as this project, but it would be great to meet any like-minded people who wanted a chat on the subject, genetic testing forum.