Personal Genome Project releasing genetic data today

By dgmacarthur on October 20, 2008.

The first 10 participants of the ground-breaking Personal Genome Project (PGP) will be receiving a hefty chunk of data today: the sequence of the protein-coding regions from many of their genes (collectively known as the "exome"). And if all goes according to plan, they'll soon be dumping all of that data on the web for anyone to access.

The PGP is an audacious endeavour led by Harvard's George Church (recently profiled in Wired). The ultimate goal of the Project is to sequence the entire genomes of 100,000 volunteers, and release both genetic and medical data from those volunteers to the research community - and indeed to anyone else who wants to view them. The Project has drawn both acclaim and criticism from the genetics community, with much of the criticism being directed at its unusual concept of genetic privacy - essentially, the Project's leaders argue that the reality of modern genomics means that the concept of patient anonymity no longer applies.

As a first step the PGP has been releasing information from its first 10 volunteers. When it comes to the understanding of genetic information they're an impressively well-informed group, including Church himself, entrepreneur Esther Dyson, linguist Steven Pinker, and academic and blogger Misha Angrist. Public profiles for the PGP 10 - including some fairly sensitive medical information - have been up on the PGP website for some time, but thus far there have been no genetic data attached to the profiles. That's set to change soon, so long as the participants don't suffer a last-minute case of cold feet and decide to keep their information out of the public domain.

Apparently the information being released to the PGP 10 today consists of around 20% of each volunteer's exome, a total of less than 1% of a complete genome sequence - but with the promise of much more to come. Ultimately, the PGP aims to provide complete genome sequences for all of its volunteers, which will become more and more feasible as the cost of DNA sequencing continues to plummet.

For genetic voyeurs, the identities associated with each of the public profiles (which are currently indicated by number alone) have been worked out via some internet sleuthing by Blaine Bettinger. Presumably the genetic data - when it's finally released - will be accessible via the same profiles.

Both MIT Technology Review and the NY Times have articles on the data release that are well worth a read.

There won't be any major medical breakthroughs from analysis of the PGP 10 data, but this is a tremendous first step in the direction of personalised medicine. It's also an important experiment to see whether the noble open-access model of the PGP can survive contact with reality. As Church notes in the NY Times article: "We don't yet know the consequences of having one's genome out in the open. But it's worth exploring."

Anyone who's interested in getting their genome sequenced by the PGP - and sharing the resulting information with the world - should consider registering for inclusion in the next phase of the Project.

Presumably even more people who are able to give highly informed consent and who are in genetic and other biomedical fields can register for this next trial.

Perhaps a strong effort by faculty, postdocs, grad students, and even undergrads - we who study genetics and related fields - to register for this project might help bring some visibility to the project.

It looks like a lot of post-processing will be necessary to make sense of the raw data.

I took a quick peek at the data file for participant PGP1. It has 55,000 records in FASTQ format, without any annotations at all, not even the name of the gene. The raw sequences range in length from a few dozen to a few hundred bases (and many records had no usable data at all).

I put the sequence data from the first record through BLAST. It is a perfect match for a zinc finger protein, also found in the chimpanzee, so I guess that particular record doesn't reveal much about its owner :)
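For anyone who wants to repeat the quick look at the raw file described in the comment above, here is a minimal sketch in Python. It assumes the file follows the standard four-lines-per-record FASTQ layout (header starting with `@`, sequence, `+` separator, quality string); the function names `read_fastq` and `summarize` and the three sample records are made up for illustration and are not taken from the PGP release.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ stream.

    Assumes the simple four-line-per-record layout; multi-line
    sequences are not handled in this sketch.
    """
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        header = header.strip()
        if not header:  # tolerate blank lines between records
            continue
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got: {header!r}")
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


def summarize(handle):
    """Count records, empty records, and the shortest/longest usable read."""
    lengths = [len(seq) for _, seq, _ in read_fastq(handle)]
    usable = [n for n in lengths if n > 0]
    return {
        "records": len(lengths),
        "empty": len(lengths) - len(usable),
        "shortest": min(usable) if usable else 0,
        "longest": max(usable) if usable else 0,
    }


# Hypothetical three-record sample; the second record has no sequence data.
sample = "@r1\nACGT\n+\nIIII\n@r2\n\n+\n\n@r3\nACGTACGT\n+\nIIIIIIII\n"
print(summarize(StringIO(sample)))
```

To run it on a real file, pass an open file handle to `summarize` in place of the `StringIO` sample. The individual sequences could then be pasted into BLAST one at a time, as the commenter did with the first record.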

Hi Ann,

Great timing - I just posted about the PGP sequence data here.

Assuming that the PGP don't release more processed data shortly, I'll run some alignments and see what I can find - but as I say in my post, the coverage is so low that these data are unlikely to be particularly informative by themselves.
