When James Watson’s genome sequence was publicly released earlier this year, Watson famously kept only one region of his DNA a secret – the region containing the APOE gene, which harbours common variants that contribute substantially to the risk of late-onset Alzheimer’s, and also affect predisposition to other diseases.
A recent article in the European Journal of Human Genetics shows something that shouldn’t have come as a surprise to anyone familiar with human genetics: simply removing the APOE gene was not enough to prevent someone from inferring whether or not Watson carries the riskier versions of this gene, because other markers around the gene can also indirectly convey this information through the magic of linkage disequilibrium.
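For readers who want to see what linkage disequilibrium means in practice, here is a minimal sketch of the standard pairwise r² statistic between a hidden variant and a nearby marker. The function name, the 0/1 allele coding and the toy haplotypes are all made up for illustration; none of it comes from the paper or from Watson's actual data.

```python
def ld_r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic loci.

    `haplotypes` is a list of (a, b) tuples, where a and b are the alleles
    (coded 0 or 1) carried on the same chromosome at the hidden locus and
    at the neighbouring marker. Hypothetical input format, for illustration.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                                   # one locus is monomorphic
        return 0.0
    return d * d / denom


# Toy data: in `perfect`, the marker allele always travels with the hidden allele.
perfect = [(1, 1)] * 3 + [(0, 0)] * 7
# In `independent`, every allele combination is equally common.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(perfect))
print(ld_r_squared(independent))
```

An r² near 1 means the marker is almost as good as the hidden variant itself, which is why deleting only the gene leaves the information sitting in its neighbours; an r² near 0 means the marker tells you essentially nothing.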
The authors kindly don’t reveal Watson’s APOE status, and in fact note that they warned Watson prior to publishing their paper so that he had time to take appropriate actions. He has since responded by removing an additional 2 million bases around the APOE gene from his public sequence.
That action largely removes the possibility of inferring his risk genotype from linkage disequilibrium – in fact, the authors note with dry Australian understatement that the removal of 2 million bases is “likely excessive”. Watson could have used linkage disequilibrium data from the HapMap project to delineate the smallest region that needed to be removed, but apparently decided that overkill was the best policy.
It’s worth noting that once we have complete genome sequences from sufficient individuals it will be straightforward to determine which DNA positions provide linkage-based information about a particular risk polymorphism (in a specific population, at least). That would allow the clean excision of only those bases that are absolutely required, thus having a smaller impact on research into the rest of the genome. (Of course, that relies on at least some people releasing their APOE sequence into the public domain, even if it turns out to carry the riskier version – I guess it’s lucky for us we have anonymous genome sequencing projects like 1000 Genomes.)
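As a rough sketch of what that "clean excision" might look like computationally: given each nearby marker's r² with the risk variant in a reference population, take the smallest interval that covers the risk variant and every marker above some threshold. The function, the threshold of 0.2 and the positions below are all assumptions for illustration, not values from the paper or from HapMap. Pairwise r² is also a simplification, since combinations of individually weak markers can still carry information.

```python
def redaction_interval(markers, risk_pos, r2_threshold=0.2, padding=0):
    """Smallest interval covering the risk variant and all informative markers.

    `markers` is a list of (position, r2_with_risk_variant) pairs from a
    reference panel. The threshold and padding are arbitrary, hypothetical
    choices; a real analysis would need to justify them.
    """
    # Keep only markers whose pairwise LD with the risk variant is high enough
    informative = [pos for pos, r2 in markers if r2 >= r2_threshold]
    informative.append(risk_pos)  # the risk variant itself must always be hidden
    start = max(min(informative) - padding, 0)
    end = max(informative) + padding
    return start, end


# Toy positions (in bases) and r^2 values, invented for this example
toy_markers = [(100, 0.05), (450, 0.30), (500, 0.90), (620, 0.60), (900, 0.10)]
print(redaction_interval(toy_markers, risk_pos=520))
```

With these made-up numbers only the stretch between the outermost informative markers gets removed, and the weakly linked markers at either end stay available to researchers.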
The whole episode must be raising questions in the minds of some of the Personal Genome Project volunteers as they consider the prospect of releasing their own genome sequences to the world (participant number 8 has already raised the possibility of redacting his APOE sequence, while Misha Angrist is reserving the right to hold back, well, anything). Are there genes they should be hiding? If so, how much sequence do they need to delete? Ultimately, how do projects like the PGP reconcile the desire for partial genome privacy with the need to get sequences out there in the public domain to further genomic research?
Mind you, given the quality of the sequence data released so far, they probably don’t need to worry too much for the moment…
Dale R Nyholt, Chang-En Yu, Peter M Visscher (2008). On Jim Watson’s APOE status: genetic information is hard to hide. European Journal of Human Genetics. DOI: 10.1038/ejhg.2008.198