"Come quickly, Watson," said Sherlock Holmes, "I've been asked to review a mysterious sequence, whose importance I'm only now beginning to comprehend."
The unidentified stranger handed Holmes a piece of paper inscribed with symbols and said it was a map of unparalleled value.
Holmes gazed thoughtfully at the map, then slowly lifted his eyes and coldly surveyed his subject's beaming countenance. "You have an affinity for the ocean," said Holmes, "that you indulged to excess as a reckless youth. An experience as a medic in the military changed your life and gave you a reason to do more than surf. A community college graduate, you achieved positions of high power and left them just as easily. You have a thirst for recognition and a talent for achieving goals that others thought were many years away."
Dr. Venter looked stunned. "You saw all that in the sequence of my genome?" he gasped.
"No," remarked Holmes, "I read your book."(*)
This is how I imagine a meeting between Watson, Holmes, and Venter might play out. Holmes would then ask, "what can we learn from Venter's genome that we don't already know?"
Watson might be somewhat perplexed. There was certainly one thing that really puzzled me.
Aren't all human autosomes diploid?
The title of the PLoS article (2) made it seem like there was something unusual about having solved the sequence of a diploid human. This really puzzled me because we have 23 pairs of chromosomes in addition to our mitochondrial DNA. Except for the mitochondrial DNA, all the chromosomes in women come in pairs. In men, there are two chromosomes without partners, X and Y, but we don't hold it against them.
Anyway, so I was confused about the emphasis on Venter's genome being diploid.
It seemed, by implication, that Watson's genome was not diploid, or at least that there was something different about it.
It took some sleuthing this morning, but I did figure it out.
Watson's genome was sequenced with the 454 instrument. The beauty of 454 sequencing is that you don't have to clone your DNA in order to sequence it. You sequence single molecules, lots of single molecules, and use assembly algorithms, and I'm guessing a reference genome sequence, to put the sequence together.
How was Venter's genome project different from Watson's?
Understanding how Venter's genome was sequenced and put back together again was a bit challenging because the PLoS paper didn't include anything about the sequencing in the materials and methods section. (I think they were a bit remiss here, but oh well.)
In order to figure out how Venter's group actually sequenced the DNA, I had to use the accession numbers from the paper and hunt around the NCBI until I could find some traces (electropherograms) from the project. I also found some annotations that told me that the sequencing was accomplished using the whole genome shotgun approach. (I wrote a whole series on genome sequencing earlier.)
Anyway, I wanted a trace so I could figure out if the Venter group was using any of the next-generation sequencing technologies. Eventually, I found a trace, opened it in FinchTV, and clicked the i icon to learn about the sequencing instrument. FinchTV presents all kinds of information from chromatogram files. It told me that this trace was obtained from an ABI 3730 sequencer and that it was base-called with KB v. 1.2 (in case you're wondering about this after my obsession with base callers).
I went back to the paper and, at last, I remembered. The Venter sequence was derived from large, CLONED pieces of DNA. If they could map the sequences back to the original clones, they could reconstruct the chromosomes. This is what they did.
You can think about the two different procedures like this: in the Watson version, everything was mixed in a blender, the sequence of each little piece determined, and the entire thing reconstructed from itty bitty parts. In the Venter version, the sequence was broken into large parts. Then the large parts were broken into smaller parts, sequenced and put back together. The difference is that with Venter's genome, it was possible to figure out how the smaller parts fit into the larger parts, and to reconstruct contiguous pieces.
Of course the HapMap data helped too, but this is the general idea.
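To make the contrast concrete, here is a toy Python sketch of the clone-based bookkeeping described above. It is an illustration only, not the pipeline used for either genome: the read sequences, the clone names, the clone_positions map, and the helpers merge_pair, assemble, and assemble_by_clone are all made up for this example, and the simple greedy merge stands in for a real assembler.

```python
from collections import defaultdict


def merge_pair(a, b, min_overlap=3):
    """Join a and b on the longest suffix-of-a / prefix-of-b overlap, or return None."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return None


def assemble(reads, min_overlap=3):
    """Toy greedy assembler: merge the first overlapping pair found until none remain."""
    contigs = list(reads)
    merged = True
    while merged and len(contigs) > 1:
        merged = False
        for i in range(len(contigs)):
            for j in range(len(contigs)):
                if i == j:
                    continue
                joined = merge_pair(contigs[i], contigs[j], min_overlap)
                if joined is not None:
                    # Replace the two pieces with their merged contig.
                    contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
                    contigs.append(joined)
                    merged = True
                    break
            if merged:
                break
    return contigs


def assemble_by_clone(tagged_reads, clone_positions):
    """Group reads by the clone they came from, assemble each group,
    and report the groups in the order given by the (hypothetical) clone map."""
    by_clone = defaultdict(list)
    for clone_id, seq in tagged_reads:
        by_clone[clone_id].append(seq)
    ordered = sorted(by_clone, key=lambda c: clone_positions[c])
    return [(c, assemble(by_clone[c])) for c in ordered]


# Made-up reads, each labeled with the large clone it was derived from.
tagged_reads = [
    ("clone_B", "GGCATTGCA"),
    ("clone_A", "ATGGCGTA"),
    ("clone_B", "TTCAGGCA"),
    ("clone_A", "CGTACCTGA"),
]
clone_positions = {"clone_A": 0, "clone_B": 1}  # assumed order along the chromosome

result = assemble_by_clone(tagged_reads, clone_positions)
for clone_id, contigs in result:
    print(clone_id, contigs)
```

Pooling the same four reads without their clone labels would still give two contigs in this tiny case, but nothing in the reads themselves would say which contig comes first. In this sketch, the clone map is what supplies that order.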
Why do we care? Is this going to be on the test?
Ah, yes, the pre-med question.
Unlike Watson's data, Venter's data allow us to look much more closely at the differences between the two sets of chromosomes. The paper reports that the maternal and paternal sets are quite different and that 44% of the genes are heterozygous for one or more variants.
Good news, Craig, your parents weren't closely related!
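For readers wondering how a figure like that 44% could be tallied, here is a minimal Python sketch. It assumes a gene counts as heterozygous when at least one variant site inside it carries two different alleles. The gene names, the genotype calls, and the function names are invented for illustration and are not taken from the paper.

```python
def is_heterozygous(genotypes):
    """True if any variant site in the gene has two different alleles."""
    return any(allele_1 != allele_2 for allele_1, allele_2 in genotypes)


def heterozygous_fraction(genes):
    """Fraction of genes with at least one heterozygous variant site."""
    if not genes:
        return 0.0
    het_count = sum(1 for calls in genes.values() if is_heterozygous(calls))
    return het_count / len(genes)


# Hypothetical genotype calls: one (allele, allele) pair per variant site.
toy_genes = {
    "GENE_A": [("A", "A"), ("C", "T")],  # one heterozygous site
    "GENE_B": [("G", "G")],              # homozygous
    "GENE_C": [("T", "T"), ("A", "A")],  # homozygous at both sites
    "GENE_D": [("C", "G")],              # heterozygous
}

fraction = heterozygous_fraction(toy_genes)
print(f"{fraction:.0%} of the toy genes are heterozygous")
```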
Venter's data provide an amazing glance at the amount of variation between the two sets of chromosomes in one individual human. The differences determine health, personality, looks, all the things that make us distinct individuals.
And that is why we care.
It's elementary, my dear Watson.
1. The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World, by James Shreeve.
2. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, et al. (2007) The Diploid Genome Sequence of an Individual Human. PLoS Biol 5(10): e254. doi:10.1371/journal.pbio.0050254
3. Check E (2007) James Watson's genome sequenced. Published online 1 June 2007. doi:10.1038/news070528-10
Curse you Holmes! Foiled again!
I mentioned this on a couple of other posts on this story, sorry if this is redundant, but I wonder if there are plans to do a sequence from the remains of Linnaeus any time soon, considering he's the type Homo sapiens?
Anne-Marie: I think that's a very nice idea with a certain sense of symmetry.
There are three things though that make me think that it's not likely to happen.
First, there are technical challenges. It's not easy to get good DNA sequences from dead people. When people are alive, their cells are intact and their DNA is protected and safe within the nucleus. When someone dies, the cell membranes lose their integrity. Stuff leaks from one compartment to another, and enzymes start to digest parts of the cells. It's not a pretty sight. The DNA in dead cells starts to break down, too. DNases begin to digest it into smaller pieces, and other chemicals can start to deaminate it and change the sequence. That said, we're pretty good at getting mitochondrial DNA sequences, but that's because there are lots of copies of mitochondrial DNA and it's easier to get more data when there's more starting material to work with. So, getting a good sequence from a dead body is hard.
Second, there's the cost. It's still pretty expensive to sequence a genome as large as the human genome. Watson's genome was paid for by 454 (I think). Venter's institute and his collaborators covered the cost of doing his genome - as far as I know.
Who would pay to sequence Linnaeus' genome?
I don't think his genome will be sequenced until sequencing gets cheap enough so that someone will pay for it to be done.
Last, we would learn something about Linnaeus' polymorphisms, but we're not likely to learn much that's new about human genomes. Unless one can make the case that there's something unique and new to be learned, sequencing his genome isn't likely to happen.
Yes, the sequencing of Watson's genome was paid for by 454. The assembly was done by the HGSC at Baylor College of Medicine (BCM).
It was cheap for a genome (~2 million dollars, if I remember correctly), but still not cheap enough to go around sequencing things willy-nilly.
Thanks for the explanation, that all makes sense. I thought funding was probably a big factor (actually mentioned that but then cut it out of my comment). I had assumed Linnaeus had undergone some kind of preservation when he died, I'm not sure where I got that idea, I thought it must have been in the criteria for him becoming the type. Thanks for the information!
$2 million to sequence a genome!? I don't understand why it is so expensive. Could somebody please enlighten me?
The first genome was predicted to cost $3 billion (about $1 per base), so $2 million sounds relatively cheap.
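The arithmetic behind that comparison can be sketched in a few lines of Python. The genome size of roughly 3 billion bases is an assumption implied by the "$1 per base" figure, and the $2 million number is the commenter's recollection above, not a verified cost. The constant and function names are just for this example.

```python
GENOME_BASES = 3_000_000_000  # assumed: roughly 3 billion bases


def cost_per_base(total_cost, bases=GENOME_BASES):
    """Average cost in dollars for each base of the genome."""
    return total_cost / bases


first_genome = cost_per_base(3_000_000_000)   # predicted cost of the first genome
watson_genome = cost_per_base(2_000_000)      # recalled cost of Watson's genome
fold_cheaper = 3_000_000_000 / 2_000_000      # how many times cheaper

print(f"First genome:  ${first_genome:.2f} per base")
print(f"Watson genome: ${watson_genome:.5f} per base")
print(f"About {fold_cheaper:.0f}-fold cheaper")
```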
I'll put my answer in another post, since there are a lot of details to tally.
You can get the read and quality data, plus a link to the traces here:
That's great! thanks!
I answered the question about genome sequencing costs, here.
You can browse Venter's diploid genome and its mapping to NCBI36 at http://huref.jcvi.org.