This blog doesn’t seem to want to write itself. I’ve got a few posts in the pipeline (including the next on detecting natural selection), but I can’t seem to finish them. I’m in this writing funk where I start to lay some words onto paper (well, text editor, actually), and then I can’t organize all my thoughts or just can’t finish writing the post (do I have an undiagnosed case of ADD?).
Luckily for me, I have Chad at Uncertain Principles to inspire me, as he’s already done once before. This time he’s asking people about their least favorite misconception in their field. One commenter brings up the misconception that evolution is random, something I’ve blogged on before. I prefer to think of evolution as a combination of stochastic (mutation and drift) and deterministic (selection) processes. The stochastic aspects consist of random draws from different probability distributions. Random means something different to a statistician than it does to a layman, just as theory has a very specific meaning in science (it’s not just a guess), and therein lies the misconception regarding the random nature of evolution.
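For anyone who wants to see that split between stochastic and deterministic pieces spelled out, here is a minimal sketch of one generation in a textbook Wright-Fisher-style model. It assumes a haploid population of fixed size and a single variant with a relative fitness advantage, and it leaves mutation out entirely; the function names and parameters are made up for illustration.

```python
import random


def next_generation(freq, pop_size, advantage, rng):
    """Advance a variant's frequency by one generation (haploid, fixed size)."""
    # Deterministic part (selection): reweight the variant by its relative
    # fitness. With advantage == 0 the expected frequency does not change.
    expected = freq * (1 + advantage) / (1 + freq * advantage)
    # Stochastic part (drift): each offspring independently carries the
    # variant with probability `expected`, i.e. a binomial draw.
    carriers = sum(rng.random() < expected for _ in range(pop_size))
    return carriers / pop_size


def simulate(freq, pop_size, advantage, generations, seed):
    """Run several generations from a seeded generator and return the final frequency."""
    rng = random.Random(seed)
    for _ in range(generations):
        freq = next_generation(freq, pop_size, advantage, rng)
    return freq
```

Run it with `advantage=0` and the frequency wanders by drift alone; raise the advantage and the trajectory is pulled upward, but each generation is still a random draw around a predictable expectation.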
Seeing as how I’ve already written about randomness, I won’t dwell on it as my favorite misconception. Rather, I’ll focus on something that bothers the hell out of me: the idea that by sequencing a genome, we are “decoding” it (as you can see, I did not have a very hard time locating articles in the popular press or from official news releases that used that wording).
Genome sequencing consists of determining the primary structure (i.e., linear sequence) of a few long molecules (chromosomes) found in the nuclei of cells. Decoding implies translating from one language to another or determining some hidden meaning. While we do have symbols for the four nucleotides found in DNA molecules (A, T, C, and G), this is not what we mean by the “genetic code”. And even if we identify all hypothetical protein coding genes in a genome (thereby mapping all of the genetic codes), have we really decoded it? I’d argue that we have not. I’m not saying that this is an arbitrary procedure, but it’s one thing to predict genes, and an entirely different thing to determine function.
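To make the distinction concrete, here is a small sketch of what the “genetic code” actually is: a lookup table from codons to amino acids. The table below is the standard genetic code; the function name and the toy sequence are made up for illustration, and the input is assumed to be upper-case A, C, G, and T only. This is nowhere near a real gene finder (no introns, no reverse strand, no statistics).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(dna):
    """Translate from the first ATG to the first in-frame stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Sequencing hands us the string; the table turns part of it into residues.
print(translate_first_orf("CCATGGCCTAAGG"))  # prints "MA"
```

Even when this works, all we get back is a string of amino acids. Nothing in the table says what the resulting protein does, which is the point.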
Therein lies the problem: decoding a genome is a long and tedious (possibly endless) process. To decode a genome, we would have to figure out the function for every gene product and how those gene products interact. And even if we simplify decoding to merely identifying genes, there is more to a gene than its protein coding sequence. We still have very few algorithms to identify cis-regulatory regions (and the ones we do have are nowhere near where they need to be for de novo identification). The number of transcriptional regulatory regions that are characterized by a genome sequencing project pales in comparison to the total number in a genome. Not to mention that there are structural elements that receive very little treatment in the first few runs of analysis.
So, let’s abandon the idea of “decoding a genome” and refer to the process for what it is: sequencing euchromatin and preliminary analysis of protein coding sequences. That’s my least favorite misconception (number two is the idea that Drosophila are fruit flies). What’s yours?