Chad Is My Muse

This blog doesn't seem to want to write itself. I've got a few posts in the pipeline (including the next on detecting natural selection), but I can't seem to finish them. I'm in this writing funk where I start to lay some words onto paper (well, text editor, actually), and then I can't organize all my thoughts or just can't finish writing the post (do I have an undiagnosed case of ADD?).

Luckily for me, I have Chad at Uncertain Principles to inspire me, as he's already done once before. This time he's asking people about their least favorite misconception in their field. One commenter brings up the misconception that evolution is random -- something I've blogged on before. I prefer to think of evolution as a combination of stochastic (mutation and drift) and deterministic (selection) processes. The stochastic aspects consist of random draws from different probability distributions. Random means something different to a statistician than it does to a layman -- just like theory has a very specific meaning in science (it's not just a guess) -- and therein lies the misconception regarding the random nature of evolution.
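For the programmers in the audience, here is a minimal sketch of what I mean by "stochastic plus deterministic". It is a toy, haploid Wright-Fisher-style model that I'm assuming purely for illustration: the function names, the `selection` parameter (which should stay above -1), and the population sizes are all made up, and mutation is left out entirely. Selection nudges the expected allele frequency in a predictable direction, and drift is the random draw around that expectation.

```python
import random

def next_generation(freq, pop_size, selection=0.0, rng=random):
    """Return the allele frequency after one generation (toy haploid model)."""
    # Deterministic part: selection shifts the expected frequency.
    # The allele has relative fitness 1 + selection; the alternative has 1.
    weighted = freq * (1 + selection)
    expected = weighted / (weighted + (1 - freq))
    # Stochastic part: drift is a binomial draw of pop_size offspring,
    # each carrying the allele with probability `expected`.
    copies = sum(rng.random() < expected for _ in range(pop_size))
    return copies / pop_size

def simulate(freq, pop_size, generations, selection=0.0, seed=None):
    """Return the list of allele frequencies, starting frequency included."""
    rng = random.Random(seed)
    trajectory = [freq]
    for _ in range(generations):
        freq = next_generation(freq, pop_size, selection, rng)
        trajectory.append(freq)
    return trajectory

if __name__ == "__main__":
    # Same starting point, different seeds: the paths differ (drift),
    # but a positive selection coefficient biases them upward.
    for seed in range(3):
        path = simulate(0.1, pop_size=200, generations=50, selection=0.05, seed=seed)
        print(f"seed {seed}: final frequency {path[-1]:.2f}")
```

Run it a few times with `selection=0.0` and the trajectories wander with no preferred direction. That is "random" in the statistician's sense: draws from a well-defined probability distribution, not "anything goes".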

Seeing as how I've already written about randomness, I won't dwell on it as my favorite misconception. Rather, I'll focus on something that bothers the hell out of me: the idea that by sequencing a genome, we are "decoding" it (as you can see, I did not have a very hard time locating articles in the popular press or from official news releases that used that wording).

Genome sequencing consists of determining the primary structure (i.e., linear sequence) of a few long molecules (chromosomes) found in the nuclei of cells. Decoding implies translating from one language to another or determining some hidden meaning. While we do have symbols for the four nucleotides found in DNA molecules (A, T, C, and G), this is not what we mean by the "genetic code". And even if we identify all hypothetical protein-coding genes in a genome (thereby mapping all of the genetic codes), have we really decoded it? I'd argue that we have not -- I'm not saying that this is an arbitrary procedure, but it's one thing to predict genes, and an entirely different thing to determine function.
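To make the distinction concrete, here is a small sketch of what the "genetic code" actually is: a lookup table from three-letter codons to amino acids. The table below is the standard genetic code; the function name and the example sequence are my own inventions for illustration, and real gene prediction is far more involved than this.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with amino acids listed in TCAG codon order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

if __name__ == "__main__":
    # A made-up sequence: ATG (Met), GCC (Ala), TAA (stop).
    print(translate("ATGGCCTAA"))
```

Notice what this does and does not give you. Applying the table to a predicted coding sequence yields a string of amino acids; it says nothing about what the resulting protein does or what it interacts with. That is the gap between sequencing (or even gene prediction) and anything deserving the name "decoding".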

Therein lies the problem; decoding a genome is a long and tedious (possibly endless) process. To decode a genome, we would have to figure out the function for every gene product and how those gene products interact. And even if we simplify decoding to merely identifying genes, there is more to a gene than its protein-coding sequence. We still have very few algorithms to identify cis-regulatory regions (and the ones we do have are nowhere near where they need to be for de novo identification). The number of transcriptional regulatory regions that are characterized by a genome sequencing project pales in comparison to the total number in a genome. Not to mention that there are structural elements that receive very little treatment in the first few runs of analysis.

So, let's abandon the idea of "decoding a genome" and refer to the process for what it is: sequencing euchromatin and preliminary analysis of protein-coding sequences. That's my least favorite misconception (number two is the idea that Drosophila are fruit flies). What's yours?

amen. that was my first shock when I first started studying genetics: how was it possible to still "discover" genes for human traits? I mean, hadn't we already sequenced the entire damn genome?