Junk DNA Redux

My little screed on junk DNA elicited some good feedback, including a comment from Dan Graur. In a somewhat ill-thought-out rant, I implied that anyone who uses the term 'junk DNA' should be ostracized from the scientific community (or something along those lines). I restated my opinion in a far more diplomatic manner in the discussion that followed in the comments: junk DNA is an appropriate term for DNA that serves no function (non-transcribed, non-regulatory, and non-structural), but we should refrain from using that term for all non-coding DNA. I elaborate on my opinion and reference a tangentially related finding below the fold.

One problem I have with describing DNA as non-functional is that it is a null hypothesis; we can disprove non-function, but we cannot prove that a particular sequence is non-functional. As Dr. Graur mentioned in the comments:

Finally, it should be noted that "junk DNA" is a scientific theory (sensu Popper) because the theory can be refuted by finding a function for a sequence that has been previously assumed to lack function. The theory "there is no junk in the genome" is a religious statement, because it can never be refuted. It can always be said that one has not looked hard enough for a function.

We can think of junk DNA like we do the neutral theory. The neutral theory is used as the null hypothesis in tests for detecting natural selection. Junk DNA could be used in a similar manner in detecting functional untranscribed sequences. However, I'm not entirely sure that transcription is a great predictor of function. Not all transcribed RNAs are translated. The ones that are untranslated may be rRNAs, tRNAs, regulatory RNAs, and enzymatic RNAs, but others may just be transcriptional 'mistakes'.

I don't mind if someone uses the term junk DNA so long as it is clear what is meant. As I pointed out in the comments, the press surrounding the paper in question (including this article from Seed) implies that geneticists think that all non-coding DNA is junk. Junk DNA would only represent a subset of all non-coding DNA in the genome. It's safe to assume that most microsatellites are junk. Incomplete pseudogenes are probably junk, although some new work suggests that some may be functional. Unless you are fairly confident that a sequence is non-functional, you should refrain from referring to it as junk.

As for the tangentially related research that I mentioned above, David Begun and colleagues have identified novel genes in Drosophila genomes. They were interested in accessory gland protein genes (Acp's; proteins associated with seminal fluids). Previous research has shown that these genes evolve rapidly and are under positive selection. In searching for Acp's in multiple genomes, they identified genes that were unique to some species. From the abstract:

A genomic analysis of previously unknown genes isolated from cDNA libraries of these species revealed several cases of genes present in one or both species, yet absent from ingroup and outgroup species. We found no evidence that these novel genes are attributable primarily to duplication and divergence, which suggests the possibility that Acp's or other genes coding for small proteins may originate from ancestrally noncoding DNA.

New genes have appeared in previously non-coding sequences. Would you characterize these non-coding regions as junk? It seems that one genome's trash is another genome's treasure.


"I'm not entirely sure that transcription is a great predictor of function"

I think this is a good point. We have this molecular biology dogma so firmly in our heads that sometimes it prevents us from seeing that there might be other possibilities besides transcription.

Maybe I wasn't entirely clear. I am saying that transcribed sequences may be non-functional, not that untranscribed sequences are functional (which they often are). My point was that showing that something is transcribed does not mean that it is functional. Junk may be transcribed because it's near some other junk that happens to look like a cis regulatory sequence.