Junk DNA is Junk

The term "Junk DNA" is bullshit. There, I said it. The moment I hear someone utter that phrase, I immediately lose respect for him or her. No one whose opinion is worth anything will refer to non-coding DNA as junk. That's why this article bothers me. The title, "Junk DNA may not be so junky after all" has nothing to do with the content of the article. The research being described is not about showing that non-coding DNA has a function. The function has already been determined. The researchers have used zebrafish to identify enhancers for human genes. The subtitle to the article (Researchers develop new tool to find gene control regions) actually describes the research pretty well. The actual title is a load of crap.

Probably the title was not written by the researchers themselves. Those are normally added by media relations hacks to give the press release more zing.

I actually don't mind the term "junk", but then again I make a distinction between 'junk' and 'garbage'. Whereas junk can be garbage, the two are certainly not synonyms.

Somehow I like the disrespectful touch of "Junk DNA". The idea, held by many people, that your DNA is somehow holy and to be left untouched (or something) is nicely juxtaposed with the lack of a role in the DNA-RNA-protein dogma for most of their genome. Obviously, non-coding regions contribute to the genetic makeup of eukaryotes much beyond a filler role and the term should probably not be used in a scientific context.
But it is a nice example to show that evolution is a playful process that is pretty random and neither nice nor "effective". Kind of a Panda's thumb.

It looks like we're going to have to kick Dan Graur out of the high church of Darwin.

Seriously though, it may look like I've run myself into a pickle. Dan Graur, for those who don't know, is a fairly well known evolutionary biologist. I, of course, did not "immediately lose respect for him" upon hearing that he uses the expression "junk DNA" all of the time. Allow me some poetic license here. It's obvious that Graur knows that junk DNA is not junk (I'd hope), and maybe he is referring to sequences that really have no function. A lot of people use junk to refer to all non-coding DNA (e.g., this news report and the coverage of Peter Andolfatto's paper), which is what really bugs me. Am I digging myself into an even deeper hole than I began with?

Now, if I were a tenured faculty member and I heard a grad student, post-doc, or someone giving a departmental seminar use that term, I'd jump on them fast . . . alas, I am a lowly grad student who can pull no rank.

Hm. The book I'm reading right now (Molecular Biology made simple and fun) says that about 20% of our DNA is made up of moderately repetitive sequences which are 'non-functional [and] merely fill up space in the chromosome', while another 10% is made up of highly repetitive sequences that are 'almost all useless as far as is known.' Plus an unspecified proportion of pseudogenes... Is this simply wrong? If not, isn't junk DNA a pretty good term for this? If not, why not?

A lot of repetitive DNA is non-functional, but some may serve structural purposes. My problem is with people who refer to all non-coding DNA as junk (which this press release does). The problem with calling things that you believe to be non-functional "junk" is that you can only disprove non-function (i.e., you can't prove it). It's best to simply refrain from using the term entirely.

Like Spitshine (and Dan Graur himself) I quite like the irreverent tone of the expression "junk DNA". Remember that creationists get very upset by the idea that there is any non-functional DNA out there. Sure, the "neutralist dream" may have over-reached a bit. My view is that even if it's not totally accurate, it still captures something important about molecular evolution. For example, removing over 2.4 million bases of intergenic DNA from the mouse genome had no detectable effects. So it may not be pure junk, but it's damn close, even if it's only applicable to species with low effective population sizes where purifying selection is ineffective...

I don't care if some microsatellites have a function, they are still junk.

We had one of Dan Graur's colleagues at our IDEA meeting this week use Junk DNA as evidence for evolution. He's an evolutionary/molecular biologist of sorts, and an associate professor.

Dan Morgan at Panda's Thumb has Richard Sternberg's paper:
Roles of Repetitive DNA Sequences, with contributions from Jonathan Wells, Paul Nelson, Stanley Salthe, and Todd Wood. Seems we creationists managed to wedge some ideas into a peer-reviewed journal. :-)

There is a website www.noncodingdna.com

Salvador
(Hey, nice weblog, seriously. I may be a creationist, but I recognize a bright mind at work here.)

By Salvador T. Cordova (not verified) on 24 Mar 2006

The term "junk DNA" was coined by Susumu Ohno in 1972 in a chapter in an edited book to which very few people have access. He restricted the use of the term to repetitive sequences. Since then, the term has evolved to mean any sequence within the genome that has no function. Incidentally, functional sequences come in three flavors: (1) transcribed and translated, (2) transcribed but not translated, and (3) untranscribed. Everything else is junk. The meaning of "junk" in genomics is very similar to the meaning of "junk" in everyday life: something that has no function and can be discarded without any consequence on function. There are many lines of evidence that show that the vast majority of eukaryotic DNA is junk. Try, for instance, to explain the fact that amoebas have 200 times more DNA than humans. An interesting experiment is reported in http://www.sciencemag.org/cgi/content/full/304/5677/1590b
In this experiment very long "gene deserts" were deleted and nothing happened.

Finally, it should be noted that "junk DNA" is a scientific theory (sensu Popper) because the theory can be refuted by finding a function for a sequence that has been previously assumed to lack function. The theory "there is no junk in the genome" is a religious statement, because it can never be refuted. It can always be said that one has not looked hard enough for a function.

Actually, the existence of junk is a nice side effect of evolution. It shows that evolution produces imperfections. I don't suppose a perfect intelligent designer would have produced junk.

If you do not like the word "junk," maybe you should use "jack" or something. After all, it is common in the United States to use "gosh," "gee," and "shoot" when one is too delicate and too fragile to use real words.

By Dan Graur (not verified) on 24 Mar 2006

Let's call it Pulp DNA!
Even if it would turn out advantageous to have some amount of "cheap filler" DNA in the genome, that DNA could still be called junk.

Creationists often like to bring up "mutational load", but doesn't it apply to junk DNA as well? If all those millions of variable sites would have to be specifically selected for (however mildly), I think we would be in trouble.

Dan Graur may have set the record for the person who garners the most respect to ever comment on my blog. It makes me wonder who else reads this...

Anyway, allow me to say this once more: my problem with the term junk DNA is that it has been hijacked by the popular media to mean any non-coding DNA. It is a loaded word whose definition most people aren't clear on. It's nice to see that the origins of the term are in the scientific literature, but I will still refrain from using it. I will refer to DNA as either translated, transcribed and not translated, or untranscribed. I will refer to the untranscribed DNA as either regulatory, structural, or of undetermined function. I just don't like how the term junk means different things to different people.

Finally, we now have our first creationist sighting at the new evolgen. It's interesting that it's while we're having a discussion about semantics, instead of something of substance. And, yes, I realize that they love to poke their wedge into any conflict in evolutionary biology (no matter how inconsequential).

The one who spoke at our IDEA meeting was Karl Fryxell, who by the way, spoke reverently of Dan.

I think treating knockout experiments as proof that something is "junk" is premature.

Knockout experiments can be done not only on DNA, but on developmental pathways. One can do a knockout experiment on one of the developmental pathways of a nematode vulva and then an alternative developmental pathway kicks in to create the vulva. There are two independently successful redundant developmental pathways in the vulva. By extension, since knocking out one redundant developmental pathway has been shown to have no effect on the development of the organism, that pathway and the associated information stored in the organism for it should, according to knockout logic, be labeled "junk".

I had done scant work on nano-molecular machines, and a hot topic is self-healing, self-assembling, deeply redundant, fault tolerant systems. A key feature is lots of sacrificable, disposable, repetitive, widely distributed information. That seems consistent with the architecture of biological systems: self-healing, self-assembling, deeply redundant, fault tolerant.

Also, the hard drive of a typical computer has numerous copies of the same file. When you save a file, the previous copy is kept and the new version is written elsewhere on the physical disk. The old space on the disk is not reclaimed; the contents of the older version of the file are preserved. Even our best-engineered information systems have large regions of repetitive, unused "junk", as that was the most efficient design.

Engineers even design information processing systems, like a Compact Disc by deliberately designing copy errors into systems so as to optimize space. All that to say that what a biologist may view as junk or poor design, an engineer might view as ingenious and optimal given a particular cybernetic context.

I guess there is something after all to the Salem Hypothesis about engineers being creationists.

regards,
Salvador

By Salvador T. Cordova (not verified) on 25 Mar 2006

Engineers even design information processing systems, like a Compact Disc by deliberately designing copy errors into systems so as to optimize space.

Care to elaborate?

OK, and this image may come from Scientific American, but it looks like it's simply wrong.

http://www.noncodingdna.com/sciam.htm

They say "assumptions can be dangerous", and then they dig out the oldest assumption of all - the "March of progress", framed in amounts of non-coding DNA?

OK, what about the salamander with 20 times more DNA than man? The amoeba Dan Graur mentioned with 200 times more DNA? The pufferfish with much less junk than other vertebrates?

I admit I don't know what percentage of their genome is non-coding, but it still looks like that bar chart got pulled out of someone's ass.

Windy asked, "Care to elaborate?"

That's a good question. In your compact audio CD, the read and write process is permeated by errors. That is, the bits and bytes are encoded onto the CD with a very high error rate (somewhat like a careless typist transcribing data).

The question is, "why don't we engineer more accurate read and write heads?" The answer is that the most efficient storage mechanism in terms of size is to allow write heads to write onto the smallest space possible up to a well defined rate of errors according to Shannon's capacity limit. The errors are then corrected out using Reed-Solomon coding schemes.

That is to say we deliberately say something like, "let's design the head to have X% of errors". We then devise an error correction scheme to filter out the errors. The cybernetic context where the teleological goal is for space conservation results in a particular architecture where errors in read/write are abundant.
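For readers who want to see the general "tolerate errors, then correct them" idea in miniature, here is a rough sketch. It does not implement the Reed-Solomon codes mentioned above; it swaps in the much simpler Hamming(7,4) code, which stores 4 data bits in 7 and can repair any single flipped bit. The function names `hamming74_encode` and `hamming74_decode` are made up for this illustration.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Repair at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # Read as a binary number, the syndrome is the 1-based position of the
    # flipped bit; zero means no single-bit error was detected.
    position = s1 + 2 * s2 + 4 * s3
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    stored = hamming74_encode(word)
    stored[4] ^= 1  # simulate one write error
    print(hamming74_decode(stored) == word)
```

The trade-off is the one described in the comment: three of every seven stored bits are overhead, and in exchange the writer is allowed to be sloppy up to a known limit (here, one bad bit per codeword).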

A complaint against ID has been: if biology were intelligently designed, why is there so much error correction going on? Why not get it right in the first place? That's an understandable complaint, but information scientists will still quickly recognize in biology certain architectures familiar from modern information systems.

The impression of design is hard to dismiss, especially where these architectures have subtle features that most people would see as bad design, but engineers would recognize as brilliant design.

By Salvador T. Cordova (not verified) on 25 Mar 2006

Let me add that MP3 and JPEG are lossy compression formats in which 90% of the original content is stripped out of the original data representation.

Thus when we see clipped pieces of DNA, that look like an imperfect copy, I would not be too quick to say it's an error. It may well be a form of lossy compression of information that allows for reconstruction at a later time.

I would not be surprised if we discover decompression algorithms in the processing of DNA which can take a snip of incomplete DNA and essentially reconstruct a suitable approximation of a functional protein or something else from that snip. Barry Hall's experiments on adaptive evolution suggested it to me. I think there is a lot we don't know, but the point is that calling something an error or a defective copy might not be the most effective way to describe what's going on.

There are clearly things that go wrong in an organism, but I would encourage a second look at things we were so sure were mistakes or "junk".

By Salvador T. Cordova (not verified) on 25 Mar 2006

The best explanation I've heard for the amount of non-coding DNA has to do with effective population size. Purifying selection should remove DNA that is not needed (to save energy on replication), but this type of selection is not very strong. Such weak selection works best in larger populations (see Ohta's nearly neutral theory). It has even been argued that reduced population size may be responsible for the unique gene structure (exons, introns, etc.) in eukaryotes.
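To put rough numbers on that argument, here is a small sketch using Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population. The assumptions are mine, not from the discussion above: census size equals effective size, the mutation starts as a single copy, and the selection coefficient `s = -1e-5` is an arbitrary stand-in for the tiny cost of carrying a bit of extra DNA. The function names are hypothetical.

```python
import math


def fixation_probability(s, n_e):
    """Kimura's approximation for one new copy among 2 * n_e gene copies.

    Assumes census size equals effective size n_e. Very large values of
    |4 * n_e * s| (several hundred or more) can overflow; this sketch does
    not guard against that.
    """
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral case: chance equals starting frequency
    return math.expm1(-2 * s) / math.expm1(-4 * n_e * s)


def relative_to_neutral(s, n_e):
    """Fixation probability as a fraction of the neutral expectation."""
    return fixation_probability(s, n_e) / fixation_probability(0, n_e)


if __name__ == "__main__":
    s = -1e-5  # assumed, illustrative cost of a little extra DNA
    for n_e in (10_000, 1_000_000):
        print(n_e, relative_to_neutral(s, n_e))
```

With these assumed values, the slightly costly insertion fixes nearly as often as a neutral one in the smaller population and essentially never in the larger one, which is the sense in which purifying selection "works best" in large populations.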

I'm pleased to see that Salvador Cordova is interested in the robustness of biological systems. But, contrary to what he suggests, biologists have also been interested in it for a long time, under a variety of names like canalization, homeostasis, stability, etc. Indeed, the study of the evolution of biological robustness is currently undergoing something of a revival (to which I have contributed in a modest way). It may not surprise you that what we have found does not provide any comfort for intelligent design creationists. For one of my favorite pieces of recent work on robustness check out Andreas Wagner's paper on the evolution of coupled circadian oscillators (reviewed here and here). Anyone interested in how one might go about testing the hypothesis of irreducible complexity of a system should read that paper.

Richard,

Congratulations on having your paper published in Nature.

Regarding the issue of redundant functionality, it does pose a problem for natural selection as a mechanism for its evolution, since there are frequently circumstances under which the functions may not be visible to selection over several generations. I would not say the problem is insurmountable, yet, but it is substantial.

In the case of the nematode vulva, without some co-option, the independent pathway cannot be selectively advantaged unless the other developmental pathway is knocked out. The independent pathway would have to be pretty much functional when it appears, as it is critical to perpetuation.

Further, if mutations inside a redundant system are not being periodically selected against, the redundant system will eventually disappear, as the mutations might so scramble it that it's gone by the time a selective context appears where the redundancy would increase fitness.

Perhaps a worthwhile experimental exploration would be to knock out large amounts of redundant function in a creature, so much so that metabolic advantage is conferred. It would be interesting to see if the creature population lacking redundancy overtakes the more robustly architected creatures. That would be supportive of the difficulty in evolving redundancy.

Regarding the paper pertaining to circadian oscillators,
"Can these results be generalized to other systems? It's impossible to tell at this stage." The answer is likely no.

We know from mathematics that evolutionary algorithms can only solve a small fraction of design architectures. That is a given fact in engineering. Some of those architectures which evolutionary algorithms cannot solve are already in evidence in biology, such as the Turing machine or anything dealing with large scale software such as seen in the cell.

I think the search for mindless evolutionary processes is like trying to create perpetual motion machines. The research is still valuable, but I don't think descriptions of the evolution in terms of blind watchmaker solutions will be forthcoming if indeed evolution never happened.

Further research is always in order, but from an engineering and mathematical perspective, I think finding naturalistic pathways is impossible, and that was by design.

By Salvador T. Cordova (not verified) on 27 Mar 2006

Your post raises too many points to address here -- I might have to move the discussion over to my blog... But meanwhile, I'll pick up on a point you've raised a couple of times during this discussion:

One can do a knockout experiment on one of the developmental pathways of a nematode vulva and then an alternative developmental pathway kicks in to create the vulva. There are two independently successful redundant developmental pathways in the vulva. [...] In the case of the nematode vulva, without some co-option, the independent pathway cannot be selectively advantaged unless the other developmental pathway is knocked out. The independent pathway would have to be pretty much functional when it appears, as it is critical to perpetuation.

I'm actually a C. elegans biologist, and so I know a bit about vulval development, although I've never worked on it myself. The problem is that I have no idea of what you're talking about here. The fates of the six vulval precursor cells are determined by the action of two signaling pathways: the EGF receptor/RAS/RAF/MAPK inductive signaling pathway specifies the primary fate, and the LIN-12/Notch lateral signaling pathway specifies the secondary fate. However, these pathways are not redundant.

The redundancy you may have heard of comes in the synMuv (synthetic multivulva) genes. These form two functionally redundant classes, A and B (to which a third, C, has recently been added), of regulators of RAS signalling. (The name comes from the observation that mutants from within a single class show normal vulval development, but double mutants affecting loci in different classes show several ectopic vulvae, a phenotype known as "multivulva".) However, the synMuv genes do not constitute "independently successful redundant developmental pathways". In other words, the speculations you tagged on to this example don't make any sense.

I am always a bit befuddled to see the genome implicitly or explicitly analogized to some human contrivance as if the analogy indicates 'design' in the genome...
I call it the argument via analogy. Creationist types use it extensively.