Moran, Gregory, Give me a Break!

By gregladen on January 16, 2008.

Well, it is a good thing that I have a thick skin and a good sense of humor, or I would be very put off by Larry Moran and probably T. Ryan Gregory as well.

Apparently, I stepped into an ongoing partially ad hominem debate over "Junk DNA" centering on the work of John Mattick and his research group. In this post, I'd like to provide a clarification of my "position" on Junk DNA, and I'll spend a moment admonishing my colleagues for being dorks.

My offending post is here. This is a report on a recent paper by Mattick and others in which they provide evidence that non-coding RNA does something ... that it is not merely the stuff left over when "real genes" are getting transcribed and translated.

This is T. Ryan Gregory's comment, and this is Larry Moran's comment.

In my view, that paper is interesting and the evidence for ncRNA having some function is reasonable. It is clearly stated in the paper (but not emphasized in my post on the paper) that the magnitude of this possible function is very small ... but it might be there.

There is a conception that I believe is generally held by the public, science teachers, interested parties, etc. that the genome can be classed into two categories: DNA sequences that ultimately code for proteins and DNA sequences that are "junk" ... have no function whatsoever.

This is not true to a small extent and at many levels. By "a small extent" I mean that with one exception I'll mention below, most of the DNA base pairs that do not directly relate to coding proteins have not been associated with any function. They may be parts of introns snipped out during coding, or they may be "pseudogenes" ... sequences that may have been genes in the past but no longer function, or they may be other stuff that does not seem to do anything. Most of the non-protein-coding DNA is not known to have a function (again, see exception below).

Some of this DNA has a critically important function. It codes for molecules that are not proteins. One could argue that this does not count as Junk DNA. If one's popular conception of Junk DNA is the DNA that does not code for proteins, then this counts. If not, it does not. But since we are speaking of popular conceptions, which are hard to define and rather slippery, we can certainly consider this type of DNA.

Then there are control regions. If you go back in time not too far, there were parts of the DNA that were known to not be "coding" because they were not part of the "code" for the molecules DNA specifies. Later in time, regions in this land of what would have to be counted as "Junk" were found to be critically important ... they facilitate the coding process and the genetic system cannot do without them. It is possible that one could say that a formal definition of "Junk" would not include anything that is used in this manner, but that is backpedaling, and ignores the problem of popular conceptions being wrong.

A famous geneticist once said:

Some non-coding DNA is proving to be functional, to be sure. Gene regulation, structural maintenance of chromosomes, alternative splicing, etc., all involve sequences other than protein-coding exons. *

Genome size is a function of the total number of genes and the total amount of DNA that is not in genes (Junk, whatever). However, since the typical genome has only a small percentage of its total sequence in actual genes, that part of the equation cannot be very important, and it has been shown that the number of actual genes in a genome is not correlated to genome size.

However, it is also probably true that genome size matters. The metabolic rate of a cell, which is a very important variable, may correlate to genome size, for instance. Since genome size is primarily determined by the amount of "Junk" DNA, and genome size is important, I find it difficult to get out the words "So called Junk DNA is functionless."

This does not seem to be a problem for people who work all the time at the genetic level, like T. Ryan Gregory. T (or shall I call him Ryan) studies genome size, but appears inclined to balk at any suggestion of function in Junk DNA, such as this. I am more of a whole organism guy, so to me, size matters. Body size variation across taxa is patterned in relation to a number of evolutionary and ecological factors, but size is acquired in a number of different ways. The way you get to be small or large is functional, not irrelevant. Size-factors are not "junk."

It is also worth noting that "junk DNA" is linked to "genetic disease." That might not be a "nice" function, but it matters. And it is interesting. To say that this is not a "function" is to have an inexcusably teleological view of "function."

Here is how I would like to rephrase what I said in my original post:

"So called Junk DNA is many things to many people. The role of DNA that does not code for the usual proteins or other important molecules is interesting."

What is also interesting is T Ryan Gregory's and Larry Moran's somewhat vitriolic or at least reactionary reaction to my post. Gregory reacted first, and I posted a comment on his site, and I will follow that up here with what I think is an important statement about the psychology of this issue. Since the discussion was getting interesting, I sent a link to my post and Gregory's post to Moran saying I'd like his take on all of this. His reaction was to post "Greg Laden Gets Suckered by John Mattick," which I will assume was meant in a tongue-in-cheek way, but I'm not sure everyone will see it that way. I'll have to go over to his blog and kick his ass.

Here is what is behind all of this. Intelligent Design Proponents, Creation Scientists, Creationists, etc. have a hard time with Junk DNA. Indeed, truly junk DNA makes no sense in the context of an intelligent designer, and is in fact possibly evidence of a rather dumb designer. Natural selection is the dumb designer. For this reason, there is a link between seeking function in the junk and creationism.

You may have noticed that I am not a creationist.

Another (overlapping) issue is the teleological argument that has been made, that somehow, more DNA is better, and there is some kind of correlation between more, better, and "higher" organism ... even to the extent that, say, dogs and humans are ranked on such a scale even though both are mammals (the human is generally ranked higher). Both Moran and Gregory have written quite a bit about these issues, and I strongly recommend Gregory's post on the Onion Test.

Apparently, Mattick has been linked by Moran and Gregory to this kind of sloppy thinking. I did not know that. I have read Gregory's critiques in the past, but I simply did not connect this research team, or this paper, to those critiques. I was reading the paper as an isolated piece of research.

There are good things and bad things about my having done that. The bad thing is that one should be more familiar with the literature and thus able to look for aspects of the research that are not explicit. I can do that in numerous subfields of science, but not necessarily in this area of genetics. On the other hand, is it not the case that specific pieces of research should be taken at some point on their own, unless you know that a particular research team is likely to falsify data? Are Moran and Gregory acting inappropriately, derailing us all in the pursuit of knowledge because they overreact to this guy Mattick for, essentially, personal reasons, and because they see a creationist behind every implication of function among the junk? Or was I acting inappropriately because I deigned to review an article without actually being, or being a disciple of, Moran and Gregory?

Both. Neither. Scholarship is complex.


Greg- you're not a geneticist. Your blogging critics, including also RPM of Evolgen, are correct. Most of the junk in the genome really is exactly that- functionless, self-perpetuating crap. And nobody thought even long before Mattick's paper that only protein-coding genes are functional. (Ever heard of ribosomal RNA? tRNA?) The non-protein-coding functional stuff is, however, a really tiny percentage of the genome. (Which obviously does not mean that discovering new bits of it isn't interesting.)

You do remember the first rule of holes, don't you?

There are people interested in understanding the evolutionary forces governing genome size. Yes, metabolic rates and other related factors have been connected to genome size -- recently and quite famously in the dinogenomics paper. However, the most plausible explanation, in my opinion, is one offered by Mike Lynch. Lynch argues that mutations that increase genome size are weakly deleterious. Because selection is most efficient in large populations, weakly deleterious mutations will accumulate in smaller populations. Therefore, taxa with large, complex genomes are the result of selection not removing the mildly deleterious mutations from those genomes because population sizes are too small.

Steve: And in what way is what I said different from what you are saying?

I am a trained biological anthropologist. Genetics is not my focus, and I am not a specialist in that area. But I certainly do not need to be told what I don't know by those who don't read very carefully!

RPM: Cool. But it is still turtles (adaptation) all the way down, as you point out in your critique of Moran's comment on this idea.

Is it the case or is it not the case that genome size is uncorrelated to cell size and thus to metabolic factors? Birds evolved flight, other taxa with large genomes and large cells did not. How is that not important?

I find it ironic and mildly disturbing, but not surprising, that there is almost a cultish pattern in how evidence is interpreted in this area.

You're still not getting it. Here are some of the problems. 1) It's that it's not "some" of the noncoding stuff not having a function, it's the vast majority of it. 2) It's that the true junk is largely made up of things that are OBVIOUSLY functionless- pseudogenes, selfish transposons of various classes, and the like. 3) It's that this is about getting the science correct, not about some kind of ideological debate as you bizarrely suppose.

Oh, and I think you're completely misconstruing what RPM just said, as well. But I'll let him deal with that.

Douglas Theobald points out:
http://www.talkorigins.org/faqs/comdesc
"Two ciliates, Paramecium aurelia and Paramecium caudatum, are virtually indistinguishable from morphological and phenotypic analysis...However, the first has less than 200,000 kb of DNA in its genome, whereas the genome of the second has nearly 9,000,000 kb of DNA, which is evidently at least 45 times the amount it actually needs...Note also that Paramecium caudatum, a single-celled organism, has about three times the DNA as a human."

It's hard to imagine what necessary regulatory or physiological purpose that extra DNA serves in P. caudatum when the two species are pretty much identical otherwise.
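The ratios in the quoted passage can be checked with a few lines of arithmetic. This is only a sketch: the two Paramecium figures are the rounded bounds from the quote, and the human figure of 3,000,000 kb is an assumption taken from the 3000 Mb haploid genome size used elsewhere in this thread. The function name `fold_difference` is made up for illustration.

```python
# Genome sizes in kilobases. The Paramecium values are the rounded bounds
# from the quoted passage ("less than" and "nearly"), not measurements.
P_AURELIA_KB = 200_000
P_CAUDATUM_KB = 9_000_000
# Assumption: 3000 Mb haploid human genome, the figure used in this thread.
HUMAN_KB = 3_000_000


def fold_difference(larger_kb, smaller_kb):
    """Return how many times larger one genome is than another."""
    return larger_kb / smaller_kb


# P. caudatum relative to its near-identical relative P. aurelia
print(fold_difference(P_CAUDATUM_KB, P_AURELIA_KB))
# P. caudatum relative to the human genome
print(fold_difference(P_CAUDATUM_KB, HUMAN_KB))
```

With these inputs the two ratios come out at 45 and 3, which matches the "at least 45 times" and "about three times" in the quote.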

Genomicron's "Onion Test" is right on the mark when it comes to discussing what is functional versus what is non-functional DNA:
http://genomicron.blogspot.com/2007/04/onion-test.html

As Douglas Theobald mentions:
1% of our DNA is endogenous retroviruses
20% of our DNA is pseudogenes
45% of our DNA is transposons

It really defies logic to try to argue that this material - 66% of the genome - serves some critical purpose and is "therefore not junk".
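The 66% figure is simply the sum of the three percentages listed above. A minimal sketch of that sum follows; it assumes the three categories do not overlap, which the comment does not state, and the variable names are made up for illustration.

```python
# Percentages of the human genome as listed in the comment above.
shares_percent = {
    "endogenous retroviruses": 1,
    "pseudogenes": 20,
    "transposons": 45,
}

# Assumption: the categories are non-overlapping, so they can simply be added.
listed_total = sum(shares_percent.values())
unlisted_remainder = 100 - listed_total

print(f"listed categories: {listed_total}% of the genome")
print(f"everything else:   {unlisted_remainder}%")
```

If the categories do overlap, the true combined share would be smaller than this simple sum.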

Perhaps it can be said to have a "function" as raw material upon which mutation and evolution can play, but neither that nor the potential fact that some organisms may rely on the structural value of their genome merits laying out vague claims to a biochemical value.

Mike Dunford (www.scienceblogs.com/authority) also addresses this issue informatively:
http://tinyurl.com/2ont39

The bottom line is that whilst terming these regions "junk DNA" doesn't mean it's categorically non-functional DNA throughout, neither does not knowing what function it may serve automatically imbue it with value. That way lies creationism!

It's rather surreal reading the strawmen ad hominem arguments made against Mattick's work if you are familiar with his publications (and what a publication list!). For a start he does not equate genome size to complexity. He equates functional RNA levels to organism complexity - an entirely separate matter. He has been involved with several of the best functional analysis papers of the past few years, for instance the ENCODE consortium result from last year, so I guess there might be a slight chance that he might know what he is talking about on this matter.
Reading his thoughts on the subject I get a distinct image in my head of Larry, sitting on his throne like King Canute, and raging against the incoming tide of functional RNA.

By all means, I encourage people to actually read and think about what I have written on this subject. I appreciate Greg mentioning genome size, a topic about which, as you may know, I have some interest and experience. And never mind the content of many other posts and papers I have written, the offending post itself clearly says that I have no issue with function for much non-coding DNA. My issue with Mattick does not relate to creationism, though I have come to realize that irresponsible extrapolation will be used by anti-evolution authors. If I have offended Greg (though again, please see what I wrote), then I apologize. But it seems to me that this reaction does not reflect anything I have actually stated.

"Reading his thoughts on the subject I get a distinct image in my head of Larry, sitting on his throne like King Canute, and raging against the incoming tide of functional RNA."

Then I suggest you read him more carefully. I also suggest that you read Ryan Gregory's post (http://tinyurl.com/2vqudd) and then explain how 0.7% of the genome constitutes an "incoming tide".

Laden has it exactly correct. The debate arises through equivocation of the term "functional". If "functional" means "confers a positively selectable fitness benefit", then most of the genome is either non-functional or minimally functional. That does not mean, however, that this DNA is inconsequential. For example, 50% of humans with the most severe form of hemophilia have a complete inactivation of the Factor VIII gene resulting from a chromosomal inversion mediated by non-allelic meiotic recombination between a low-copy repeat within the gene and one of two other such low-copy repeats 250ish kilobases telomeric to the Factor VIII gene. This low-copy repeat doesn't appear to have any "function", but simply by its existence it does have consequences. It is an integral part of the description of human genomic structure. So it goes for all the DNA in the genome. Do the proponents of "junk DNA" really believe that if all this DNA was removed, shrinking the human haploid genome down from 3000 Mb to 12 Mb or so, that this would not change the fundamental biology of the human organism? The proponents of non-functional DNA as "junk" are so strident as to strain credulity.
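To put the numbers in this exchange on one scale, the sketch below converts between megabases and percent of the genome. It uses only figures already quoted in the thread (a 3000 Mb haploid genome, the hypothetical 12 Mb reduction, and the 0.7% share cited earlier); the function names are made up for illustration.

```python
HUMAN_HAPLOID_MB = 3000   # haploid genome size used in the comment above
REDUCED_GENOME_MB = 12    # hypothetical stripped-down genome from the comment above
FUNCTIONAL_PERCENT = 0.7  # share cited earlier in the thread


def percent_of_genome(size_mb, genome_mb=HUMAN_HAPLOID_MB):
    """Express a size in megabases as a percentage of the genome."""
    return 100 * size_mb / genome_mb


def megabases(percent, genome_mb=HUMAN_HAPLOID_MB):
    """Express a percentage of the genome as a size in megabases."""
    return percent * genome_mb / 100


print(round(percent_of_genome(REDUCED_GENOME_MB), 2), "percent of the genome")
print(round(megabases(FUNCTIONAL_PERCENT), 1), "Mb")
```

On these figures, a 12 Mb genome would be 0.4% of the current size, while 0.7% of the genome corresponds to about 21 Mb.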

"Do the proponents of "junk DNA" really believe that if all this DNA was removed, shrinking the human haploid genome down from 3000 Mb to 12 Mb or so, that this would not change the fundamental biology of the human organism?"

If we're talking about the vast majority of this stuff that is rubbish on its face, like pseudogenes and Alu sequences, then some cross-species genome comparisons would suggest that the answer is "yes". (Mammalian genomes, for example, are very similar in their complement of genes, yet vary in size over a range of about 5-fold- this figure is from Professor Gregory's database, by the way. The mammals with the smallest genomes- many bats, for example- seem to get along just fine, thank you.) As you would expect- taking out the trash doesn't negatively impact the environment in your house, does it?

And I don't think the fact that transposons and other repeated sequences can via recombination occasionally cause bad things to happen fits any non-Pickwickian definition of "biological function".

There is a difference between nonfunctional and inconsequential, and I have tried to explain this many times. Relationships with cell and organism features are important, but they are not necessarily functional; they could relate primarily to constraints. The issue I discussed in the offending post was extrapolating from small amounts being functional to it all being functional, and I find it surprising that people are offended when I say "neat study, but don't get carried away because it's just a small percentage".

I've noticed that ad hominem is being thrown around quite a bit these days, in this post and in others on the blog-o-sphere.

For example, "It's rather surreal reading the strawmen ad hominem arguments made against Mattick's work if you are familiar with his publications", by Sigmund (see above).

As we know, an ad hominem is an attempt to dismiss a particular position or argument by personally attacking the person making the argument. The content of the argument is ignored in favour of insulting or discrediting the person who said it.

So my question is this: are Drs. Moran and Gregory personally insulting those who make arguments about "functional junk-DNA" while ignoring the reason behind the arguments, or are they attempting to demonstrate that the arguments are faulty, and perhaps in the course of this demonstration, inadvertently (or otherwise) implying something about the character or credibility of those making the argument? If the former is true, it qualifies as an ad hominem. If it's the latter, IMO it's a legitimate debate that perhaps is being taken too personally by one or more sides.

Steve LaBonne: (Mammalian genomes, for example, are very similar in their complement of genes, yet vary in size over a range of about 5-fold- this figure is from Professor Gregory's database, by the way. The mammals with the smallest genomes- many bats, for example- seem to get along just fine, thank you.)

The mammal with the smallest genome in the database (since you bring it up) has a genome 50% the size of the human genome, or haploid content of 1500 Mb. Why don't you try answering the question posed:

Do the proponents of "junk DNA" really believe that if ALL this DNA was removed, shrinking the human haploid genome down from 3000 Mb to 12 Mb or so, that this would not change the fundamental biology of the human organism?

I don't respond to straw men- that's an extreme enough reduction that it might indeed take some genuinely functional stuff with it. What I actually think is right there above to be read. If you actually disagree with any of it, say so. Also please review Prof. Gregory's important distinction between "non-functional" and "inconsequential".

"So my question is this: are Drs. Moran and Gregory personally insulting those who make arguments about "functional junk-DNA" while ignoring the reason behind the arguments, or are they attempting to demonstrate that the arguments are faulty, and perhaps in the course of this demonstration, inadvertently (or otherwise) implying something about the character or credibility of those making the argument?"

To my knowledge, I have never personally insulted anyone. I have commented on the claims they have made, which in some cases I have found vastly overstated. I have nothing personal against Mattick, and I find a lot of his work interesting -- as I said clearly in the post to which Greg is referring. My objection is to his use of selective data from my area of study (genome size evolution) and his claim that the majority of the genome may be functional given that we have no data to this effect and there is empirical information that suggests otherwise.

I think that we are losing sight of the problem. The whole "debate" about "junk DNA" is bollocks - it's a semantic argument that's a big topic on blogs but bothers absolutely no one that I know of. I guess people who don't know any better over-interpret that phrase.

Getting back to functionality, there is a huge amount of DNA that regulates gene function (i.e. promoters and enhancers etc.) - and contrary to what has been written we've known about this for over 30 years. According to data (including the ENCODE paper) these regulatory regions probably account for the largest fraction of functional sequence. But these regions are hard to define and so we're not sure what exact sequences fall in this category. The rest of the genome is probably useless in the sense that it could be replaced with a random sequence and the organism wouldn't give a hoot. And yes, buried deep in there are some ncRNAs, but almost everyone in the RNA field (except for a few guys) thinks that this fraction is small.

Incidentally, I find it immensely surprising to read this:

This is almost completely perpendicular to what I think, so either Greg has not read my work or I am not communicating clearly.

I'm a (non-molecular) biological anthropologist myself, so I just rely on what molecular geneticists tell me about this. But even they haven't reached a consensus on this, have they? Well, carry on, then.

"The rest of the genome is probably useless in the sense that it could be replaced with a random sequence and the organism wouldn't give a hoot. And yes, buried deep in there are some ncRNAs, but almost everyone in the RNA field (except for a few guys) thinks that this fraction is small."

In case I haven't been clear, this neatly summarizes what I've been trying to say as well.

"But even they haven't reached a consensus on this, have they?"

As an outsider with enough background to understand what's going on (former fly geneticist, now forensic DNA guy who tries to keep up with advances in human genetics) my sense is that they really have, and that the few guys putting ambitious sales talk in the introductions to their papers (see apalazzo's comment) are outliers.

"his claim that the majority of the genome may be functional"
Where exactly has he said this? I know he has said that the majority of the genome may be transcribed - based on microarray results but I haven't seen him claim that all or most of this sequence is carrying out gene or sequence specific functions.
As for Steve's question about 0.7% being a tide? I hope he was joking. That is a huge amount of functionality. As an example the total number of microRNAs in the human genome is something like 1000, which works out as something like 0.0001% of the genome, yet it would be ridiculous to dismiss the importance of these functional RNAs in terms of regulatory potential. Likewise for many other sequence elements: what is the percentage of the genome that functions as gene promoters? Mattick simply makes the point that there are many more functional elements present within the non-protein-coding part of the genome that remain to be elucidated. I suspect the final total of functional elements will probably end up as being less than 10% of the genome. I don't know anyone apart from IDiots that assume that all or most of the genome has a specific function.

Alright, I haven't spent much time looking at the primary papers on this issue and reading all the posts/comments on TR Gregory's or Larry Moran's or Laden's blog, but that doesn't mean I can't stir the pot.

So, it seems that the "junk" proponents are making a positive statement, along the lines of "this DNA over here has no function, and is therefore junk." Isn't the onus on those saying junk DNA is functionless to actually show this is true? I do not believe those who disagree are necessarily proposing a function for junk DNA, but rather that you can't rule it out. This follows the "absence of evidence is not evidence of absence" philosophy. Also, the recent work, although it admittedly covers only a small %, shows more "functionality" than was known before. Another thought: in several eukaryotic microbes pseudogenes serve as sites for recombination to generate allelic diversity in actual genes. Similar events could conceptually occur in mammalian cells as well.

Anyway, I'll end this and let the hating continue.

apalazzo: "The rest of the genome is probably useless in the sense that it could be replaced with a random sequence and the organism wouldn't give a hoot."

That's not pure junk then, is it? True junk could be removed altogether with no consequences. Even noncoding DNA that could be replaced by different sequences of noncoding DNA may have structural properties. And those ncRNAs are embedded in noncoding DNA, the spatial arrangement of which may not be arbitrary. So they might be junk in terms of the "typical" function of DNA - that is, coding for RNA - but may not be as structure. I understand that a lot of this is parasitic or detritus, but even those can be appropriated for new functions, even very crude and sloppy ones.

Just playing devil's advocate.

"Consequently, it is possible that much if not most of the human genome may be functional."

Pheasant, M. and J.S. Mattick (2007). Raising the estimate of functional human sequences. Genome Research 17: 1245-1253.

Well, since I have said repeatedly that I don't like the term "junk DNA", that I have no problem with a substantial fraction of the genome having function, that even if most noncoding DNA is functional it still is potentially biologically significant, and that my only objection has to do with overstating the case for function without any data, I think this discussion is mostly about what people other than me have said.

edit: even if most noncoding DNA is nonfunctional it still is potentially biologically significant

TR Gregory: even if most noncoding DNA is nonfunctional it still is potentially biologically significant
I agree with this interpretation. Use of the term "junk" to describe this DNA is not helpful.

"his claim that the majority of the genome may be functional"

"Where exactly has he said this? I know he has said that the majority of the genome may be transcribed - based on microarray results but I haven't seen him claim that all or most of this sequence is carrying out gene or sequence specific functions."

------------------------------------------------

"I suggest that we have fundamentally misunderstood the nature of genetic programming of complex organisms for the past 50 years, because of the presumption--largely true in the prokaryotes but not in the complex eukaryotes--that most genetic information is transacted by proteins. This view was derived from studying simple organisms in an analogue age before the power and use of digital information systems were appreciated. However, it now seems increasingly likely that most of the human genome, and those of other complex organisms, encodes a vast and hitherto hidden layer of regulatory RNAs (Mattick and Makunin, 2005; Mattick and Makunin, 2006). This evolved to breach the operational limits imposed by solely protein-based regulatory systems, in the face of the nonlinear scaling of regulatory requirements as living organisms explored higher organizational and macro-functional complexity (Mattick, 2004). Indeed, it may well be that most of the human genome is functional (M. Pheasant and J.S.M., manuscript submitted for publication), including many sequences such as introns and other mobile element-derived sequences that have been long considered as parasitic evolutionary debris rather than the historic raw material for genetic innovation and the current embodiment of higher levels of regulatory sophistication. Thus it appears that the genome is largely composed of sequences encoding components of RNA regulatory networks that co-evolved with a sophisticated protein infrastructure to interact with RNAs and act on their instructions."

Mattick, J.S. (2007) A new paradigm for developmental biology. J. Experimental Biology 210:1526-1547.
doi: 10.1242/jeb.005017

This argument doesn't even touch on the portion of the genome that is coding yet deleterious. That is a different kind of parasite than noncoding junk. Noncoding/nonfunctional/junk DNA is just part of a larger picture of intra-genomic conflict. More Satan's attorney: The organism is a kind of a haphazard yet self-constrained epiphenomenal growth (like a plant gall) that arises from the conflict and cooperation of functional DNA (deleterious and not) and "junk," some of both of which are descended from ancient viral infections.

Lorax asks,

Yes, and they've done it several different ways. There are genetic load arguments that limit the amount of truly functional DNA to a small percentage of the genome. There are studies of mutation rates showing that there are no impediments to fixation of mutations in junk DNA. There are direct experiments where junk DNA has been eliminated from the genome. There are close examinations of sequence regions where we recognize the presence of pseudogenes. There are interspecies comparisons where very closely related species have vastly different genome sizes.

It's a bad idea to believe the critics of junk DNA when they claim that the word "junk" is just a synonym for "ignorance." Such statements really do reveal ignorance but not the kind that they think.

Larry Moran: There are direct experiments where junk DNA has been eliminated from the genome.
These experiments in particular are weak, since they don't delete a very large proportion of the "junk", nor do they follow the results of the deletions made over evolutionarily relevant timescales.

LM: There are interspecies comparisons where very closely related species have vastly different genome sizes.
These observations are also weak, since the species typically being compared both still contain an enormous amount of "junk".

I would put Mattick's argument (Mattick, J.S. (2007) A new paradigm for developmental biology. J. Experimental Biology 210:1526-1547. doi: 10.1242/jeb.005017) in the category of "interesting if true". It's fun to think about. I certainly wouldn't dismiss it out of hand.

So in other words, anonymous, the only evidence that the "junk" is non-functional that you would accept would be to remove all or most of it in a large population and then compare it to a control population over thousands of years? Or am I misunderstanding your criticisms somehow?

Thanks for the cogent reply Larry and I am already in the "not doing a whole hell of a lot" camp for this DNA. However, it also seems to me that a few people have posted extensively on this issue in the past and are thus annoyed if they have to rehash old information. (This is how I feel when I see yet another creationist insisting on the whole story all over again.) However, the confusion over junk DNA/not-junk DNA in the general public is real. This is due almost certainly to the fact that the terminology is vague and subject to different interpretations by qualified scientists and of course everyone considers their viewpoint the correct one and thus fights ensue. Let's say the extensive junk DNA protects the genome from disruption by viral integration. The more junk around the less chance a virus integrates into an important region. I fully accept that this may not be the case, it's just for conceptual purposes. Now is that DNA junk or not? If we did TheBlackCat's rather sarcastic approach and deleted most of the junk DNA, no overt phenotypes would be observed in the laboratory setting UNLESS we added an integrating virus. So by many criteria, the DNA is junk. However, if the right assay is done, we would find a reason why extensive junk DNA may yield a selective advantage. However, even if junk DNA served a purpose like protection from viruses, we would still have this argument because different people would have a different idea of what "function" meant.

I'm not a geneticist of any description...

My understanding is that for most of the genome, the sequence of bases is irrelevant. Even so, the DNA itself may still be significant in various ways. It may be that the sheer bulk of DNA is important in some cases, or that it is important to have spacers between certain parts of the genome, or that having a bit of extra length somewhere is useful for timing (to slow down some transcriptions, or whatever). It may even be that it makes some difference roughly what proportions of the various bases there are along some stretch, without much sensitivity to the actual sequence.

One of the things about evolution is that it tends to make "use" of stuff that is lying around. I can well imagine that some subtle kinds of helpful function are obtained by stuffing around with the raw size of genomes or the spacing and arrangements of things. I'm interested to know what proportion of the genome has some significance to the particular sequence of bases. It may not all be for coding proteins: there are other forms of regulatory sequence where the sequence is important.

When I hear geneticists speak of "junk" DNA, I take this as a kind of code phrase suggesting DNA which has no real significance for the code; so that the bases in that stretch could be shuffled around without making any detectable difference.

My understanding is that most of the genome is "junk" in this sense. Precisely how much is very hard to tell; but there are good reasons for confidence that most of it only makes any difference to the organism by reason of simple bulk or spacing out of those parts that do have some sequence significance. I don't think there is any major surprise in discovering new sequence significance in this or that part of the genome, or in finding sequence significance for matters other than coding proteins.

I think we are talking at cross purposes here. Reading Mattick's papers, it is not clear that he means that the majority of the sequence of the human genome is functional in the way we would describe a promoter or coding sequence as being functional. His hypothesis seems to be that the majority of the genome is transcribed and that these long transcripts have some effect or function - structural etc. He certainly seems to think that a much higher amount of sequence than has been described to date is directly functional - over 10% and maybe as high as 20% in one paper - but that is a separate issue from the point about whether the large amounts of uncharacterized transcripts have some biological effect.

Look, I work in genomics, and there just isn't an answer yet on what's "junk" and what isn't, even for an invariant definition of "junk".

Some non-coding DNA regions are promoters or enhancers; some are transcribed but spliced out, and some of that, plus other transcription never connected to protein expression, results in non-coding RNAs, some of them very small.

Other regions are never involved in transcription at all, but may relate to histone spacing or helping maintain an appropriate GC balance.

The fact is that "junk DNA" is no more useful a term than "design". What's useful is to conjecture on why some genomes are as large as they are, then test the conjectures! Mammalian genome sizes are fairly consistently near three billion bases -- you'd expect more drift, if there were no evolutionary pressures.

Moran's and Gregory's complaints about Mattick seem excessive to me -- because he may be wrong, but his ideas suggest new experiments, and that's good science.

Premature orthodoxy isn't.

Well, let's see now. Let's take just one category of stuff that old-fashioned geneticists and biochemists, unlike you slick modern genomicists, would be pretty confident is genuine junk: L1 LINEs and defective elements derived therefrom. That's already over 15% of the human genome right there, no? How open-minded are we really supposed to be about those sequences having a function in the host's biology (not consequences, function)?

Paco says,

Look, I work in genomics, and there just isn't an answer yet on what's "junk" and what isn't, even for an invariant definition of "junk".

Maybe you should find a new line of work if that's what you believe. I can tell you lots of things that aren't junk.

I can also tell you lots of things that really are junk by any reasonable definition. Have you ever heard of pseudogenes? What about all those defective Alu sequences in our genome? Do you really not understand that this is junk DNA?

For the purposes of this discussion, has anyone proposed a definition for "junk" DNA upon which to base the substantial (non-syntax-based) points of contention? For instance: "junk DNA is DNA that can be replaced with a random sequence of equal length without affecting the organism." How much of the genome would be classified as "junk" using this definition?

Most here are more qualified than me to be making definitions, but I think even an arbitrary one would help better define the debate.
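
The proposed definition suggests a concrete (if idealized) test: swap a stretch for something of equal length and ask whether anything changes. A minimal sketch of two possible replacement rules follows, assuming a plain string of A/C/G/T. The function names and the example stretch are hypothetical, and choosing between the two rules is itself part of what a definition would have to settle.

```python
import random

BASES = "ACGT"


def random_replacement(seq: str, rng: random.Random) -> str:
    """Random sequence of equal length, each base drawn uniformly from A/C/G/T."""
    return "".join(rng.choice(BASES) for _ in seq)


def shuffled_replacement(seq: str, rng: random.Random) -> str:
    """Same bases in a random order, so length and base composition are kept."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = "ATGCGTATTAGCCGATAAGCTT"  # hypothetical stretch, not a real locus

uniform = random_replacement(original, rng)
shuffled = shuffled_replacement(original, rng)

print(len(original), len(uniform), len(shuffled))
print(gc_fraction(original), gc_fraction(shuffled))
```

Both rules preserve length, so neither can detect DNA that matters only as bulk or spacing. Only the shuffle also preserves base composition, so the uniform rule could in principle disturb something like GC balance that the shuffle would leave alone.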

Maybe you should find a new line of work if that's what you believe. I can tell you lots of things that aren't junk.

My goodness, Larry Moran has drunk the "Lots of things are not junk" Kool-Aid being peddled by John Mattick and Immanuel Velikovsky!

For the purposes of this discussion, has anyone proposed a definition for "junk" DNA upon which to base the substantial (non-syntax-based) points of contention?

No. And if you knew anything about genetics, you would know that having definitions of terms that everyone would agree on would SPOIL ALL THE FUN!!!

What have the Romans ever done for us?
Having a clear definition would certainly terminate a few of the strawman arguments being thrown around here.
Steve, the problem with saying that LINEs and ALUs are non functional junk is that we know a certain (albeit low) percentage has been co-opted into clear functional elements (coding sequence, promoters etc). If we were to simply replace all ALU or LINE derived DNA with random sequence of similar GC content it would have real consequences. That is not to say that every repeat derived sequence is 'junk' or non functional, it just means that the results of various genomic surveys such as the ENCODE and CGCC studies are showing evidence that a lot of regulatory regions lie within repeat derived sequence. We simply haven't done enough work to say exactly which ALU is junk and which contains, for instance a sequence that binds a transcription factor and regulates some gene.
Just a quick example TAAGATAATGTAGCCCCTGGCCTCAAA part of LINE element, family L2.
Definitely junk?
It is the lactose tolerance locus on chromosome 2.
So what have the Romans done for us apart from the lactose intolerance locus? I could keep on going for days in this manner, suffice to say there are many important regulatory sequences embedded in repeats and simply dismissing them all like Steve and Larry do is simply showing gross ignorance of the field.
The other point is whether you can simply replace a sequence with a similar length random sequence and then claim that it was non functional DNA. I don't believe this is a valid approach. If it is truly non functional then you should be able to delete it entirely without an effect. Many DNA elements work in conjunction with other elements that are located at some distance away - the DNA loops around to bring the elements in close contact and allow for coordinated function. The 'junk' DNA present in the intervening sequence may function as a spacer that allows for the correct positioning and so deleting it may affect function - just not in a sequence specific manner.

"Albeit low" is the operative phrase there, no? So not even you really think that most of this stuff is functional. The objectionable thing is that this kind of observation gets translated into way over the top hype about "most" of the genome being functional. And where Larry gets pissed off by that is that it sets off his hyper-adaptationism detector. I happen to agree with him that there's a lot of naivete (not universal, but widespread) among genomics types about how evolution works.

Steve: Yes indeed.

This reminds me of the time I casually mentioned to Steve Gould that the new exhibit I was working on at the Museum of Comparative Zoology might include the Coelacanth specimen which was at that time on display in the Romer Room (where we were when I made the remark). He went ballistic, going on and on for a long time about how the Coelacanth is not a living fossil, the concept of "missing link" is bogus, and so on. To ME, as though I was some dumb freshman in his intro class. My point, of course, was that all the invective spewing obnoxiously and impolitely from his mouth was what we wanted the message of the exhibit to be (heavily edited, of course) with the Coelacanth being one of the physical objects to orient the discussion.

Except Moran and the others are not Steve Gould, obviously.

Oh, and I would not assume that a meaningful or useful understanding of evolution comes with being an experienced geneticist. No, no, I am not talking about YOU (whoever is reading this). I'm speaking in general terms. A trained and experienced geneticist is not preordained to understand evolution just because they are working with some of the stuff of evolution (any more than an astrologer can be assumed to understand modern cosmological sciences). Also, there are many aspects of evolution (such as those surrounding adaptation) that cannot be assumed to be settled, but are assumed in many of these discussions. See RPM's discussion of bird evolution and genome size cited above somewhere. That story (not his version, necessarily, just the whole story) is very muddled by differing conceptions of "adaptation."

Well, the problem is simply that little or nothing worthwhile about evolution is taught to most grad students outside of the evolutionary biology disciplines themselves. That includes me; I just happen to have an interest in evolutionary genetics going (way!) back to my own grad-school days, often pursued at the expense of what I was supposed to be doing. ;)

Steve, we are still talking at cross purposes. I am not claiming that every single base pair - or even the majority of base pairs - is critical for some particular function, but that one of the features of the genome is that many functional elements are located at defined intervals. Some DNA may be present as simply a spacer to maintain the appropriate distance between functional elements, be they promoters, enhancers, matrix attachment elements etc. I would describe this DNA as 'functional' in the sense that it would have an effect if it was simply deleted - although it could also probably be completely replaced by random sequence with little effect. At the moment it is not possible in a lot of cases to say "this sequence is derived from a LINE or ALU, therefore it is non functional junk". That is too simplistic and ignores the results of genomic analysis of the past decade showing that many critical regulatory sequences have been co-opted from such repeats. Undoubtedly most of these repeats will not have a sequence specific function, but many more such sequences will be uncovered as we start to analyze populations using the coming 1 million plus SNP chips.

What Larry and I were complaining about is Mattick-style sales talk that "much if not most of the genome may be functional"- and as demonstrated above yes he really did say that. I am reading you as agreeing that such statements are bogus, in which case we don't really disagree.

Steve, the problem comes with the interpretation of that phrase. I don't take it to mean that most of the sequence of the genome is functional, rather that most of the genome contains within it functional sequences, such that you cannot delete large percentages of it without deleterious effects for the reasons I've mentioned earlier. I've been to meetings when Mattick has been giving talks and I really haven't got the impression that he is claiming that most of the sequence is functional, rather that most of it is expressed and that this abundance of transcription has a function.

Now we're into semantics, but I'm not seeing a difference there at all (and the quote from his appear, above, seems quite unambiguous to me). And certainly press coverage of this topic has been interpreting this the same way I do- in a way that's clearly not correct.

To use T. R. Gregory's terms, by the way, we may want to say that all that transcriptional activity might have, in bulk, consequences- but that doesn't establish that individual transcripts of "junk" sequences have functions. Unfortunately it sure looks to me like Mattick is pushing the latter view, but I haven't heard him talk.

"Paper", not "appear", sorry.

Not to confuse things, but you might have a look here

Steve, I wouldn't simply call it semantics. Mattick does explicitly link his definition of functionality with the high level of RNA transcription throughout the genome. It is probably not the same idea of functionality that others may have, so there is some real possibility for misunderstanding. I wouldn't go as far as he does in his claims but at least it is a testable hypothesis.
Here's the conclusion of the Pheasant and Mattick paper mentioned in the original quote.
Pheasant and Mattick 2007 Genome Research
"It seems clear that 5% is a minimum estimate of the fraction of the human genome that is functional, and that the true extent is likely to be significantly greater. If the upper figure of 11.8% under common purifying selection in mammals from ENCODE (Margulies et al. 2007) is realistic across the genome as a whole, and if turnover and positive selection approximately doubles this figure (Smith et al. 2004), then the functional portion of the genome may exceed 20%. It is also now clear that the majority of the mammalian genome is expressed and that many mammalian genes are accompanied by extensive regulatory regions. Thus, although admittedly on the basis of as yet limited evidence, it is quite plausible that many, if not the majority, of the expressed transcripts are functional and that a major component of genomic information is rapidly evolving regulatory DNA and RNA. Consequently, it is possible that much if not most of the human genome may be functional. This possibility cannot be ruled out on the available evidence, either from conservation analysis or from genetic studies (Mattick and Makunin 2006), but does challenge current conceptions of the extent of functionality of the human genome and the nature of the genetic programming of humans and other complex organisms."

Well, that's a hell of a slippery "argument" with a lot of apples-to-oranges progressions to ever more impressive-sounding numbers. I will readily admit that I have no idea what he's really trying to say there. But I think genetic load arguments alone already make it very hard to say with a straight face that 20% or even more of the genome may be "functional" in any sense in which that term is normally understood.
And I would prefer that people use language with the purpose of being understood rather than with the purpose of making their results sound sexier.

Just a quick example TAAGATAATGTAGCCCCTGGCCTCAAA part of LINE element, family L2.
Definitely junk?
It is the lactose tolerance locus on chromosome 2.
So what have the Romans done for us apart from the lactose intolerance locus? I could keep on going for days in this manner, suffice to say there are many important regulatory sequences embedded in repeats and simply dismissing them all like Steve and Larry do is simply showing gross ignorance of the field.

Umm, the element in question acquired the lactase persistence site long after its insertion. This is not evidence that the LINE element itself is doing anything useful. Sort of like saying that we can thank the Romans for pizza.

It would have been better to pick an example where the inserted element itself directly affected the regulation of transcription. But even then, it's not obvious that insertions themselves are on the whole adaptive (let alone the resultant increase in genome size), rather than a source of random mutation that only sometimes has adaptive consequences.

Larry Chasin (Columbia University) has done some interesting work in this regard showing insertion of highly repetitive DNA sequences is an important driver in the evolution of new exons. This work has some extra credibility in terms of the topic of this page because he wasn't setting out to demonstrate any kind of functionality of highly-repeated genomic elements; that's just the way the result turned out. (which is to say, he doesn't have a rhetorical axe to grind here)

Comparison of multiple vertebrate genomes reveals the birth and evolution of human exons. Proc Natl Acad Sci U S A. 2006 Sep 5;103(36):13427-32. Epub 2006 Aug 28. Zhang XH, Chasin LA.
http://www.pubmedcentral.nih.gov/articlerender.fcgi?pubmedid=16938881

His concluding paragraph:
This work suggests that highly repeated sequences, rather than being parasitic invaders and junk, play an important evolutionary role in the evolution of new genes. The documentation of a number of Alu exonization events led Sorek et al. [Mol. Cell 14, 221-231 (2004)] to propose that exaptation of Alus may have "promoted speciation of the human lineage." Our data support this idea and extend it to additional classes of repeats and to other mammals.

Anonymous - I note that you never did reply to Black Cat's clarification on what you would actually accept as a test of whether 'junk DNA' is functionless:

Perhaps because you realized how silly your position really is?
