Junk DNA, Revisited

Some bio-bloggers are atwitter over an article by Wojciech Makalowski on Scientific American's website about Junk DNA. I'm a little late to the game because, well, I've been really busy looking at sequences to determine if they are junk DNA. Is it irony? Is it coincidence? Who cares? It's an opportunity to discuss semantics, and I love semantics.

Those of you who have hung around here for a while know this topic often comes up at evolgen (remember this, this, and this . . . hell, here's what a search for Junk DNA turns up). Long story short, I can't stand the term junk DNA, but I do agree with Dan Graur that Junk DNA is a valid null hypothesis. And that's it. The majority of any eukaryotic genome is made up of non-functional sequences that are slightly deleterious, but they persist because selection against these sequences is too weak to purge them (i.e., nothing in evolution makes sense except in the light of population genetics).
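The phrase "too weak to purge them" has a standard quantitative form: whether selection or drift wins depends roughly on the product of effective population size and selection coefficient. Below is a minimal sketch using Kimura's diffusion approximation for the fixation probability of a single new mutation. It assumes additive (genic) selection with heterozygote fitness 1 + s, an effective size equal to the census size N, and an arbitrary N of 10,000 chosen purely for illustration.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for the chance that one new mutant
    copy eventually fixes in a diploid population of size n.

    Assumes additive selection (heterozygote fitness 1 + s) and an
    effective population size equal to n.
    """
    p = 1.0 / (2 * n)  # starting frequency of a single new copy
    if s == 0.0:
        return p  # neutral case: fixation probability equals p
    a = -4.0 * n * s * p  # exponent in the numerator
    b = -4.0 * n * s  # exponent in the denominator
    if b > 700.0:
        # Strongly deleterious: expm1(b) would overflow, and expm1(b) ~ exp(b),
        # so use expm1(a) * exp(-b) instead (this may underflow to 0.0).
        return math.expm1(a) * math.exp(-b)
    # expm1 keeps precision when |4ns| is tiny (nearly neutral mutations).
    return math.expm1(a) / math.expm1(b)


def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability divided by the neutral expectation 1/(2n)."""
    return fixation_probability(s, n) * 2 * n


if __name__ == "__main__":
    n = 10_000  # illustrative population size, not an estimate
    for s in (-1e-6, -1e-5, -1e-4, -1e-3):
        ratio = relative_to_neutral(s, n)
        print(f"s = {s:g}\t4Ns = {4 * n * s:g}\tu / u_neutral = {ratio:.3g}")
```

When |4Ns| is well below 1, a slightly deleterious insertion fixes almost as often as a neutral one; once |4Ns| reaches the tens, fixation becomes vanishingly rare. Which regime any real stretch of junk falls into depends on actual values of N and s, which this sketch does not try to estimate.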

The blogorific brouhaha started with Larry Moran's response, and then Alex Palazzo jumped in. The biggest flaw with Makalowski's article, and one that Larry points out, is that Wojciech attempts to answer the question "What is junk DNA, and what is it worth?" but instead spends most of the time describing repetitive DNA. He never actually gets around to answering the question. Makalowski goes on to point out a bunch of examples of repetitive DNA being co-opted (exapted?) by genomes to become transcriptional enhancers. At least he avoids saying that transposable elements don't become parts of protein-coding genes (see here for why that's not surprising).

Even though some repetitive sequences become functional elements, most of those repeats are just filler. Junk. Not useful, but not bad enough to be worth eliminating. I'm sure there's an appropriate analogy, but I'm not clever enough to coin it. But even though the junk can perform fancy molecular tricks, like inducing rearrangements, it is still junk. You can think of those pieces of junk as mutational hotspots. The junk is concentrated in regions of the genome where it can do as little harm as possible (the hypothetical analogy I would have introduced were I more clever would have been extended here). That distribution is the evidence for selection on these sequences: selection purges them from the regions near functional elements.

I'm usually annoyed by uses of the term "junk DNA" in the popular literature because they treat the discovery of functional non-protein coding sequences as some big surprise. This time, I'm bothered by a treatment that suggests a function for all junk DNA. I guess I can't win.


I rather like the distinction between "junk" and "garbage" as originally coined.

The problem is, not knowing the function of a non-protein-coding sequence does not mean the same as knowing that there is no function.

When we have found some non-protein-coding sequence to be functional, did we not find it essentially by chance? Can we make better estimates of the percentage of other non-protein-coding sequences that are functional?

Okay, so we know much of the Genome. We know much of the Proteome. We now know a lot of the Metabolome.

How does it all connect? How does it all correlate under a range of conditions that we don't know?

Calling the parts we don't know "Junk DNA" is like calling putative Dark Matter and Dark Energy "Junk Cosmos."

Remember the special issue of Nature (was it their 125th anniversary issue?) on "The Frontiers of Ignorance"?

That's roughly where we are. We don't know what "Junk DNA" does and doesn't do in given organisms.

We don't know what it is that we don't know.

We don't know how much there is of what we don't know, compared to what we do know.

What if what we don't know is more than we think it is?

That's not just a semantic issue, is it?

The problem is, not knowing the function of a non-protein-coding sequence does not mean the same as knowing that there is no function.

I'll let you in on a well-kept secret. The people who write about junk DNA aren't as stupid as you think we are.

Over the past 30 years we've accumulated a busload of evidence that large parts of the mammalian genome are truly junk. This isn't an argument from ignorance as you imply. We leave those sorts of arguments to the creationists.

To take just one example, think of pseudogenes. We're not just guessing that pseudogenes are non-functional, we have data. There are almost as many junk pseudogenes in the human genome as there are functional genes.

Dear Larry,

As a scientist, married to a scientist, I have learned NEVER to call a scientist stupid. Except myself, when I am.

So, I apologize for implying, or encouraging the inference, that any scientists, including you and your colleagues, are less than brilliant. I did not so intend.

I have published over a dozen peer-reviewed papers on computational biology, and thus know how low I am on the ladder. I have coauthored with a Nobel Laureate, and chaired plenary sessions at international conferences where I had the joy of introducing hardworking geniuses of the biomedical community. I'll be doing that again as a member of the Executive Committee, with responsibility for all Plenary Sessions, at the 7th International Conference on Complex Systems, hosted by NECSI, Boston, 28 Oct-2 Nov 2007. Can you suggest 3 or 4 of the best potential plenary speakers, by your standards? They must be: (1) major researchers; (2) widely versed in the "big picture" and their colleagues' work in other fields; (3) extraordinarily good speakers.

I am also in fierce opposition to Creationists, and their covert minions in Intelligent Design, and I've made many such comments on ScienceBlogs.

But you may need to explain to a less-technical audience what you mean by: "We're not just guessing that pseudogenes are non-functional, we have data. There are almost as many junk pseudogenes in the human genome as there are functional genes."

I know you're not guessing. But readers may need to have a simplified explanation of scientific methodology at the frontiers of ignorance, which I agree is not guessing.

"There are almost as many junk pseudogenes in the human genome as there are functional genes."

Could we please have some numbers and error bars on (1) junk pseudogenes in the human genome, versus (2) functional genes, with definitions of functional and junk? Pretty please?

Jonathan, now I'm confused. In your first posting you said,

That's roughly where we are. We don't know what "Junk DNA" does and doesn't do in given organisms.

We don't know what it is that we don't know.

We don't know how much there is of what we don't know, compared to what we do know.

Now, call me stupid if you want, but that sounds an awful lot like someone who knows what they're talking about.

Anyone reading that would assume you are speaking with authority on the subject. They would assume that you are insulting all those scientists who work on junk DNA and say it exists.

Did I misunderstand? Did you mean to say you are completely ignorant of the evidence we have for the existence of junk DNA and therefore you assume there isn't any? I need to know the answer before I start explaining the evidence because if you've already made up your mind that my evidence is phony, as your comment implies, then I'm wasting my time.

Dear Larry,

I am ignorant of that. I learned population biology a long time ago (early to mid 1970s). My PhD research was also in that era (1973-1977). My publications in Mathematical Biology and Computational Biology have been sparsely scattered from 1973 through 2007.

I am NOT an expert the way that you are an expert.

I am ignorant of many of the things that you know. Hence I can admit that I don't know, that I barely know the boundaries of what I don't know, but that I am willing to ask apparently naive questions. I am willing to learn.

Please don't be offended. I am very unlikely to accuse you or anyone else who knows the contemporary literature of being "phony."

I really want to know, in whatever level of discourse you choose. If you can educate me, and those who know less than I, and those who know more than I but in other areas, then you are as good a science journalist/blogger as I believe.

Please?

Vos Post's presentation 'The Evolution of Controllability in Enzyme System Dynamics' at the International Conference on Complex Systems 2004 (ICCS2004) is available at
http://necsi.org/events/iccs/openconf/author/papers/213.doc
Either it's a hoax or he is a complete crank. Some excerpts:

the rise of nanotechnology, now funded at several billion dollars per year, in which reverse-engineering and modification of protein systems is now seen as a plausible technology, and not the science fiction it was accused of being when Richard Feynman [Feynman, 1959; 1960] as the great-grandfather of nanotechnology, Post and over a dozen other researchers were grandfathers of nanotechnology, and Drexler (with some early PR assistance by Post in popular magazines such as Omni and Analog) became acknowledged as the father of nanotechnology.

BTW, according to the summary vos Post was at Woodbury University. When I searched for 'Vos Post' over there I received the following reply: Did you mean: vomits post

You might also be interested in vos Post's article 'Adaptation and Coevolution on an Emergent Global Competitive Landscape. A joint theoretical exploration of non-linear dynamical social and economic systems by:
Usha Dasari , Philip V. Fellman , Jonathan Vos Post and Roxana Wright'

http://necsi.org/events/iccs/openconf/author/papers/301.doc
IMO this is all weird; in the appendix he claims that he somehow invented genetic algorithms:

Appendix I: Internal Structure of the Genetic Algorithm - Jonathan Vos Post

One thing I found, well ahead of Koza and other researchers, through my experiments in 1975-1977, at the University of Massachusetts at Amherst, where I beta-tested John Holland's book "Complexity in Natural and Artificial Systems" by coding the Genetic Algorithm into APL and running evolution of software was as follows:
The evolving software must implicitly determine HOW rugged the fitness landscape is, and adapt its mutation rate, cross-over rate, and inversion rate accordingly.
My insight beyond that notion was to explicitly add genes that set mutation rate, cross-over rate, and inversion rate right at the end of (as it turned out, location didn't much matter) the existing genes. That is, I put meta-variables which coded for parameters of the Genetic Algorithm itself in among the variables coded for the underlying evolving system.
I then ran the Genetic Algorithm with the extended "chromosomes" on landscapes of different ruggedness. As I'd hoped, the 3 types of mutation rates themselves evolved to good rates that fit the rates optimum for adaptation on that degree and style of ruggedness.
This proved, at least to my satisfaction, that the Genetic Algorithm was not only parallel as explicitly obvious, and parallel in its operation on the higher-dimensional superspace of possible gene-sequences, as Holland demonstrated (in his example where a binary string of 0-1 alleles was part of the subspace of trinary strings of 0-1-"don't care" so that spaces of dimension 2-to-the-power-of-N in which evolution occurred were faithfully sampling and implicitly evolving in superspaces of dimension 3-to-the-power-of-N, where N is the bit-length of the evolving "chromosome"), but parallel at yet a higher level, namely that the Genetic Algorithm worked at simultaneously evolving the parameters of its own operation in rugged landscapes while evolving the simulated organisms that had their fitness determined by that same landscape. My analogy was to "hypervariable" genes in the immune system.
With the infinitely greater computational power and speeds available today, this experiment should, in principle, be repeatable in a multi-generational context which should then allow for the testing of emergent order for global evaluation functions. Until now, evolutionary simulations have been based either on the genetic algorithm or similar structures or else using assumptions that finesse the questions of how global evaluation emerges at all and simply proceeds to use global criteria in a purely local environment.
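The mechanism described in this excerpt, carrying genes that encode the algorithm's own mutation parameters alongside the ordinary genes and letting selection act on both, is what the evolutionary-computation literature calls self-adaptation. The following is a minimal Python sketch, not a reconstruction of the 1970s APL program: it evolves only a per-individual mutation rate (no crossover or inversion rates), and the OneMax fitness function, the log-normal step size `tau = 0.3`, the rate bounds, and tournament selection are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Individual:
    bits: list[int]  # ordinary genes being evolved
    mutation_rate: float  # meta-gene: per-bit flip probability, itself inherited


def fitness(bits: list[int]) -> int:
    # Hypothetical stand-in landscape ("OneMax"): count the 1s.
    return sum(bits)


def mutate(parent: Individual, rng: random.Random, tau: float = 0.3) -> Individual:
    # Perturb the meta-gene first; a log-normal step keeps the rate positive.
    rate = parent.mutation_rate * math.exp(tau * rng.gauss(0.0, 1.0))
    rate = min(max(rate, 1e-4), 0.5)  # illustrative bounds
    # Then flip ordinary genes using the child's own (new) rate.
    bits = [b ^ 1 if rng.random() < rate else b for b in parent.bits]
    return Individual(bits, rate)


def tournament(pop: list[Individual], rng: random.Random) -> Individual:
    # Binary tournament: the fitter of two random individuals wins.
    a, b = rng.sample(pop, 2)
    return a if fitness(a.bits) >= fitness(b.bits) else b


def evolve(n_bits: int = 64, pop_size: int = 50, generations: int = 200, seed: int = 1):
    rng = random.Random(seed)
    pop = [
        Individual([rng.randint(0, 1) for _ in range(n_bits)], rng.uniform(0.001, 0.2))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randrange(1, n_bits)  # one-point crossover on ordinary genes
            inherited_rate = rng.choice((p1, p2)).mutation_rate
            child = Individual(p1.bits[:cut] + p2.bits[cut:], inherited_rate)
            children.append(mutate(child, rng))
        pop = children  # generational replacement, no elitism
    best = max(pop, key=lambda ind: fitness(ind.bits))
    mean_rate = sum(ind.mutation_rate for ind in pop) / pop_size
    return best, mean_rate


if __name__ == "__main__":
    best, mean_rate = evolve()
    print(f"best fitness = {fitness(best.bits)}, mean mutation rate = {mean_rate:.4f}")
```

Because each child's bits are flipped at the rate it carries, lineages whose rates suit the landscape poorly tend to leave less fit offspring, so selection acts on the rate only indirectly. The sketch says nothing about how rates behave on rugged landscapes; OneMax is smooth.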

This blog is not about me. If, however, anyone doubts my Caltech credentials, they may check with the Caltech Registrar. If anyone doubts my Feynman coauthorship, they may check with Michelle Feynman, who edited his snailmail collection. If anyone doubts my Woodbury credentials, they are welcome to check with Prof. Christine M. Carmichael, Physics Department, Woodbury University. If they doubt the material of the NECSI (New England Complex Systems Institute) paper, they may check with Prof. Philip V. Fellman, Southern New Hampshire University, and/or with Dr. Yaneer Bar-Yam of the NECSI. I am who I am; I've published what I've published. I am not hiding behind a screen name such as "sparc." I am who I am. Who are you?

Now, may we please get back to what Larry Moran and RPM can teach us?

sparc: I would be delighted to hear from readers about the draft in progress 'What is the Shannon Channel Capacity of Evolution by Natural Selection?'

Science depends upon open discourse, and criticism by peers. The paper in question began because of a rather irritating argument between a former biologist now advocating something akin to Intelligent Design, on one hand, and the blogmaster of one of Seed's ScienceBlogs, on the other, with blog comments flying in all directions. I thought that a good question had been asked, however poorly articulated, and that it was worth trying to answer.

What is that draft, incomplete, paper about? It is about me trying to learn from people on Science Blogs, me trying to contribute to the conversation, and me being willing to put ideas out in public for the sake of feedback.

For the record:

I do have a double B.S. from Caltech, Mathematics and English Literature, 1973 (having started in Physics, and meandered to Astronomy).

I do have a M.S. in Computer and Information Science from the University of Massachusetts at Amherst, earned 1975, technically awarded 1976, specializing in Artificial Intelligence.

I did satisfy all course requirements (and 50+ credits over) for a PhD in that department, passed the PhD candidacy exam, and wrote a PhD dissertation, arguably the first such in what is now called "Artificial Life" and the first such in what is now called "Nanotechnology." That is still listed as an "incomplete" on the transcripts: 1/3 of the department faculty fled the department amid political chaos, my ad hoc thesis committee was never turned into a Formal Thesis Committee, and thus my dissertation was never either accepted or rejected.

The research did indeed manage to be the first to use the Genetic Algorithm of John Holland to evolve nontrivial working software, the first to use simulated evolution to evolve an equation whose solution was not yet in the literature, the first to match data in the nonlinear dynamics of simulated metabolisms to artificially evolved formulae, which then, as solutions, allowed me to derive them by more conventional means. Many chapters of that dissertation have now been published in refereed venues.

I was an adjunct professor of Astronomy at Cypress College.

I WAS an adjunct professor of Mathematics for 5 semesters at Woodbury University, including at the time that quite a number of papers were presented by myself and my coauthors at ICCS-2004, sponsored by NECSI.

At ICCS-2006 I chaired 3 sessions, as well as presenting several papers.

At ICCS-2007, I'm on the Executive Committee. I think that speaks to my willingness to work with other scientists, and contribute to their dialog with the public.

I would very much like to learn from Larry Moran, RPM, and their readers about Junk DNA.

I would also very much like to better understand how to calculate the Shannon Channel Capacity of Evolution by Natural Selection. I'd like to present that paper at ICCS-2007, if it can be worked into useful form, and a short enough portion extracted to meet the length limits of the proceedings.

I am sometimes smart, sometimes stupid. I try to respect everyone in the science and science-writing communities. There are many things I know, many things I've taught, but vastly more that I don't know. I don't entirely know what it is that I don't know.

Not useful, but not bad enough to be worth eliminating. I'm sure there's an appropriate analogy, but I'm not clever enough to coin it.

Navel fluff (belly button lint for Americans). Bit pointless, but it's not actually going to be disadvantageous unless there's way too much of it in the wrong place. I'm sure you *could* choke on it if you tried really hard, for example.

Of course, there's a distinction to be drawn between inactive junk DNA (intergenic stuff that just does nothing) and active junk (the various transposons / retroposons / ERVs, etc). The latter class are more analogous to our commensal skin flora - they get on with doing their own thing while we do ours. Periodically we can make use of them (we co-opt repetitive sequence into a new functional gene), and sometimes they become pathogenic (a new retroviral insertion disrupts a vital gene).

Like all analogies, they're only useful up to a point, naturally :-)

By Peter Ellis on 16 Feb 2007

[by email from PF to JVP]

IMHO, this kind of criticism [sparc] usually comes from people who lack credentials, have failed to gain acceptance of their ideas and often who simply cannot do the math. We've done several versions of the evolution and adaptation paper and they have been very well received at conferences at both the London School of Economics as well as Carnegie Mellon University.

The simulation which we presented at NAACSOS at Notre Dame also grew out of this work. The mathematics may be a bit novel, but they are, as Mandelbrot would say, "orthodox". The seed work by Windrum and Birchenhall upon which we draw has received wide recognition in European economic and mathematical circles. What I see here is simply an ad hominem attack lacking in both substance and understanding.

The particular point of the appendix about your [Jonathan's] beta testing John Holland's genetic algorithm in the 1970's was to illustrate both the difficulty of characterizing the kernel rate and its underlying mechanism in evolutionary systems and to highlight the fact that while this is a central research concern in evolutionary dynamics, it is precisely the one which is most often "fudged" or assigned some arbitrary value by researchers because of the difficulty of explaining where this critical dimension of the model comes from.

I believe that the above argument is also a significant area of focus in Stuart Kauffman's 2000 book "Investigations", although I also understand that many readers had some difficulty in following his arguments, in no small part because of their subtlety, depth, complexity and in several cases, novelty.

I shouldn't need to flash academic credentials to answer this kind of criticism, nor should you, nor should Kauffman for that matter. This whole line of argument reminds me of the "100 Letters Against Einstein" movement. We've seen this kind of criticism directed against Peter Lynds, we've seen it directed against Kauffman, and occasionally it gets directed at us. It doesn't really matter much to me whether there is one guy flaming us with this kind of rant or 30 (like the Chinese physics students who claimed that not only Peter Lynds didn't exist but that you, I, [your son] Andrew and [coauthor Prof. Christine M. Carmichael] were fictitious as well).

You don't need a whole forum to take down our arguments. If what we are saying is wrong, one clearly stated scientific argument should be enough. I'm still waiting for it. Please feel free to repost this wherever you like.

[Prof. Philip V. Fellman, Southern New Hampshire University]