It weren't "junk" after all

There's a new paper in Nature (OPEN ACCESS), Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project:

...First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome....

From EurekAlert, New findings challenge established views on human genome:

The ENCODE consortium's major findings include the discovery that the majority of DNA in the human genome is transcribed into functional molecules, called RNA, and that these transcripts extensively overlap one another. This broad pattern of transcription challenges the long-standing view that the human genome consists of a relatively small set of discrete genes, along with a vast amount of so-called junk DNA that is not biologically active.

The new data indicate the genome contains very little unused sequences and, in fact, is a complex, interwoven network. In this network, genes are just one of many types of DNA sequences that have a functional impact. "Our perspective of transcription and genes may have to evolve," the researchers state in their Nature paper, noting the network model of the genome "poses some interesting mechanistic questions" that have yet to be answered.

If you read evolgen you know that the term "Junk DNA" is crap. From an evolutionary viewpoint it also seemed a bit peculiar to relegate most of the genome to non-functional status. After all, why was it still around after all this time? Evolution is a noisy process that is predicated on "good enough" local solutions, but it seemed a little bit of a stretch to believe that this is the best that various evolutionary dynamics could come up with. Speaking of which:

Other surprises in the ENCODE data have major implications for our understanding of the evolution of genomes, particularly mammalian genomes. Until recently, researchers had thought that most of the DNA sequences important for biological function would be in areas of the genome most subject to evolutionary constraint - that is, most likely to be conserved as species evolve. However, the ENCODE effort found about half of functional elements in the human genome do not appear to have been obviously constrained during evolution, at least when examined by current methods used by computational biologists.

According to ENCODE researchers, this lack of evolutionary constraint may indicate that many species' genomes contain a pool of functional elements, including RNA transcripts, that provide no specific benefits in terms of survival or reproduction. As this pool turns over during evolutionary time, researchers speculate it may serve as a "warehouse for natural selection" by acting as a source of functional elements unique to each species and of elements that perform the similar functions among species despite having sequences that appear dissimilar.

The old view promoted by R.A. Fisher was that most of the genome (OK, they didn't know about the "genome" then, but you get the picture) would be constrained by selective forces, as new mutants would invariably be deleterious. On occasion a selectively favored mutation would arise that would increase in frequency and quickly "substitute" in place of the previous allele at that locus, resulting in a slow and gradual turnover of the genome. Neutral and nearly neutral theory supplemented or overturned (depending on your perspective and scale of focus) the classical model by positing that mutations with little selective import were responsible for the preponderant number of substitutions at any given locus over evolutionary time. The implication here is that the rate of evolutionary change would be roughly proportional to the rate of mutation. My posts on genetic draft add another process to the toolkit of evolutionary dynamics, as selective sweeps drive reorganizations of the genome adjacent to the area favored by selection.
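For anyone who wants the arithmetic behind "proportional to the rate of mutation," here is the standard textbook sketch. It assumes a diploid population of constant size N, a per-locus neutral mutation rate \mu per generation, and strictly neutral mutations; those symbols and assumptions are mine, not the ENCODE paper's.

```latex
k \;=\; \underbrace{2N\mu}_{\text{new neutral mutations per generation}}
  \;\times\;
  \underbrace{\frac{1}{2N}}_{\text{fixation probability of each}}
  \;=\; \mu
```

Under those assumptions the population size cancels, so the substitution rate k depends only on the mutation rate. Nearly neutral mutations break this clean cancellation, which is part of why the "supplemented or overturned" question depends on your scale of focus.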

Now this finding that much of the functionally relevant genome is not under strong constraint will surely be fodder for many hypotheses. Perhaps selection is more pluralistic than we thought? Or perhaps the long arm of evolution implicitly sweeps across the contingencies of adaptive peaks over the horizon? In any case, my first instinct is to infer that Fisher was wrong to assume that one fitness peak dominated the landscape and that only a very precise genetic conformation would yield the optimal phenotype. We know that this seems untrue for human skin color, as multiple alternative genetic events converged upon the same physical outcome.

Update: To clear up some confused prose above, from the paper itself:

Instead, we hypothesize five biological reasons to account for the presence of large amounts of unconstrained functional elements. The first two are particular to certain biological assays in which the elements being measured are connected to but do not coincide with the analysed region. An example of this is the parent transcript of an miRNA, where the current assays detect the exons (some of which are not under evolutionary selection), whereas the intronic miRNA actually harbours the constrained bases. Nevertheless, the transcript sequence provides the critical coupling between the regulated promoter and the miRNA. The sliding of transcription factors (which might bind a specific sequence but then migrate along the DNA) or the processivity of histone modifications across chromatin are more exotic examples of this. A related, second hypothesis is that delocalized behaviours of the genome, such as general chromatin accessibility, may be maintained by some biochemical processes (such as transcription of intergenic regions or specific factor binding) without the requirement for specific sequence elements. These two explanations of both connected components and diffuse components related to, but not coincident with, constrained sequences are particularly relevant for the considerable amount of unannotated and unconstrained transcripts.

The other three hypotheses may be more general--the presence of neutral (or near neutral) biochemical elements, of lineage-specific functional elements, and of functionally conserved but non-orthologous elements. We believe there is a considerable proportion of neutral biochemically active elements that do not confer a selective advantage or disadvantage to the organism. This neutral pool of sequence elements may turn over during evolutionary time, emerging via certain mutations and disappearing by others. The size of the neutral pool would largely be determined by the rate of emergence and extinction through chance events; low information-content elements, such as transcription factor-binding sites [110] will have larger neutral pools. Second, from this neutral pool, some elements might occasionally acquire a biological role and so come under evolutionary selection. The acquisition of a new biological role would then create a lineage-specific element. Finally, a neutral element from the general pool could also become a peer of an existing selected functional element and either of the two elements could then be removed by chance. If the older element is removed, the newer element has, in essence, been conserved without using orthologous bases, providing a conserved function in the absence of constrained sequences. For example, a common HNF4A binding site in the human and mouse genomes may not reflect orthologous human and mouse bases, though the presence of an HNF4A site in that region was evolutionarily selected for in both lineages. Note that both the neutral turnover of elements and the 'functional peering' of elements has been suggested for cis-acting regulatory elements in Drosophila [115, 116] and mammals [110]. Our data support these hypotheses, and we have generalized this idea over many different functional elements.

The presence of conserved function encoded by conserved orthologous bases is a commonplace assumption in comparative genomics; our findings indicate that there could be a sizable set of functionally conserved but non-orthologous elements in the human genome, and that these seem unconstrained across mammals. Functional data akin to the ENCODE Project on other related species, such as mouse, would be critical to understanding the rate of such functionally conserved but non-orthologous elements.
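The "neutral pool" idea in the passage above (elements emerging by some mutations and disappearing by others, with pool size set by the two rates) can be illustrated with a toy model. This is only a sketch under my own assumptions, not anything from the paper: a fixed number of candidate sites, each of which gains an active element with probability `gain_prob` per generation and loses it with probability `loss_prob`. All names and parameter values are hypothetical.

```python
import random


def equilibrium_pool_size(n_sites, gain_prob, loss_prob):
    """Expected number of occupied sites once gains and losses balance.

    Each site is a two-state chain (empty <-> occupied), so the long-run
    fraction occupied is gain_prob / (gain_prob + loss_prob).
    """
    return n_sites * gain_prob / (gain_prob + loss_prob)


def simulate_pool(n_sites, gain_prob, loss_prob, generations, seed=0):
    """Simulate the toy neutral pool; return the pool size at each generation.

    n_sites, gain_prob and loss_prob are illustrative assumptions,
    not quantities estimated by ENCODE.
    """
    rng = random.Random(seed)
    occupied = [False] * n_sites  # start with an empty pool
    sizes = []
    for _ in range(generations):
        for i in range(n_sites):
            if occupied[i]:
                # an existing neutral element is destroyed by chance mutation
                if rng.random() < loss_prob:
                    occupied[i] = False
            elif rng.random() < gain_prob:
                # a new neutral element emerges by chance mutation
                occupied[i] = True
        sizes.append(sum(occupied))
    return sizes


if __name__ == "__main__":
    sizes = simulate_pool(n_sites=1000, gain_prob=0.01, loss_prob=0.04,
                          generations=600)
    late = sizes[300:]  # discard the approach to equilibrium
    print("simulated mean pool size:", sum(late) / len(late))
    print("expected pool size:", equilibrium_pool_size(1000, 0.01, 0.04))
```

In this toy version, raising `gain_prob` while holding `loss_prob` fixed enlarges the equilibrium pool, which is one way to read the paper's remark that low information-content elements (easier to create by chance) should have larger neutral pools.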

After reading the whole paper more closely I feel like there need to be 5 or 6 titles; there's so much stuff packed into it.

Related: Keep track of this via Google News; it'll be big. John Timmer at Ars Technica is not happy.

"From an evolutionary viewpoint it also seemed a bit peculiar to relegate most of the genome to non-functional status, after all, why was it still around after all this time? Evolution is a noisy process that is predicated on "good enough" local solutions, but it seemed a little bit of a stretch to believe that this is the best that various evolutionary dynamics could come up with."

I think you are misunderstanding the paper, or perhaps not expressing yourself clearly. The authors do not say that most of this genomic DNA confers selectable advantages, only that it is not functionally inert ("biochemically inactive"). In other words, the fact that most genomic DNA is "active" (e.g. transcribed) doesn't mean that it isn't by and large selectively neutral baggage carried around for no specific function ("junk"). However, because this DNA is not biochemically inert, it can at times (fairly often?) be co-opted into selectable functions, thus explaining why a meaningful proportion of non-constrained DNA is found to be selectively functional in any given organism. (Echoes of "evolvability"? Ugh.)

Quoting from the paper:
"We believe there is a considerable proportion of neutral biochemically active elements that do not confer a selective advantage or disadvantage to the organism. This neutral pool of sequence elements may turn over during evolutionary time, emerging via certain mutations and disappearing by others. The size of the neutral pool would largely be determined by the rate of emergence and extinction through chance events; low information-content elements, such as transcription factor-binding sites[110] will have larger neutral pools."

That's very interesting, but by no means spells the death of "junk DNA", unappealing and problematic as the term is.

(Echoes of "evolvability"? Ugh.)

yeah. i wasn't going to go there ;-) thanks for clarifying the prose. part of it was that i wasn't clear, but part of it is that i hadn't had a chance for a deep read.

NIH is the SECOND organization to formally abandon the misnomer of "Junk DNA"; the International PostGenetics Society (http://www.postgenetics.org) was the first to do so, at its "European Inaugural" on the 12th of October, 2006.

Ample background is at http://www.junkdna.com

For those who may wonder "what is it if not junk?" an algorithmic approach is FractoGene, at

http://www.fractogene.com

(The genome is fractal, and thus it governs development of fractal organelles, organs and organisms).

"Classical Genetics" since Bateson, 1905 took 100 years; to make room for "PostGenetics" (Genomics beyond Genes).

The "PostModern" era of Genetics (PostGenetics) may or may not take 100 years. Before its 2nd Birthday (Sept. 1.) nobody knows - and nobody really cares.

pellionisz_at_junkdna.com

P.S. Don't forget to write a proper Obituary for the 35th Anniversary of Dr. Ohno announcing on the 30th of June, 1972 "Junk DNA".

Perhaps it's junk in the "honey, why don't you chuck this junk out?" "Don't touch that I might need it someday!" sense.

As I've suggested on other blogs, the paper by Wyers et al. (2005) - Cryptic Pol II Transcripts Are Degraded by a Nuclear Quality Control Pathway Involving a New Poly(A) Polymerase, Cell 121, 725-737 - (and the lot of related papers on quality control and RNA turnover) might cause some to pause and re-think things.

The abstract:
---------------
Since detection of an RNA molecule is the major criterion to define transcriptional activity, the fraction of the genome that is expressed is generally considered to parallel the complexity of the transcriptome. We show here that several supposedly silent intergenic regions in the genome of S. cerevisiae are actually transcribed by RNA polymerase II, suggesting that the expressed fraction of the genome is higher than anticipated. Surprisingly, however, RNAs originating from these regions are rapidly degraded by the combined action of the exosome and a new poly(A) polymerase activity that is defined by the Trf4 protein and one of two RNA binding proteins, Air1p or Air2p. We show that such a polyadenylation-assisted degradation mechanism is also responsible for the degradation of several Pol I and Pol III transcripts. Our data strongly support the existence of a posttranscriptional quality control mechanism limiting inappropriate expression of genetic information.
--------------

Dr. Vougas' claim (systematic discovery of regulatory motifs) is a recent example of most serious scientists rejecting the notion of "Junk" DNA by Ohno (1972). Fact is, "Junk" DNA was deliberately "framed" (Dr. Ohno himself attempting an even more dismissive adjective "trash" two years before, in 1970 - it did not stick). The record e.g. on http://www.junkdna.com is brimming with a huge amount of evidence disrupting the frame "junk". Right after the presentation of Dr. Ohno in 1972 the "discussion" started by calling his argument "suspect". Framing "Genetics" into 1.3% of (human) Genome OFFICIALLY collapsed by 14th of June, 2007. Serious scientists half a Century before Dr. Ohno (Barbara McClintock) had evidence for "regulatory DNA"; with Jacob and Monod winning a Nobel in 1965 for their "Operon" promoter-regulation. The "frame" of "Junk" DNA was way too small and weak - it is surprising that it "lived" 35 years. It cost many-many millions their lives, since "junk DNA diseases" were neglected far too long (http://www.junkdna.com/junkdna_diseases.html). Now is the time to move forward with "Genomics beyond Genes". Please join the PostGenetics Society http://www.postgenetics.org

pellionisz_at_junkdna.com