ENCODE gets a public reaming

I rarely laugh out loud when reading science papers, but sometimes one comes along that triggers the response automatically. In this case, though, it wasn't so much a belly laugh as an evil chortle and an occasional grim snicker. Dan Graur and his colleagues have written a rebuttal to the claims of the ENCODE research consortium — the group that claimed to have identified function in 80% of the genome, but actually discovered that a formula of 80% hype gets you the attention of the world press. It was a sad event: a huge amount of work on analyzing the genome by hundreds of labs got sidetracked by a few clueless statements made up front in the primary paper, making it look like they were led by ignoramuses who had no conception of the biology behind their project.

Now Graur and friends haven't just poked a hole in the balloon, they've set it on fire (the humanity!), pissed on the ashes, and dumped them in a cesspit. At times it feels a bit…excessive, you know, but still, they make some very strong arguments. And look, you can read the whole article, On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE, for free — it's open access. So I'll just mention a few of the highlights.

I'd originally criticized it because the ENCODE argument was patently ridiculous. Their claim to have assigned 'function' to 80% of the genome (Ewan Birney even expected the figure to converge on 100%) boiled down to this:

The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type.

So if a transcription factor, in any cell, ever bound however briefly to a stretch of DNA, they declared it functional. That's nonsense. The activity of the cell is biochemical: it's stochastic. Individual proteins will adhere to any isolated stretch of DNA that happens to match a binding pocket, but that doesn't necessarily mean that the constellation of enhancers and promoters is present and that the whole weight of the transcriptional machinery will regularly operate there. This is a noisy system.
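
To see just how little "touched at least once" proves, here's a toy simulation; the genome size, binding probability, and assay count are all invented, chosen only so the arithmetic lands near ENCODE's figure. Purely random, functionless binding, pooled over enough assays, "marks" about 80% of positions.

```python
import random

# Toy model: a protein binds DNA essentially at random.
# All numbers are invented for illustration; nothing here is from ENCODE.
GENOME = 1_000_000   # toy genome length, not the real ~3.2 Gb
P_BIND = 0.0016      # chance a given position is bound in any one assay
ASSAYS = 1000        # many factors x many cell types, pooled

random.seed(1)
touched = set()
for _ in range(ASSAYS):
    # each assay marks a random, functionless subset of positions as "bound"
    touched.update(random.sample(range(GENOME), int(GENOME * P_BIND)))

print(f"fraction ever 'bound': {len(touched) / GENOME:.1%}")
# ~80%, matching 1 - (1 - P_BIND)**ASSAYS, with zero functional sites
```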

The Graur paper rips into the ENCODE interpretations on many other grounds, however. Here's the abstract to give you a summary of the violations of logic and evidence that ENCODE made, and also to give you a taste of the snark level in the rest of the paper.

A recent slew of ENCODE Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is under 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 − 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly (1) by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.

You may be wondering about the curious title of the paper and its reference to immortal televisions. That comes from point (1): function has to be defined in a context, and the only reasonable context for a gene sequence is its contribution to evolutionary fitness.

The causal role concept of function can lead to bizarre outcomes in the biological sciences. For example, while the selected effect function of the heart can be stated unambiguously to be the pumping of blood, the heart may be assigned many additional causal role functions, such as adding 300 grams to body weight, producing sounds, and preventing the pericardium from deflating onto itself. As a result, most biologists use the selected effect concept of function, following the Dobzhanskyan dictum according to which biological sense can only be derived from evolutionary context.

The ENCODE group could only declare function for a sequence by ignoring all context other than the local and immediate effect of a chemical interaction — it was the work of short-sighted chemists who grind the organism into slime, or worse yet, only see it as a set of bits in a highly reduced form in a computer database.

From an evolutionary viewpoint, a function can be assigned to a DNA sequence if and only if it is possible to destroy it. All functional entities in the universe can be rendered nonfunctional by the ravages of time, entropy, mutation, and what have you. Unless a genomic functionality is actively protected by selection, it will accumulate deleterious mutations and will cease to be functional. The absurd alternative, which unfortunately was adopted by ENCODE, is to assume that no deleterious mutations can ever occur in the regions they have deemed to be functional. Such an assumption is akin to claiming that a television set left on and unattended will still be in working condition after a million years because no natural events, such as rust, erosion, static electricity, and earthquakes can affect it. The convoluted rationale for the decision to discard evolutionary conservation and constraint as the arbiters of functionality put forward by a lead ENCODE author (Stamatoyannopoulos 2012) is groundless and self-serving.
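
The television-set argument can be put in rough numbers. Here's a back-of-the-envelope sketch using round textbook figures rather than anything from the paper (a de novo mutation rate of roughly 1.2 × 10⁻⁸ per base pair per generation, and a diploid genome of about 6.4 billion bp): if 80% of the genome were functional, every child would carry dozens of new mutations in "functional" DNA, which is hard to square with those regions never suffering a deleterious hit.

```python
# Back-of-the-envelope mutational load, with rounded textbook numbers
# (my approximations, not figures taken from Graur et al.).
MU = 1.2e-8        # de novo mutations per bp per generation (approx.)
GENOME_BP = 6.4e9  # diploid human genome size in bp (approx.)

new_mutations = MU * GENOME_BP  # roughly 77 de novo mutations per child

for functional_fraction in (0.10, 0.80):
    hits = new_mutations * functional_fraction
    print(f"{functional_fraction:.0%} functional: "
          f"~{hits:.0f} new mutations land in functional DNA per child")
```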

There is a lot of very useful material in the rest of the paper — in particular, if you're not familiar with this stuff, it's a very good primer in elementary genomics. The subtext here is that there are some dunces at ENCODE who need to be sat down and taught the basics of their field. I am not by any means a genomics expert, but I know enough to be embarrassed (and cruelly amused) at the dressing-down being given.

One point leapt out at me as particularly fundamental and insightful, though. A common theme in these kinds of studies is the trade-off between sensitivity and selectivity, between false negatives and false positives, between Type II and Type I errors. This isn't just a failure to understand basic biology and biochemistry, but an incomprehension of basic statistics.

At this point, we must ask ourselves, what is the aim of ENCODE: Is it to identify every possible functional element at the expense of increasing the number of elements that are falsely identified as functional? Or is it to create a list of functional elements that is as free of false positives as possible? If the former, then sensitivity should be favored over selectivity; if the latter, then selectivity should be favored over sensitivity. ENCODE chose to bias its results by excessively favoring sensitivity over specificity. In fact, they could have saved millions of dollars and many thousands of research hours by ignoring selectivity altogether, and proclaiming a priori that 100% of the genome is functional. Not one functional element would have been missed by using this procedure.
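
To make the trade-off concrete, here's a minimal sketch with invented counts (a hypothetical 100-element genome; nothing here comes from ENCODE or from Graur et al.). The "everything is functional" caller achieves perfect sensitivity, and a specificity of exactly zero.

```python
# Confusion-matrix arithmetic with invented counts, just to illustrate
# why maximizing sensitivity alone is vacuous.

def sensitivity(tp, fn):
    """True-positive rate: fraction of truly functional elements recovered."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: fraction of junk correctly left unflagged."""
    return tn / (tn + fp)

# Hypothetical genome of 100 elements: 10 truly functional, 90 junk.
# Caller A declares everything functional: tp=10, fn=0, fp=90, tn=0.
print(sensitivity(10, 0), specificity(0, 90))   # 1.0  0.0

# Caller B is conservative: finds 8 of the 10, wrongly flags 5 junk elements.
print(sensitivity(8, 2), specificity(85, 5))    # 0.8  0.944...
```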

This is a huge problem in ENCODE's work. Reading Birney's commentary on the process, you get a clear impression that they regarded it as a triumph every time they got even the slightest hint that a stretch of DNA might be bound by some protein — they were terribly uncritical and grasped at the feeblest straws to rationalize 'function' everywhere they looked. They wanted everything to be functional, and rather than taking the critical scientific approach of trying to disprove their own claims, they went wild and accepted every rationalization that justified them.

The Intelligent Design creationists get a shout-out — they'll be pleased and claim it confirms the validity of their contributions to real science. Unfortunately for the IDiots, it is not a kind mention, but a flat rejection.

We urge biologists not to be afraid of junk DNA. The only people that should be afraid are those claiming that natural processes are insufficient to explain life and that evolutionary theory should be supplemented or supplanted by an intelligent designer (e.g., Dembski 1998; Wells 2004). ENCODE’s take-home message that everything has a function implies purpose, and purpose is the only thing that evolution cannot provide. Needless to say, in light of our investigation of the ENCODE publication, it is safe to state that the news concerning the death of “junk DNA” have been greatly exaggerated.

Another interesting point is the contrast between big science and small science. As a microscopically tiny science guy, getting by on a shoestring budget and undergraduate assistance, I like this summary.

The Editor-in-Chief of Science, Bruce Alberts, has recently expressed concern about the future of “small science,” given that ENCODE-style Big Science grabs the headlines that decision makers so dearly love (Alberts 2012). Actually, the main function of Big Science is to generate massive amounts of reliable and easily accessible data. The road from data to wisdom is quite long and convoluted (Royar 1994). Insight, understanding, and scientific progress are generally achieved by “small science.” The Human Genome Project is a marvelous example of “big science,” as are the Sloan Digital Sky Survey (Abazajian et al. 2009) and the Tree of Life Web Project (Maddison et al. 2007).

Probably the most controversial part of the paper, though, is that the authors conclude that ENCODE fails as a provider of Big Science.

Unfortunately, the ENCODE data are neither easily accessible nor very useful—without ENCODE, researchers would have had to examine 3.5 billion nucleotides in search of function, with ENCODE, they would have to sift through 2.7 billion nucleotides. ENCODE’s biggest scientific sin was not being satisfied with its role as data provider; it assumed the small-science role of interpreter of the data, thereby performing a kind of textual hermeneutics on a 3.5-billion-long DNA text. Unfortunately, ENCODE disregarded the rules of scientific interpretation and adopted a position common to many types of theological hermeneutics, whereby every letter in a text is assumed a priori to have a meaning.

Ouch. Did he just compare ENCODE to theology? Yes, he did. Which also explains why the Intelligent Design creationists are so happy with its bogus conclusions.

Comments

"Actually, the main function of Big Science is to generate massive amounts of reliable and easily accessible data." An uncharitable commenter might even suggest that the main function of Big Science is to generate Big Grants.

As a population geneticist, I winced when I saw that 80% figure being bandied about. I'm glad to see it vigorously rebutted.

By Ralph Haygood (not verified) on 22 Feb 2013 #permalink

I have two questions about the original paper:
1. How did it pass peer review? Is Nature's review process overwhelmed by the authorship? So much for double-blind reviews ...
2. How can this work satisfy the Vancouver Convention? Being involved in the science DOES NOT QUALIFY AS AUTHORSHIP by itself.

Nice to see that both papers are open access, so kudos there.

I think the snark was deserved, and I do like the intellectually sophisticated way in which they 'sink the slipper'.

By Dave Ingram (not verified) on 22 Feb 2013 #permalink

Many scientists are happy to acknowledge the value of the ENCODE project as a prelude to functional genomic studies, in the same way that the sequencing of many genomes has been a helpful prelude to functional genomic studies. The issue is that the ENCODE project was not a functional genomic study. It was a purely descriptive genomic study. As such, one cannot assign function on the basis of the study until the work moves from being purely descriptive to being experimental.

By Enrique Amaya (not verified) on 23 Feb 2013 #permalink

With this new paper, along with his equally devastating "entrails of chickens" paper, Graur seems to be setting up a new sub-genre of scientific publication, the "sneer-reviewed publication."

By rich lawler (not verified) on 24 Feb 2013 #permalink

How can this work satisfy the Vancouver Convention?

The what convention? I don't think I've ever seen such a thing mentioned in instructions to authors. Granted, I've never submitted to Nature or Science, and the one time I submitted to PNAS was half a year ago...

his equally devastating “entrails of chickens” paper

I love that one. :-)

the “sneer-reviewed publication.”

*clenched-tentacle salute*

By David Marjanović (not verified) on 25 Feb 2013 #permalink