We need a sociologist of science…or a philosopher

There's another paper out debunking the ENCODE consortium's absurd interpretation of their data. ENCODE, you may recall, published a rather controversial paper in which they claimed to have found that 80% of the human genome was 'functional' — for an extraordinarily loose definition of function — and further revealed that several of the project leaders were working with the peculiar assumption that 100% must be functional. It was a godawful mess, and compromised the value of a huge investment in big science.

Now W. Ford Doolittle has joined the ranks of the many scientists who immediately leapt into the argument, publishing "Is junk DNA bunk? A critique of ENCODE" in PNAS.

Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete? Here, I review older arguments for junk grounded in the C-value paradox and propose a thought experiment to challenge ENCODE’s ontology. Specifically, what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own genome? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational). If, however, the number of functional elements were to rise significantly with C-value then, (i) organisms with genomes larger than our genome are more complex phenotypically than we are, (ii) ENCODE’s definition of functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology, or (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms. Good cases can be made for propositions ii and iii. A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed.

In the paper, he makes an argument similar to one T. Ryan Gregory has made many times before. There are organisms that have much larger genomes than humans; lungfish, for example, have 130 billion base pairs, compared to the 3 billion humans have. If the ENCODE consortium had studied lungfish instead, would they still be arguing that the organism had function for 104 billion bases (80% of 130 billion)? Or would they be suggesting that yes, lungfish were full of junk DNA?

If they claim that lungfish have roughly 44 times as much functional sequence as we do, well, what is it doing? Does that imply that lungfish are far more phenotypically complex than we are? And if they grant that junk DNA exists in great abundance in some species, just not in ours, does that imply that we're somehow sitting in the perfect sweet spot of genetic optimality? If that's the case, what about species like the fugu, whose genome is one eighth the size of ours?
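
A minimal back-of-the-envelope sketch of that arithmetic, in Python, using only the round figures already quoted above (about 130 billion base pairs for the lungfish, about 3 billion for humans, and ENCODE's 80% claim); these are the post's approximations, not precise assembly sizes.

```python
# Back-of-the-envelope arithmetic for the lungfish comparison above.
# Genome sizes are the round figures quoted in the post, not exact assembly sizes.
HUMAN_GENOME_BP = 3e9          # ~3 billion base pairs
LUNGFISH_GENOME_BP = 130e9     # ~130 billion base pairs
ENCODE_FUNCTIONAL_FRACTION = 0.80

human_functional = HUMAN_GENOME_BP * ENCODE_FUNCTIONAL_FRACTION        # ~2.4e9 bp
lungfish_functional = LUNGFISH_GENOME_BP * ENCODE_FUNCTIONAL_FRACTION  # ~1.04e11 bp

print(f"Human 'functional' DNA:    {human_functional:.2e} bp")
print(f"Lungfish 'functional' DNA: {lungfish_functional:.2e} bp")
print(f"Ratio: {lungfish_functional / human_functional:.0f}x")
# With these rounded inputs the ratio comes out near 43x, the same ballpark
# as the "roughly 44 times" comparison in the text.
```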

It's really a devastating argument, but then, all of the arguments against ENCODE's interpretations have been solid, knocking the whole thing out of the park. It's been demonstrated again and again that the conclusions of the ENCODE program were shit.


So why, Yale, why? The Winter edition of the Yale Medicine magazine features as its cover article "Junk No More," an awful piece of PR fluff that announces in its first line, "R.I.P., junk DNA," and goes on to tout the same nonsense that every paper published since the ENCODE announcement has refuted.

The consortium found biological activity in 80 percent of the genome and identified about 4 million sites that play a role in regulating genes. Some noncoding sections, as had long been known, regulate genes. Some noncoding regions bind regulatory proteins, while others code for strands of RNA that regulate gene expression. Yale scientists, who played a key role in this project, also found “fossils,” genes that date to our nonhuman ancestors and may still have a function. Mark B. Gerstein, Ph.D., the Albert L. Williams Professor of Biomedical Informatics and professor of molecular biophysics and biochemistry, and computer science, led a team that unraveled the network of connections between coding and noncoding sections of the genome.

Arguably the project’s greatest achievement is the repository of new information that will give scientists a stronger grasp of human biology and disease, and pave the way for novel medical treatments. Once verified for accuracy, the data sets generated by the project are posted on the Internet, available to anyone. Even before the project’s September announcement, more than 150 scientists not connected to ENCODE had used its data in their research.

“We’ve come a long way,” said Ewan Birney, Ph.D., of the European Bioinformatics Institute (EBI) in the United Kingdom, lead analysis coordinator for ENCODE. “By carefully piecing together a simply staggering variety of data, we’ve shown that the human genome is simply alive with switches, turning our genes on and off and controlling when and where proteins are produced. ENCODE has taken our knowledge of the genome to the next level, and all of that knowledge is being shared openly.”

Oh, Christ. Not only is it claiming that the 80% figure is for biological activity (it isn't), but it trots out the usual university press relations crap about how the study is all about medicine. It wasn't and isn't. It's just that dumbasses can only think of one way to explain biological research to the public, and that is to suggest that it will cure cancer.

As for Birney's remarks, they are offensively ignorant. No, the ENCODE research did not reveal, for the first time, that the human genome is actively regulated; we've known that for fifty years.

That's not the only ahistorical part of the article. They also claim that the idea of junk DNA has been discredited for years.

Some early press coverage credited ENCODE with discovering that so-called junk DNA has a function, but that was old news. The term had been floating around since the 1990s and suggested that the bulk of noncoding DNA serves no purpose; however, articles in scholarly journals had reported for decades that DNA in these “junk” regions does play a regulatory role. In a 2007 issue of Genome Research, Gerstein had suggested that the ENCODE project might prompt a new definition of what a gene is, based on “the discrepancy between our previous protein-centric view of the gene and one that is revealed by the extensive transcriptional activity of the genome.” Researchers had known for some time that the noncoding regions are alive with activity. ENCODE demonstrated just how much action there is and defined what is happening in 80 percent of the genome. That is not to say that 80 percent was found to have a regulatory function, only that some biochemical activity is going on. The space between genes was also found to contain sites where DNA transcription into RNA begins and areas that encode RNA transcripts that might have regulatory roles even though they are not translated into proteins.

I swear, I'm reading this article and finding it indistinguishable from the kind of bad science I'd see from ICR or Answers in Genesis.

I have to mention one other revelation from the article. There has been a tendency to throw a lot of the blame for the inane 80% number on Ewan Birney alone…he threw in that interpretation in the lead paper, but it wasn't endorsed by every participant in the project. But look at this:

The day in September that the news embargo on the ENCODE project’s findings was lifted, Gerstein saw an article about the project in The New York Times on his smartphone. There was a problem. A graphic hadn’t been reproduced accurately. “I was just so panicked,” he recalled. “I was literally walking around Sterling Hall of Medicine between meetings talking with The Times on the phone.” He finally reached a graphics editor who fixed it.

So Gerstein was so concerned about accuracy that he panicked over a graphic in the popular press, yet the big claim in the Birney paper, the one that would utterly undermine confidence in the whole body of work, did not perturb him? And now, months later, he's collaborating with the Yale PR department on a puff piece that blithely sails past all the objections people have raised? Remarkable.

This is what boggles my mind, and why I hope some sociologist of science is studying this whole process right now. It's a revealing peek at the politics and culture of science. We have a body of very well-funded, high-ranking scientists working at prestigious institutions who are actively and obviously fitting the data to a set of unworkable theoretical presuppositions, and completely ignoring the rebuttals that are appearing at a rapid clip. The idea that the entirety of the genome is both functional and adaptive is untenable and unsupportable; instead, we have hundreds of scientists who have been bamboozled into treating noise as evidence of function. It's looking like N rays or polywater on a grand and extremely richly budgeted scale. And it's going on right now.

If we can't have a sociologist making an academic study of it all, can we at least have a science journalist writing a book about it? This stuff is fascinating.

I have my own explanation for what is going on. What I think we're seeing is an emerging clash between scientists and technicians. I've seen a lot of biomedical grad students going through training in pushing buttons and running gels and sucking numerical data out of machines, and we've got the tools to generate so much data right now that we need people who can manage that. But it's not science. It's technology. There's a difference.

A scientist has to be able to think about the data they're generating, put it into a larger context, and ask the kinds of questions that probe deeper than a superficial analysis can deliver. A scientist has to be more broadly trained than the person who runs the gadgetry.

This might get me burned at the stake worse than sneering at ENCODE, but a good scientist has to be…a philosopher. They may not have formal training in philosophy, but the good ones have to be at least roughly intuitive natural philosophers (ooh, I've heard that phrase somewhere before). If I were designing a biology curriculum today, I'd want to make at least some basic introduction to the philosophy of science an essential and early part of the training.

I know I'm going against the grain; plenty of big-name scientists openly dismiss philosophy. Richard Feynman, for instance, is often credited with the quip that "philosophy of science is about as useful to scientists as ornithology is to birds." But Feynman was wrong, and ironically so. Reading Feynman is actually like reading philosophy: a strange kind of philosophy that squirms and wiggles trying to avoid the hated label, but philosophy all the same.

I think the conflict arises because, like everything, 90% of philosophy is garbage, and scientists don't want to be associated with a lot of the masturbatory nonsense some philosophers pump out. But let's not lose sight of the fact that some science, like ENCODE, is nonsense, too — and the quantity of garbage is only going to rise if we don't pay attention to understanding as much as we do accumulating data. We need the input of philosophy.

Hear, hear. The specific area of philosophy of science that seems most relevant to this issue is the literature surrounding "underdetermination": the idea that any observation can support or undermine any hypothesis, if you mess with the auxiliary assumptions enough. Most philosophers of science reject the idea of wholesale underdetermination, but there's a lot more to be said about where, exactly, it goes wrong.

I've often wondered what exactly Feynman's remark is really supposed to mean. As others have pointed out, ornithology does seem pretty useful to birds, for example.

Great stuff!

"I hope some sociologist of science is studying this whole process right now. It’s a revealing peek at the politics and culture of science."
Ahh - you want us to help you? But then, if we actually do say that science is indeed a social system full of shit going on, you are going to tell us that we are anti-science, right? Look, if you want sociologists of science to survive so that they can help you once in a while, perhaps it is time to stop shitting on their point of view quite so much and to support them once in a while, too. You can start by taking up the article "The Missing Memristor: Novel Nanotechnology or rather new Case Study for the Philosophy and Sociology of Science?", the last part of which (the one on sociological aspects) is unpublishable (even the first part was rejected 10 times, because of the sociology of science). If you help me get it published (we can combine it with the analysis of ENCODE), I may be interested in your specific case. Because, you see, if one is not allowed to publish and otherwise gets no support, how is one going to keep food on the table while doing such work? I know, I know, you think your stuff is so much more important than everything else, but let me tell you what I got to hear as one of the reasons for rejection: "If you think that the sociology of science is in need of more case studies, think again!"

By Sascha Vongehr (not verified) on 23 Mar 2013 #permalink

It has recently come to my attention that 100% of human DNA is evolving into "junk". It is pretty clear that mankind doesn't deserve a place in the universe and hopefully the tapeworm will rise up and take his place.

By Tal Finkle (not verified) on 24 Mar 2013 #permalink

I have to say that bioinformatics is plagued by technicians with little if any knowledge of biology. If you spend time on any of the bioinformatics forums, you'll be amazed by the basic questions that come up constantly, such as whether DNA sequences are read 5' to 3' or the other way around, what exactly codons are, and how splicing affects things.
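
For readers who haven't met those basics, here is a minimal, self-contained Python illustration of two of the things the comment mentions, strand orientation (reverse complement) and reading codons; the sequence is made up purely for the example.

```python
# Minimal illustration of the basics mentioned above: strand orientation
# (reverse complement) and splitting a coding sequence into codons.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def codons(cds: str):
    """Split a spliced coding sequence (5' to 3') into triplets, dropping any partial codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

seq = "ATGGCCATTGTAATGGGCCGCTGA"   # made-up example sequence
print(reverse_complement(seq))      # TCAGCGGCCCATTACAATGGCCAT
print(codons(seq))                  # ['ATG', 'GCC', 'ATT', ...]
```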

I have always been confused by the creationists' objection to the concept of junk DNA. I would think that this would fit in perfectly with their dogma of mankind's fall from grace, inherent imperfection, and continued degradation - a sure sign of god's wrath against all life for Adam's disobedience. Shouldn't they see the presence and dominance of junk DNA as a serious challenge to Darwinian evolution due to its anti-fitness waste of resources? Evolutionists and creationists seem to have switched their most obvious rationales in this debate.

By AnswersInGenitals (not verified) on 24 Mar 2013 #permalink

Just send me the money, and I'll do it.

By John S. Wilkins (not verified) on 24 Mar 2013 #permalink

Shouldn’t they see the presence and dominance of junk DNA as a serious challenge to Darwinian evolution due to its anti-fitness waste of resources?

Indeed this was a very common objection by biologists when the idea of junk DNA was being introduced: if there's all this junk that's useless at best, why hasn't natural selection removed it?

Turns out it can't: there's no enzyme that can recognize junk DNA and cut it out, and there's no way one could evolve, so we have to wait for random deletions. Those are rare enough that the rate of junk DNA loss is similar to the rate of junk DNA gain.

By David Marjanović (not verified) on 24 Mar 2013 #permalink
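
To make the point in the preceding comment concrete, here is a toy random-walk sketch; every rate and size in it is an arbitrary, purely illustrative assumption, not an empirical estimate or anyone's published model. If gains and losses of noncoding DNA occur at roughly equal rates, the junk simply drifts around rather than being purged.

```python
import random

def simulate_junk(junk_bp=1_000_000, generations=100_000,
                  gain_rate=1e-3, loss_rate=1e-3, event_size=500):
    """Toy model: each generation, junk may grow (e.g. a transposon insertion)
    or shrink (a rare spontaneous deletion) with equal probability."""
    for _ in range(generations):
        if random.random() < gain_rate:
            junk_bp += event_size
        if random.random() < loss_rate:
            junk_bp = max(0, junk_bp - event_size)
    return junk_bp

random.seed(1)
# With balanced gain and loss rates the junk content wanders around its
# starting value instead of trending toward zero.
print(simulate_junk())
```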

I can identify biological activity in every single part of the genome: IT'S CALLED MITOSIS.

Send me a couple of hundred million dollars, please.

Nnnnnnn, so stupid (not the assay itself, which might be useful, but the PR-driven / flamebait conclusions).

"Send me a couple of hundred million dollars, please."

Like anyone will.

By Arctodus23 (not verified) on 25 Mar 2013 #permalink

Is there any solid reason to assume a one-to-one mapping between phenotypic complexity and number of functional genes? This seems to be an oversimplistic understanding of complexity.

Further, is there even a definition of phenotypic complexity? Does the term "functional" rule out the possibility of redundancy? A gene or DNA element can be both functional and redundant. It is entirely conceivable that behavioral flexibility allows for a less redundant genome.

In other words, one can imagine that phenotypically "simple" organisms rely on genomic complexity/redundancy to deal with environmental challenges, whereas phenotypically "complex" organisms rely on behavioral flexibility to deal with challenges.

Is there any solid reason to assume a one-to-one mapping between phenotypic complexity and number of functional genes?

No, why? I don't think anyone has claimed such a mapping for 30 or 40 years now.

is there even a definition of phenotypic complexity?

Nope. People tried for decades and never came up with anything useful.

By David Marjanović (not verified) on 25 Mar 2013 #permalink

You say "90% of philosophy is garbage". Can you provide references?

By Lynn Dewees (not verified) on 30 Mar 2013 #permalink