Liveblogging a PLoS ONE Article

PLoS ONE has recently published a paper entitled "Beyond the Gene" by Evelyn Fox Keller and David Harel, in which the authors take a stab at the long-standing question: What is a Gene? Because this is such a big picture question, the appropriate discussion of the paper would involve a synthesis of what the authors wrote, what has previously been written, and what I think about all of that. I'm not going to do that. I'm too lazy and too stupid to do that. Instead, I'm going to read the paper and live-blog it. This one's for you, Bora.

Quick tip: If you're planning to read an entire paper, skip the abstract. At best, it will be a waste of time. At worst, it will plant ideas in your mind that will follow you throughout the rest of the paper like annoying little preconceived notions. I'm not reading the abstract.

Fox Keller and Harel start with a quote from Lavoisier, and the entire introduction is peppered with more quotes. Here's the next one:

"Where the meaning of most four-letter words is all too clear, that of gene is not. The more expert scientists become in molecular genetics, the less easy it is to be sure about what, if anything, a gene actually is."

Helen Pearson

You've got to be shitting me! What a load of shit! That's fucking bullshit! The meaning of most four-letter words is not fucking clear. It's a fucking mess. And anyone who says otherwise can go fuck themselves because they're a mother-fucking cock-sucker. It's downright cocky to claim otherwise.

Apparently bioinformaticians only care about definitions of genes if they include something about the location of the gene in the genome. More conventional experimental geneticists (Fox Keller & Harel's words, not mine), on the other hand, recognize that inheritance can be non-genetic (i.e., via proteins). Also, those experimental geneticists commit logical fallacies:

Lindquist challenges the material basis of heredity, but perhaps even more than the bioinformatics people, she remains happy with the notion of "genetic elements", of heritable entities. This begs the question of what is a genetic element.

It does not beg the question. It may raise the question, but it does no begging.

I present item number one that the authors do not understand evolution:

Today we know that much more than DNA sequence is passed on from one generation to another, but even restricting ourselves to this DNA, which many would argue is the only truly tangible part of an organism that is actually passed down, we read that, at least for higher organisms, a mere 1-2% of the genome is spanned by protein-coding sequences.

So, organisms that smoke more weed have genomes made up of a mere 1-2% protein-coding sequences? Don't tell me we're gonna get into a discussion about genome size, because that's what really affects the fraction of the genome that's protein coding. Small genomes have a higher proportion of protein-coding sequence than large genomes. That's what's behind that 1-2% in humans. In Drosophila, it's closer to 10-20%. Sure, there are other functional regions, but the amount of protein-coding sequence is a pretty good estimate of the fraction of the genome that's functional.

Helen Pearson, the person with the quote about fucking shitty cocks, brings some context to this whole debate:

As Pearson observes, most geneticists are not trying to find a definition of the gene on which they can agree. Instead, they tend to use "less ambiguous words such as transcripts and exons. [And even] when it is used, the word 'gene' is frequently preceded by 'protein coding' or another descriptor..... Some things," she concludes, "are not best portrayed by a crude four-letter word."

That pretty much sums it up. We can stop now. Anyone worth an ounce of the weed that got some organisms high realizes that, for the sake of clarity, you should qualify the word "gene" whenever you use it. There's really no reason to continue with the article. That said, I'll keep reading for the sake of completeness. And for fuck's sake.

Here's how the authors justify going on:

Kapranov et al write, "These observations suggest that genomic architecture is not colinear, but is instead interleaved and modular, and that the same genomic sequences are multifunctional". Their argument has now been strikingly confirmed by the first results of the ENCODE Project Consortium. These findings are of immense interest, and dramatically underscore the need for new ways of thinking about DNA sequences, genomic organization, and their relationship to function. If we have learned nothing else, it is that that relation is far more complex than we had ever anticipated. Yet more recently, Gingeras suggests that this increased complexity necessitates "a reconsideration of the definition of a gene and require[s] the use of an alternative term to help to define the fundamental operational unit that relates genomic sequences to phenotypes/function."

No, what Pearson said is pretty much correct. We can keep using the term "gene" so long as we qualify it with what we mean. If we're talking about something that encodes a protein, then we call it a "protein-coding gene". If the terminal product is an RNA, then it's an "RNA-gene", which can be made more specific by what type of RNA it encodes (rRNA, tRNA, miRNA, siRNA, snRNA, qwertyRNA . . . okay, I made that last one up). And so on.

But Fox Keller and Harel aren't down with my jive. They want to do away with the gene. Instead of the gene, they want this:

Accordingly, the first thing we'd like to do is to offer as a replacement for the gene a concept that is closely related, even if of a different kind, which we shall call the dene. Like the gene, our notion of dene is intended to capture the essence of genetic transmission, but, rather than being confined to denoting a discrete chunk of DNA, it is far richer and more expressive. A dene is, in fact, a general kind of statement about the DNA -- what logicians call a predicate or a property. Denes can be used to represent vastly more intricate characteristics of the DNA sequence than the simple statement that it contains a particular subsequence.

What a load of post-modernist bull-shit. Is this the molecular biology Sokal paper? It sure reads like it. Fuck you Dawkins for inventing the meme, and fuck you Fox Keller and Harel for inventing the dene. Thankfully, this paper won't make much of an impact, and no one will be using your silly jargon (not true for Dawkins's meme). Instead, we're gonna stick to the well established, and perfectly acceptable, terminology that Pearson described.

But they're not done. Here's the bene:

For symmetry, we will refer to a statement about behavior as a bene. As with denes, our notion of bene will also be extremely rich, making it possible to express complex modal and temporal characteristics of the organism's behavior over time, characteristics that go far beyond simple statements about, e.g., protein synthesis or transcription.

And the genitor (which is not a character from Transformers):

In fact, it makes little sense to specify a dene -- i.e., to make statements about a DNA sequence -- without identifying the behaviors with which that sequence has come to be associated. Accordingly, we propose to be explicit about this conjunction, and thus introduce as our main concept, the genetic functor, or the genitor. This we define as the logical relation that says: Whenever the organism is seen to have X, it does Y. Or to use our new terms, a genitor relates a particular dene to a particular bene, stating that whenever the organism's DNA is seen to satisfy the property expressed by the dene, it's behavior satisfies the property expressed by the bene.

Too much jargon! Make it stop! But, no, we're just completing the Introduction, and we've still got the "Analysis" and "Discussion" sections to go. This whole paper seems like an introduction to the analysis of a discussion -- is it really necessary to break it into three parts?

Here's how they unite all the jargon they've created:

Syntactically, a genetic functor, or genitor, G is defined as a triple G = (O, D, B), which groups together the organism O with a dene D and a bene B. The former is a statement about O's DNA and the latter is a statement about O's behavior. Both of these will be described in more detail below, but for now it suffices to say that, semantically, the dene D is a truth-valued function of O's DNA sequence and the bene B is a truth valued function of O's temporal life-span.

For something that's intended to "enable current and future research to move forward more effectively", it sure doesn't make anything clearer. I'd describe it as elegant, yet convoluted. Actually, it's not really very elegant. More like convoluted, yet contrived. That pretty much sums it up.

This made me laugh:

It will be noticed that between genes and denes lies a difference of only one letter, yet, we argue, the latter, as we use it, designates a concept on an altogether different logical level.

I'm not sure that it was supposed to.

They introduce another letter: "S to denote the organism O's complete DNA sequence". Are we having fun yet? You do realize that "This is one of the manifestations of rich logical expressiveness." You don't? Neither do I.

Here are some more terms that will be incorporated into the model:

  • E - environment
  • M - internal mechanisms
  • F, G - events

We're running out of letters.

Next, the examples. Here's a simple one:

The classical polypeptide coding unit found as a continuous stretch of DNA bounded by a stop and start codon (corresponding to Seymour Benzer's cistron) is easily captured by a dene. The corresponding bene would be the production of a polypeptide chain (including transcription and translation), and the genitor would then capture the link between the two.

And here's where they appear to not understand molecular genetics:

Alternative splicing: Here a dene refers to any set of mRNA transcripts sewn together to form a protein-coding unit. In general, the dene would not require these to be contiguous. Each such dene would be associated with the corresponding polypeptide, the genitor capturing the relation that specifies the components of the dene corresponding to the bene that specifies production of the polypeptide.

Alternative splicing isn't a set of transcripts sewn together. It's a single transcript with various parts removed. Depending on which parts are removed, different processed RNAs are produced. Those different RNAs may encode different proteins (I say "may" because the non-protein-coding exons may be alternatively spliced).
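To make that concrete, here's a toy sketch in Python. It's my own cartoon, not anything from the paper: the exon names E1 to E4 and the function `spliced_isoforms` are made up for illustration, and real splicing is far messier (alternative splice sites, retained introns, and so on). The point is only that you start with one transcript and remove parts, keeping the order of whatever is left.

```python
from itertools import combinations

def spliced_isoforms(exons, constitutive):
    """Return every processed RNA obtainable by skipping optional exons.

    exons: ordered list of exon names on ONE primary transcript.
    constitutive: set of exon names that are always retained.
    """
    optional = [e for e in exons if e not in constitutive]
    isoforms = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            # Exon order is preserved: parts are removed, nothing is re-sewn
            # from separate transcripts.
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms

# Hypothetical transcript with two optional exons in the middle.
transcript = ["E1", "E2", "E3", "E4"]
isoforms = spliced_isoforms(transcript, constitutive={"E1", "E4"})
for iso in isoforms:
    print("-".join(iso))
```

Two optional exons give four processed RNAs, every one of them derived from the same single transcript.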

They also bring up different types of RNA (rRNA, tRNA, micro-RNA, siRNA, snoRNA, sbRNA, snlRNA) and claim, "In our terminology, defining the appropriate genitors that would relate such benes to particular denes in a clean and rigorous fashion is the crux of the problem." Why not just use the names that have been given to them? I still don't see what their point is.

But here's one that really irks me:

Other segments of DNA without either an RNA or protein product are associated with yet different kinds of behavior. E.g., parts of the DNA, or various combinations thereof, which have a direct influence on mutation rates, would be definable as denes. A particularly interesting example of the latter is provided by stretches of small sequence repeats (SSR) that can induce slippage in the processes of replication, transcription, and even translation, and are accordingly sources of localized hyper-mutability (Moxon et al, 2006). Indeed, it has been argued that simple sequence repeats may equip the cell "with adjustable 'tuning knobs' for efficient adaptation" (King et al 1997).

What a load of adaptationist claptrap. These hyper-mutable repeats exist because they're not deleterious enough to be purged by purifying selection. It's very rare to find a coherent adaptationist explanation for tandem repeats.

As if we haven't had enough already, Fox Keller and Harel tell us that they need to formalize their language. They claim:

In the good spirit of computational and systems biology, it is obvious that mathematical and algorithmic formalization of biological concepts has the advantage of -- indeed is done for the purpose of -- enabling computerized analysis.

Only they're not going to do that. And if they don't do it, I don't think anyone will. Sure, they'll explain how other people should do it, but they won't do it themselves. So, we'll move on to the Discussion.

The discussion starts out with some talk of modularity and phenotypic plasticity. This kind of stuff gives developmental biologists a hard-on. It is accompanied by more adaptationism:

Indeed, evolution selects for such adaptability (and hence, for modularity), because the ability to adapt that modularity confers on the organism enhances survival.

This is not a unanimous view, and not one that can be presented sans citation. Modularity can evolve neutrally (see here for a review). The authors are full of shit (this time the four letter word means exactly what you think it means). The requirement: small population size. That's an assumption humans meet, with our complexified modular genome.

Fox Keller and Harel offer a closing salvo about how traits can be inherited via mechanisms other than DNA (i.e., RNA, proteins, etc.). This supposedly relates to their denes, benes, and genitors. Somehow. Whatever. I read this crap so you don't have to (all apologies to Lenny Bruce). And if you haven't, don't. It's not worth your time. It's vacuous navel gazing. Content free scholarship. Nothing for no one's sake. Okay, maybe it's for shit's sake.


Fox Keller E, Harel D (2007) Beyond the Gene. PLoS ONE 2(11): e1231. doi:10.1371/journal.pone.0001231

I have to write a 3,000-word dissertation on "How have the results of the human and chimpanzee genome projects revealed what is special about Homo sapiens."

Are there tips you can give me on certain features I should research? Any book recommendations you think I should pick up?

Please give me an e-mail at steven_wgi@hotmail.com or post a reply here and I will see it.

Thanks

Kind Regards

Steven

This was fun, thank you for reading it so I don't have to. I would like to see more people avoiding writing about the organisms smoking weed. My respect for Molecular Biology of the Cell sank a lot when they started writing so much about weed smoking organisms.

Steven,

Short answer: nothing. Try to stretch that out to 3000 words.

As a computer scientist with only a smattering of biology, I can't respond to the mistakes they made regarding evolution but I think you are overly harsh on the important matters of the paper.

What a load of post-modernist bull-shit. Is this the molecular biology Sokal paper?

This paper was written by a physicist/philosopher whose "research focuses on the history and philosophy of modern biology" and a computer scientist whose research includes "modeling and analysis of biological systems." It wasn't written by a biologist so I think you're being overly harsh.

Here are some more terms that will be incorporated into the model:
* E - environment
* M - internal mechanisms
* F, G - events

Here you are completely wrong. The paper does not say that E and M will be incorporated; instead:
"it might be beneficial at a later stage to make the environment (E) and internal mechanisms (M) explicit."

You also completely misunderstood F and G. They are only event labels in one example, so your remark is similar to saying that D-endo16promoter is part of their model.

Their entire model is G = (O, D, B) where G is the genitor, O is the organism, D is the dene, and B is the bene. If this confuses you, theoretical computer science would really warp your mind.
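A minimal sketch of that triple in Python, as I read it. Everything here is my own assumption for illustration, not the authors' formalization: the class name `Genitor`, the toy dene `has_open_reading_frame`, the toy bene `makes_polypeptide`, and the choice to represent "behavior" as a plain list of event labels rather than anything temporal.

```python
from dataclasses import dataclass
from typing import Callable, List

Dene = Callable[[str], bool]        # truth-valued function of a DNA sequence
Bene = Callable[[List[str]], bool]  # truth-valued function of observed behavior

@dataclass
class Genitor:
    organism: str  # O
    dene: Dene     # D
    bene: Bene     # B

    def holds(self, dna: str, behavior: List[str]) -> bool:
        # "Whenever the organism is seen to have X, it does Y" is a logical
        # implication: it only fails when the dene is true and the bene false.
        return (not self.dene(dna)) or self.bene(behavior)

def has_open_reading_frame(dna: str) -> bool:
    """Toy dene: some ATG is followed, in frame, by a stop codon."""
    stops = {"TAA", "TAG", "TGA"}
    start = dna.find("ATG")
    while start != -1:
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in stops:
                return True
        start = dna.find("ATG", start + 1)
    return False

def makes_polypeptide(behavior: List[str]) -> bool:
    """Toy bene: translation shows up among the observed events."""
    return "translation" in behavior

cistron = Genitor("toy organism", has_open_reading_frame, makes_polypeptide)
print(cistron.holds("CCATGAAATTTTAGGG", ["transcription", "translation"]))
print(cistron.holds("CCATGAAATTTTAGGG", ["transcription"]))
```

The first call satisfies the genitor and the second violates it, since the dene is true but the bene is not. A sequence with no reading frame satisfies it vacuously.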

In computer science, we can define the NFA model as M = (Q, Σ, T, q0, F) where Q is the set of states, Σ is the alphabet, T is a transition function, q0 is the initial state, and F is the set of final (accepting) states. Would your response to this simple model be "too much jargon!" and "we're running out of letters"?
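To show how little machinery those five letters hide, here is a rough sketch of simulating such an NFA in Python. The function name `nfa_accepts` and the example automaton (strings over {a, b} ending in "ab") are my own illustration. I assume no epsilon moves and treat F as a set of accepting states.

```python
def nfa_accepts(states, alphabet, transitions, start, finals, word):
    """Simulate an NFA M = (Q, Sigma, T, q0, F) on a word (no epsilon moves).

    transitions maps (state, symbol) to the set of possible next states.
    """
    assert start in states and finals <= states
    current = {start}  # every state the NFA could be in right now
    for symbol in word:
        if symbol not in alphabet:
            return False
        current = {
            nxt
            for q in current
            for nxt in transitions.get((q, symbol), set())
        }
    # Accept if at least one possible run ends in an accepting state.
    return bool(current & finals)

# Example automaton: accepts strings over {a, b} that end in "ab".
Q = {"q0", "q1", "q2"}
SIGMA = {"a", "b"}
T = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
F = {"q2"}

print(nfa_accepts(Q, SIGMA, T, "q0", F, "aab"))
print(nfa_accepts(Q, SIGMA, T, "q0", F, "aba"))
```

The first word is accepted and the second is rejected.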

Do biologists (disregarding computational biologists) ever use formal languages? If not, I think they might have overestimated their audience.

It's not worth your time. It's vacuous navel gazing. Content free scholarship. Nothing for no one's sake. Okay, maybe it's for shit's sake.

You might not like the conceptual model they suggest but I think there are benefits to developing a finite-state model. If readers aren't interested in building biological models, then I guess it would appear meaningless.

shit like this along with the general poor writing and errors in most of their papers is why i dislike PLOS

This paper opens up a whole new world of possible jargons. We need to get busy coming up with definitions for the:
pene
wene
schmene
clene
jene
kene
chene
ghrene

etc.

By Sven DiMilo (not verified) on 29 Nov 2007

That was hilarious. I totally agree. Thanks for making me laugh out loud. Everyone who studies biology can easily communicate what they mean using the word gene, with the necessary adjective in front of it. I have a feeling that they are more interested in having fun with words (not that I don't enjoy that) than actually assisting in science communication. Why are they reinventing the wheel, when the wheel is already heavily chromed with nice shiny rims? RIP dene, bene, genitor.

Now I am waiting for their next paper in which the denome and benome are defined and the fields of denomics and benomics are proclaimed. Hopefully we'll soon have The Journal of Denomics and Benomics. You'll find my other 2 cents here.

I have a feeling that they are more interested in having fun with words (not that I don't enjoy that) than actually assisting in science communication.

Did anyone actually read the whole paper?

They do more than suggest a new vocabulary. They are suggesting a conceptual framework that lends itself well to formalization. Since I'm not a biologist, I don't know if the change in conceptual framework is needed but I do see the need for a finite-state model.

Sure, they don't give a formal language (or an equivalent finite state model) but since they are not biologists they would probably need help developing a language that would (or could?) be used by the majority of relevant biologists.

This article (and its responses) make me think that contempt is not limited to physics.

First off, I agree that the formalization that they present may assist in computational analysis and modeling of the interaction of genes in a network. However, I am curious whether computational biologists already use a similar formalization when modeling gene networks. I did not see much reference to the already vast amount of literature that does model gene networks as well as, critically, the evolution of gene networks.

But the creation of a formalization for the modeling of genetic pieces of material and the interaction of those pieces of genetic and epigenetic material (without silly visual cartoons) is not the premise of their paper. Instead, they make it sound like there is widespread disarray in the field of biology as a result of no one really knowing what a gene is. While some folks stuck in the old-school operon mindset may have trouble dealing with the current knowledge of the complexity of genetic material, epigenetic modifications, and small RNAs, I believe that a vast majority of Biologists have a pretty good feeling of what a gene is. It's a piece of DNA that does something... with a few exceptions of course, but heck, that's Biology.

Further, I am worried that the formal model that they describe might not easily be able to tolerate natural genetic variation. They make the organism part of their genitor set sound so simple, but there is a ton of natural genetic variation in the relative action of genes among populations as well as tons of segregation variation for duplications and losses of genes. Please, correct me if I am wrong. While their language allows for dynamics within the organism, I wonder how useful it would be for population genetic modeling.

Finally, each of the model organisms has a really nice formal way of describing genes and other pieces of DNA already, that does exactly what they describe in their paper. For example: AT1G13870 is a gene in the organism Arabidopsis thaliana (AT) that is located on the first chromosome (1) in the genomic DNA (G) at position 13870. If you look up that gene on the Arabidopsis website, it says that its name is DEFORMED ROOTS AND LEAVES 1, which is exactly what the gene does when knocked out. How, then, I ask, is their formalization any different from that?
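For what it's worth, that naming scheme is regular enough to parse mechanically. A minimal sketch in Python follows. The function `parse_agi` and the exact pattern are my own assumptions, including the guess that organelle genomes use C and M in the chromosome slot; it is not an official parser from the Arabidopsis database.

```python
import re

# Assumed layout: "AT", a chromosome (1-5, or C/M for organelles), "G",
# then a five-digit locus number.
AGI_PATTERN = re.compile(r"^AT([1-5CM])G(\d{5})$")

def parse_agi(locus):
    """Split a locus code such as AT1G13870 into its labelled parts."""
    match = AGI_PATTERN.match(locus.upper())
    if match is None:
        raise ValueError(f"not a recognizable locus code: {locus!r}")
    chromosome, number = match.groups()
    return {
        "organism": "Arabidopsis thaliana",  # the "AT" prefix
        "chromosome": chromosome,
        "feature": "G",                      # gene in the genomic DNA
        "number": int(number),               # the numeric part of the code
    }

parsed = parse_agi("AT1G13870")
print(parsed)
```

What the identifier cannot tell you is what the gene does. That still has to be looked up in the annotation.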

I am curious whether computational biologists already use a similar formalization when modeling gene networks. I did not see much reference to the already vast amount of literature that does model gene networks as well as, critically, the evolution of gene networks.

stop wondering :)

or on evolution of networks

this looks like a classic example of thinking you've come up with something really clever in some field outside your own, without bothering to read the relevant literature.

"It's a piece of DNA that does something..." But is it? Or is it a unit of heredity which we have to mentally detach from the molecule of DNA? Equating genes with DNA is the problem that this paper tries to resolve. Many geneticists, when confronted in conversation, agree that DNA is not the only "stuff" of heredity, yet they then go back to the lab and do their work as if DNA IS the only stuff of heredity. This mental crutch of thinking about genetics ONLY in terms of DNA is the problem. Until we resolve it, we cannot move on.

I'll take that on board. Cheers.

'Many geneticists, when confronted in conversation, agree that DNA is not the only "stuff" of heredity, yet they then go back to the lab and do their work as if DNA IS the only stuff of heredity.'

Yes, you are absolutely correct that DNA is not the only mechanism of heredity. However, when you look at QTL mapping studies, a vast portion of the genetic variation in a given trait is consistently explained by the DNA genome. Thus, genetic variation is mostly explained by pieces of DNA that do things. This variation may not all be base pair changes as it could be the result of epigenetic modifications to DNA (histone modifications, CpG methylation, etc.), but that would still be a modification to a piece of DNA that does something. This may be a simple-minded way of thinking about things, but it has served us quite well. Further, I really don't believe that the lack of a definition of the word "gene" will slow us down from determining what underlies the small portion of heritable genetic variation not related to DNA.
