PLoS ONE has recently published a paper entitled “Beyond the Gene” by Evelyn Fox Keller and David Harel, in which the authors take a stab at the long-standing question: What is a Gene? Because this is such a big picture question, the appropriate discussion of the paper would involve a synthesis of what the authors wrote, what has previously been written, and what I think about all of that. I’m not going to do that. I’m too lazy and too stupid to do that. Instead, I’m going to read the paper and live-blog it. This one’s for you, Bora.
Quick tip: If you’re planning to read an entire paper, skip the abstract. At best, it will be a waste of time. At worst, it will plant ideas in your mind that will follow you throughout the rest of the paper like annoying little preconceived notions. I’m not reading the abstract.
Fox Keller and Harel start with a quote from Lavoisier, and the entire introduction is peppered with more quotes. Here’s the next one:
“Where the meaning of most four-letter words is all too clear, that of gene is not. The more expert scientists become in molecular genetics, the less easy it is to be sure about what, if anything, a gene actually is.”
You’ve got to be shitting me! What a load of shit! That’s fucking bullshit! The meaning of most four-letter words is not fucking clear. It’s a fucking mess. And anyone who says otherwise can go fuck themselves because they’re a mother-fucking cock-sucker. It’s downright cocky to claim otherwise.
Apparently bioinformaticians only care about definitions of genes if they include something about the location of the gene in the genome. More conventional experimental geneticists (Fox Keller & Harel’s words, not mine), on the other hand, recognize that inheritance can be non-genetic (i.e., via proteins). Also, those experimental geneticists commit logical fallacies:
Lindquist challenges the material basis of heredity, but perhaps even more than the bioinformatics people, she remains happy with the notion of “genetic elements”, of heritable entities. This begs the question of what is a genetic element.
It does not beg the question. It may raise the question, but it does no begging.
I present item number one in the case that the authors do not understand evolution:
Today we know that much more than DNA sequence is passed on from one generation to another, but even restricting ourselves to this DNA, which many would argue is the only truly tangible part of an organism that is actually passed down, we read that, at least for higher organisms, a mere 1-2% of the genome is spanned by protein-coding sequences.
So, organisms that smoke more weed have genomes made up of a mere 1-2% protein-coding sequences? Don’t tell me we’re gonna get into a discussion about genome size, because that’s what really affects the fraction of the genome that’s protein coding. Small genomes have a higher proportion of protein-coding sequence than large genomes. That’s what’s behind that 1-2% in humans. In Drosophila, it’s closer to 10-20%. Sure, there are other functional regions, but the amount of protein-coding sequence is a pretty good estimate of the fraction of the genome that’s functional.
Helen Pearson, the person with the quote about fucking shitty cocks, brings some context to this whole debate:
As Pearson observes, most geneticists are not trying to find a definition of the gene on which they can agree. Instead, they tend to use “less ambiguous words such as transcripts and exons. [And even] when it is used, the word ‘gene’ is frequently preceded by ‘protein coding’ or another descriptor….. Some things,” she concludes, “are not best portrayed by a crude four-letter word.”
That pretty much sums it up. We can stop now. Anyone worth an ounce of the weed that got some organisms high realizes that, for the sake of clarity, you should qualify the word “gene” whenever you use it. There’s really no reason to continue with the article. That said, I’ll keep reading for the sake of completeness. And for fuck’s sake.
Here’s how the authors justify going on:
Kapranov et al write, “These observations suggest that genomic architecture is not colinear, but is instead interleaved and modular, and that the same genomic sequences are multifunctional”. Their argument has now been strikingly confirmed by the first results of the ENCODE Project Consortium. These findings are of immense interest, and dramatically underscore the need for new ways of thinking about DNA sequences, genomic organization, and their relationship to function. If we have learned nothing else, it is that that relation is far more complex than we had ever anticipated. Yet more recently, Gingeras suggests that this increased complexity necessitates “a reconsideration of the definition of a gene and require[s] the use of an alternative term to help to define the fundamental operational unit that relates genomic sequences to phenotypes/function.”
No, what Pearson said is pretty much correct. We can keep using the term “gene” so long as we qualify it with what we mean. If we’re talking about something that encodes a protein, then we call it a “protein-coding gene”. If the terminal product is an RNA, then it’s an “RNA-gene”, which can be made more specific by what type of RNA it encodes (rRNA, tRNA, miRNA, siRNA, snRNA, qwertyRNA . . . okay, I made that last one up). And so on.
But Fox Keller and Harel aren’t down with my jive. They want to do away with the gene. Instead of the gene, they want this:
Accordingly, the first thing we’d like to do is to offer as a replacement for the gene a concept that is closely related, even if of a different kind, which we shall call the dene. Like the gene, our notion of dene is intended to capture the essence of genetic transmission, but, rather than being confined to denoting a discrete chunk of DNA, it is far richer and more expressive. A dene is, in fact, a general kind of statement about the DNA — what logicians call a predicate or a property. Denes can be used to represent vastly more intricate characteristics of the DNA sequence than the simple statement that it contains a particular subsequence.
What a load of post-modernist bullshit. Is this the molecular biology Sokal paper? It sure reads like it. Fuck you, Dawkins, for inventing the meme, and fuck you, Fox Keller and Harel, for inventing the dene. Thankfully, this paper won’t make much of an impact, and no one will be using your silly jargon (not true for Dawkins’s meme). Instead, we’re gonna stick to the well-established, and perfectly acceptable, terminology that Pearson described.
But they’re not done. Here’s the bene:
For symmetry, we will refer to a statement about behavior as a bene. As with denes, our notion of bene will also be extremely rich, making it possible to express complex modal and temporal characteristics of the organism’s behavior over time, characteristics that go far beyond simple statements about, e.g., protein synthesis or transcription.
And the genitor (which is not a character from Transformers):
In fact, it makes little sense to specify a dene — i.e., to make statements about a DNA sequence — without identifying the behaviors with which that sequence has come to be associated. Accordingly, we propose to be explicit about this conjunction, and thus introduce as our main concept, the genetic functor, or the genitor. This we define as the logical relation that says: Whenever the organism is seen to have X, it does Y. Or to use our new terms, a genitor relates a particular dene to a particular bene, stating that whenever the organism’s DNA is seen to satisfy the property expressed by the dene, it’s behavior satisfies the property expressed by the bene.
Too much jargon! Make it stop! But, no, we’re just completing the Introduction, and we’ve still got the “Analysis” and “Discussion” sections to go. This whole paper seems like an introduction to the analysis of a discussion — is it really necessary to break it into three parts?
Here’s how they unite all the jargon they’ve created:
Syntactically, a genetic functor, or genitor, G is defined as a triple G = (O, D, B), which groups together the organism O with a dene D and a bene B. The former is a statement about O’s DNA and the latter is a statement about O’s behavior. Both of these will be described in more detail below, but for now it suffices to say that, semantically, the dene D is a truth-valued function of O’s DNA sequence and the bene B is a truth valued function of O’s temporal life-span.
For something that’s intended to “enable current and future research to move forward more effectively”, it sure doesn’t make anything clearer. I’d describe it as elegant, yet convoluted. Actually, it’s not really very elegant. More like convoluted, yet contrived. That pretty much sums it up.
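For the curious, here is roughly what the quoted triple amounts to if you write it down as code. This is a minimal sketch of my reading of the definition, not anything the authors provide: the `Organism` class, the list-of-events stand-in for the “temporal life-span”, the toy `has_orf` dene, and the toy `makes_polypeptide` bene are all hypothetical names I made up for illustration. “Whenever the organism is seen to have X, it does Y” is treated as plain logical implication.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A dene is a truth-valued function of the DNA sequence S.
Dene = Callable[[str], bool]
# A bene is a truth-valued function of the organism's life history.
# Modeling that history as an ordered list of event labels is an assumption.
Bene = Callable[[Sequence[str]], bool]


@dataclass
class Organism:
    name: str
    dna: str                # S, the organism's complete DNA sequence
    history: Sequence[str]  # hypothetical stand-in for the "temporal life-span"


@dataclass
class Genitor:
    """The triple G = (O, D, B) from the quoted definition."""
    organism: Organism
    dene: Dene
    bene: Bene

    def holds(self) -> bool:
        # "Whenever the organism has X, it does Y" read as: not X, or Y.
        return (not self.dene(self.organism.dna)) or self.bene(self.organism.history)


STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_orf(dna: str) -> bool:
    """Toy dene: the first ATG is followed, in frame, by a stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return False
    for i in range(start + 3, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return True
    return False


def makes_polypeptide(history: Sequence[str]) -> bool:
    """Toy bene: a polypeptide shows up somewhere in the life history."""
    return "polypeptide produced" in history


toy = Organism("toy", "CCATGAAATTTTAGGG",
               ["transcription", "translation", "polypeptide produced"])
cistron = Genitor(toy, has_orf, makes_polypeptide)
print(cistron.holds())
```

Written this way, a genitor is an if-then statement wrapped around two boolean functions, and it holds vacuously for any organism whose DNA fails the dene.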
This made me laugh:
It will be noticed that between genes and denes lies a difference of only one letter, yet, we argue, the latter, as we use it, designates a concept on an altogether different logical level.
I’m not sure that it was supposed to.
They introduce another letter: “S to denote the organism O’s complete DNA sequence”. Are we having fun yet? You do realize that “This is one of the manifestations of rich logical expressiveness.” You don’t? Neither do I.
Here are some more terms that will be incorporated into the model:
- E – environment
- M – internal mechanisms
- F, G – events
We’re running out of letters.
Next, the examples. Here’s a simple one:
The classical polypeptide coding unit found as a continuous stretch of DNA bounded by a stop and start codon (corresponding to Seymour Benzer’s cistron) is easily captured by a dene. The corresponding bene would be the production of a polypeptide chain (including transcription and translation), and the genitor would then capture the link between the two.
And here’s where they appear to not understand molecular genetics:
Alternative splicing: Here a dene refers to any set of mRNA transcripts sewn together to form a protein-coding unit. In general, the dene would not require these to be contiguous. Each such dene would be associated with the corresponding polypeptide, the genitor capturing the relation that specifies the components of the dene corresponding to the bene that specifies production of the polypeptide.
Alternative splicing isn’t a set of transcripts sewn together. It’s a single transcript with various parts removed. Depending on which parts are removed, different processed RNAs are produced. Those different RNAs may encode different proteins (I say “may” because the non-protein-coding exons may be alternatively spliced).
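If that distinction seems fussy, a toy model makes it concrete. This is a minimal sketch under my own simplifying assumptions: the pre-mRNA is a list of labeled segments, the sequences are made up, and the only kind of alternative splicing modeled is exon skipping. The `splice` function and its `skipped_exons` parameter are hypothetical names, not anything from the paper.

```python
from typing import Iterable, List, Tuple

# One primary transcript, as (kind, sequence) segments in 5'-to-3' order.
# The sequences are invented for illustration.
PreMRNA = List[Tuple[str, str]]

pre_mrna: PreMRNA = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUCAG"),
    ("exon", "GAAUUC"),
    ("intron", "GUGAGUAAG"),
    ("exon", "CCGGGAUAA"),
]


def splice(transcript: PreMRNA, skipped_exons: Iterable[int] = ()) -> str:
    """Remove every intron, plus any exons listed (1-based) in skipped_exons."""
    skipped = set(skipped_exons)
    mature = []
    exon_number = 0
    for kind, seq in transcript:
        if kind == "intron":
            continue  # introns are always removed
        exon_number += 1
        if exon_number in skipped:
            continue  # this exon is spliced out along with its flanking introns
        mature.append(seq)
    return "".join(mature)


full_length = splice(pre_mrna)                # all three exons retained
exon2_skipped = splice(pre_mrna, {2})         # same transcript, exon 2 removed
print(full_length)
print(exon2_skipped)
```

Both processed RNAs come from the same single transcript. Nothing is sewn together from separate transcripts; the only thing that varies is which parts get removed.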
They also bring up different types of RNA (rRNA, tRNA, micro-RNA, siRNA, snoRNA, sbRNA, snlRNA) and claim, “In our terminology, defining the appropriate genitors that would relate such benes to particular denes in a clean and rigorous fashion is the crux of the problem.” Why not just use the names that have been given to them? I still don’t see what their point is.
But here’s one that really irks me:
Other segments of DNA without either an RNA or protein product are associated with yet different kinds of behavior. E.g., parts of the DNA, or various combinations thereof, which have a direct influence on mutation rates, would be definable as denes. A particularly interesting example of the latter is provided by stretches of small sequence repeats (SSR) that can induce slippage in the processes of replication, transcription, and even translation, and are accordingly sources of localized hyper-mutability (Moxon et al, 2006). Indeed, it has been argued that simple sequence repeats may equip the cell “with adjustable ‘tuning knobs’ for efficient adaptation” (King et al 1997).
What a load of adaptationist claptrap. These hyper-mutable repeats exist because they’re not deleterious enough to be purged by purifying selection. It’s very rare to find a coherent adaptationist explanation for tandem repeats.
As if we haven’t had enough already, Fox Keller and Harel tell us that they need to formalize their language. They claim:
In the good spirit of computational and systems biology, it is obvious that mathematical and algorithmic formalization of biological concepts has the advantage of — indeed is done for the purpose of — enabling computerized analysis.
Only they’re not going to do that. And if they don’t do it, I don’t think anyone will. Sure, they’ll explain how other people should do it, but they won’t do it themselves. So, we’ll move on to the Discussion.
The discussion starts out with some talk of modularity and phenotypic plasticity. This kind of stuff gives developmental biologists a hard-on. It is accompanied by more adaptationism:
Indeed, evolution selects for such adaptability (and hence, for modularity), because the ability to adapt that modularity confers on the organism enhances survival.
This is not a unanimous view, and not one that can be presented sans citation. Modularity can evolve neutrally (see here for a review). The requirement: small population size. That’s a requirement humans meet, with our complexified modular genome. The authors are full of shit (this time the four-letter word means exactly what you think it means).
Fox Keller and Harel offer a closing salvo about how traits can be inherited via mechanisms other than DNA (e.g., RNA and proteins). This supposedly relates to their denes, benes, and genitors. Somehow. Whatever. I read this crap so you don’t have to (all apologies to Lenny Bruce). And if you haven’t, don’t. It’s not worth your time. It’s vacuous navel gazing. Content-free scholarship. Nothing for no one’s sake. Okay, maybe it’s for shit’s sake.
Fox Keller E, Harel D (2007) Beyond the Gene. PLoS ONE 2(11): e1231. doi:10.1371/journal.pone.0001231