What is a Gene?

It’s not entirely obvious at first, but this article in the New York Times is about the problems with gene patents in a world where one gene does not equal one protein. Now, we’ve known for a while that the one gene, one protein model isn’t entirely correct, what with alternative splicing and all. The human genome also contains many “genes” which are only transcribed into RNAs, but not translated into proteins. All of this has been pretty much accepted by geneticists for a few years.

But rather than putting all of this in the appropriate context, Denise Caruso muddies the waters by overemphasizing the importance of the recent ENCODE paper. At least, we think she’s writing about ENCODE, but we’re not entirely sure because here’s how she does it:

Last month, a consortium of scientists published findings that challenge the traditional view of how genes function. The exhaustive four-year effort was organized by the United States National Human Genome Research Institute and carried out by 35 groups from 80 organizations around the world. To their surprise, researchers found that the human genome might not be a “tidy collection of independent genes” after all, with each sequence of DNA linked to a single function, such as a predisposition to diabetes or heart disease.

First of all, the recent ENCODE paper was more of a proof of principle than anything else. A mere 1% of the human genome was studied in intense detail (this was only a pilot project, after all), with more coverage expected in the future. That said, much can be learned from the research reported, especially about DNA sequence evolution and how gene transcription is regulated. But it’s not like there is anything in the paper that would represent a paradigm shift as described by Caruso.

Caruso then riffs off of this misunderstanding of the ENCODE findings by claiming that the validity of patenting genes should be questioned. I won’t be taking a position on gene patents, but I will point out that arguing against gene patenting based on ENCODE represents faulty logic. I say this because one need not even invoke any of the ENCODE results to point out that “genomes are complex” and that it does not make sense to patent a “gene”. The one gene, one protein concept has been dead for years, and the non-independence of genes is hardly anything new either.

From here, Caruso goes on to discuss how antibiotics were created prior to biologists developing an understanding of how antibiotic resistance evolves. Somehow, this is related to the ENCODE findings (or gene patenting), as is the subsequent discussion of how recombinant DNA techniques allow us to insert individual genes into organisms. Caruso makes the point that genes which may be patented by different companies could interact in a network. I have no idea whether this is a valid argument against gene patenting, but Caruso does seem to string together a couple of paragraphs that actually contain a reasonable amount of scientific accuracy.

Alas, this could only continue for so long. She goes on to quote Barbara Caulfield, vice president of Affymetrix:

“We’re learning that many diseases are caused not by the action of single genes, but by the interplay among multiple genes,” Ms. Caulfield said. She noted that just before she wrote her article, “scientists announced that they had decoded the genetic structures of one of the most virulent forms of malaria and that it may involve interactions among as many as 500 genes.”

It’s more of the “genes don’t make sense outside of their interaction network” argument against gene patenting (which, I’ll point out once again, I am not taking a side on), but I can’t make heads or tails of the second part of the quote. I think she’s referring to this paper on Plasmodium falciparum protein interaction networks, but it takes a bit of digging to figure out what’s going on. In the end, Caruso’s argument may make sense, but it’s damn near impossible to evaluate it because it’s hidden behind a veil of muddled science and tangential points.

Who is Denise Caruso? Aside from writing the Re:Framing column for the Times, she is also the founder and director of the Hybrid Vigor Institute. Caruso’s institute claims to help with problem solving by encouraging cross-disciplinary collaborations. As with her writing, however, I can’t figure out what the hell that means. I worry that, despite the fact that she may be drawing upon diverse areas to create novel collaborations, Caruso can’t make the necessary connections to really bring them together. At least, that’s the conclusion one draws from reading her column.


  1. #1 Mike the Mad Biologist
    July 1, 2007

    According to the disclosure page, Hybrid Vigor is supported by “the New York Times Capital Foundation, for a grant to support general operations.” Interesting.

  2. #2 Jonathan Vos Post
    July 1, 2007

    That New York Times article made very little sense to me. I grew up reading the New York Times. I know a little bit about biology, and my wife and I have earned over $100,000 consulting for patent law firms on topics including biotechnology.

    First impression on actually reading the ENCODE paper follows, which also benefits from conversations with Dr. George Hockney.

    This is a paradigm smashing paper in the current issue of Science, about the ENCODE project.

    Bottom line: in the Human Genome at least (and other projects show that this applies to Drosophila melanogaster as well), there are (for the most part) NO SUCH THINGS AS GENES.

    The word is still used, with a mass of epicycles encrusted onto the concept so that it takes a grad school semester to even define “gene” any more.

    But just as we don’t know what holds the galaxy together (i.e. the epicycle “dark matter”) we don’t know what holds the genome together (i.e. the epicycle “heterochromatin”).

    The model that came from Morgan et al at Caltech in the 1930s was: one gene, one enzyme.

    That is, the chromosome is mostly DNA, and certain substrings of the DNA code for proteins. They evolve by Natural Selection. Some other parts regulate. The rest is non-functional, noise, or junk, or outside the paradigm, and never gets transcribed to RNA nor has function nor is selected.

    The actual DATA using the latest methodologies in combination, applied to 1% of the human genome, partly bits we know, partly bits chosen at random, is, to the contrary:

    2% to 4% is crudely akin to the “genes” and “pseudogenes.” More than half is functional, more than half ends up copied to RNA, the RNA has some dynamic interaction with other RNA and protein in ways we don’t know, and much of the functional stuff (once called genes) is selectively neutral (or only very weakly subject to natural selection). Things we don’t understand are sometimes evolving by natural selection. Functional things are sometimes not strongly conserved. Strongly conserved things are sometimes not functional. There is large-scale structure correlated with when in the reproductive cycle the cell is. The performers formerly known as genes are broken into pieces, scattered, scrambled, started and stopped by things far away on the chromosome in both directions, and overlapping.

    There is no “vacuum.” There is no “gene.” The words do more harm than good.

    The truth is out there.

    I exaggerate for rhetorical reasons, but this really is revolutionary work I’m reading.

    In the sense of Mendel, there are such things as mathematical rules about discrete units of inheritance. But, you’re right, the last straightforward link to DNA is broken.

    To recapitulate (with vast oversimplification) the history of the key concept [reference Nature, 25 May 2006, p.400]:

    1860s: Gregor Mendel, Austrian monk, plays with pea plants, fudges data, publishes in most obscure place (slowing down recognition): basic rules of inheritance defined; traits determined by deterministic units passed from one generation to the next, God knows how.

    1909: Wilhelm Johannsen, Danish botanist, coins word “gene” for the unit associated with an inherited trait, admitting that the physical basis is unknown.

    1930: Thomas Morgan (enjoying the monastic atmosphere of Caltech) analyzes why time flies like an arrow but fruit flies like a banana, and concludes that genes sit on chromosomes, an idea popularized as beads on strings. Turns out as accurate as image of atoms being electron planets orbiting sun nuclei.

    1941: George Beadle and Edward Tatum launch the model that one gene makes one enzyme. The classical enzymology yields a PhD for Isaac Asimov, and (when seen through the not-yet-named fields of Artificial Life and Nanotechnology) a neither granted nor denied PhD for Jonathan Vos Post.

    1944: Oswald Avery, Colin MacLeod, and Maclyn McCarty show that genes are made of DNA. This raises more questions than it answers.

    1953: James Watson and Francis Crick find the golden spiral stairway to heaven, publishing the structure of DNA in a sneaky race against Pauling (they ply Pauling, Jr., with sherry to find out what Linus, Sr., is up to) and denying the essential contribution of Rosalind Franklin and others; the central dogma of molecular biology comes from this: information flows from DNA to RNA to protein. That’s all ye need to know.

    1970: Reverse transcriptase is discovered by Howard Temin at the University of Wisconsin-Madison, and independently by David Baltimore, who later becomes Caltech President. Information can flow from RNA to DNA, against Dogma.

    1977: Richard Roberts and Philip Sharp discover that genes can be split into segments, leading to the idea that one gene can make several proteins.

    1993: The first microRNA is identified in the worm Caenorhabditis elegans; the worm turns.

    2003: GeneSweep: Human geneticists yell at each other late into the night, hammering a compromise definition of protein-coding genes, in order to decide who won the bet on the number of human genes. The winner is announced, but the consensus is that we have no idea what the real answer is. ["gene = locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions."]. Yeah, right.

    2006: the paradigm begins to emerge that human genes are one long continuum.

    2007: ENCODE supports the new paradigm much more than the old.

    2008: High school students complain that what Jonathan Vos Post is teaching as full-time instructor in the Blair International Baccalaureate Magnet High School Health Careers Academy seems disconnected from all their textbooks. His supervisor and the principal take Jonathan’s side…

    – Prof. Jonathan Vos Post

  3. #3 qetzal
    July 1, 2007

    Wow. I don’t think I’ve ever read an article in a mainstream newspaper that is more off-base than that one.

    The presumption that genes operate independently … is the economic and regulatory foundation on which the entire biotechnology industry is built.

    No it isn’t. Not even a little bit.

    The scientists who invented recombinant DNA in 1973 built their innovation on this mechanistic, “one gene, one protein” principle.

    No, they didn’t. They built their innovation on a set of techniques that allows different pieces of DNA to be connected together in precise ways. Whether one gene codes for one protein, many proteins, or no proteins matters not a whit.

    The principle that gave rise to the biotech industry promised benefits that were equally compelling. Known as the Central Dogma of molecular biology, it stated that each gene in living organisms, from humans to bacteria, carries the information needed to construct one protein.

    Not only is biotech not dependent on the Central Dogma, she doesn’t even get the Central Dogma right! The CD says that information flows in one direction, from DNA -> RNA -> protein. It’s not the same as “one gene, one protein.” (And, of course, we’ve known that the CD is wrong for decades.)

    Because donor genes could be associated with specific functions, with discrete properties and clear boundaries, scientists then believed that a gene from any organism could fit neatly and predictably into a larger design — one that products and companies could be built around, and that could be protected by intellectual-property laws.

    I strongly doubt that most scientists believed that about all genes. Although the fact is, many genes do fit neatly and predictably into a larger design. People have built products and companies around genes. Amgen, Genentech, Genzyme, these are not obscure companies.

    “The industrial gene is one that can be defined, owned, tracked, proven acceptably safe, proven to have uniform effect, sold and recalled,” said Jack Heinemann, a professor of molecular biology in the School of Biological Sciences at the University of Canterbury in New Zealand and director of its Center for Integrated Research in Biosafety.

    In the United States, the Patent and Trademark Office allows genes to be patented on the basis of this uniform effect or function.

    Wrong again. There is absolutely nothing that says a gene must have a uniform effect in all species, or even in all humans, to be patentable.

    If genes are only one component of how a genome functions, for example, will infringement claims be subject to dispute when another crucial component of the network is claimed by someone else? Might owners of gene patents also find themselves liable for unintended collateral damage caused by the network effects of the genes they own?

    No, and no. Incredibly ignorant questions, suggesting a profound lack of understanding about patents.

    Even more important than patent laws are safety issues raised by the consortium’s findings. Evidence of a networked genome shatters the scientific basis for virtually every official risk assessment of today’s commercial biotech products, from genetically engineered crops to pharmaceuticals.

    More BS, verging on deliberate irresponsibility. The safety of pharmaceuticals is never predicated on an assumption that genes operate independently. We know genes don’t operate in a vacuum, and a gene’s effects can depend on context. We’ve known that even longer than we’ve known that genes are made of DNA.

    The safety of pharmaceuticals is based on empirical testing, typically starting with cultured cells or in vitro studies, followed by animal studies in multiple species, followed by human studies. That’s precisely because we know we can’t predict safety based on some 1st grade notion of independently acting genes.

    “Because gene patents and the genetic engineering process itself are both defined in terms of genes acting independently,” [Professor Heinemann] said, “regulators may be unaware of the potential impacts arising from these network effects.”

    Wrong, wrong, and wrong. I hope, for Prof. Heinemann’s sake, that this is some sort of misquote.

    [I]t may be time for the biotech industry to re-examine the more subtle effects of its products, and to share what it knows about them with regulators and other scientists.

    OK, so the next time a biotech co. develops a new drug, they should what, submit a bunch of gene chip data to FDA? What are they gonna do with it? Nothing, because we don’t know how to correlate that info with actual biological effects. Not yet, and not in most cases, anyway. That’s why we still study safety in clinical trials.

    Denise Caruso may be great at directing an institute that studies collaborative problem-solving, but she doesn’t have the first clue what she’s talking about here.

  4. I don’t think it is possible to overemphasize the ENCODE results. Not because the 1+29 papers broke big news – scientists grew increasingly convinced during the 4 years ENCODE was in the making that the fundamental axioms (genes/junk) are dogmatic, see http://www.junkdna.com

    The “news” with ENCODE is that it is now “official” that the “junkDNA”, “genes” and some fundamental assumptions of the Lamarck/Darwin concepts have to be re-thought.

    The International PostGenetics Society (http://www.postgenetics.org) formally abandoned the dogmas in the European Inaugural (12 October, 2006).

    After the pre-classical era of Genetics (Mendel and Darwin), Bateson started Modern Genetics (1905); which made room for Molecular Genetics starting with Watson & Crick (1953).

    Modern Genetics is now history. Welcome to the PostModern era (PostGenetics; “Genomics beyond Genes”).

    The implications are formidable – it will even take a new crop of science journalists to communicate them to society at large.


  5. #5 Peter Ellis
    July 2, 2007

    Saying “functional elements are not conserved” is wrong. What we can say is that elements which appear to be functional are not conserved *at the sequence level*.

    If something is functional, and that function is important for the reproductive success of the organism, then mutations which affect the function *will* be selected against. That’s not a deduction, it’s a tautology.

    So functional elements are conserved by definition.

    The correct conclusion to draw from the observation that functional elements appear not to be conserved at the sequence level is that the function is not dependent on the sequence! Nothing more, nothing less.

    There could be a number of explanations, which are laid out in the various papers and reviews of the ENCODE project.

    For regulatory regions, a possibility is that we’re just not very good at identifying what’s conserved. The function that’s being conserved could be “binding of transcription factor X”. In that case, it may be that all that’s required is that a given 6-base motif is present somewhere within 1000 bases of the transcription start site. You can lose one through mutation as long as you acquire another one somewhere in the right region. Neither the old one nor the new one will look to be conserved when you compare them against the matching location in other species, since both have changed.

    Another possibility is that what’s being conserved is the secondary structure of the DNA (or encoded RNA) rather than the sequence itself. Say a given structure folds into a hairpin. You can change it as much as you like, as long as you make a corresponding change on the other side of the hairpin. Again, the sequence is not conserved, but the function is.

    None of this is revolutionary. It’s only revolutionary if you think that proteins are the only functional gene product. Given the existence of ribosomal RNAs, transfer RNAs, etc. (not to mention all the others we’ve discovered since), that’s not *ever* been a sensible position to hold.

  6. #6 dubious
    July 2, 2007

    You’ve been PELLIONISZED!

  7. #7 squarepeg
    July 2, 2007

    “Wow. I don’t think I’ve ever read an article in a mainstream newspaper that is more off-base than that one.”


    “None of this is revolutionary. It’s only revolutionary if you think that proteins are the only functional gene product. Given the existence of ribosomal RNAs, transfer RNAs, etc. (not to mention all the others we’ve discovered since), that’s not *ever* been a sensible position to hold.”

    Amen. I hate it when non-scientists write about “revolutionary” findings in science, then get the SCIENCE WRONG.

    This article drove me batty.

  8. #8 qetzal
    July 2, 2007

    Amen. I hate it when non-scientists write about “revolutionary” findings in science, then get the SCIENCE WRONG.

    Yeah, but you gotta hand it to Caruso for going above and beyond. She wasn’t content to get just the science wrong, she made sure to get the business (biotech) and law (patents) wrong, too!

  9. #9 Kevin Z
    July 4, 2007

    Hey RPM,

    You might have heard John Greally on NPR’s Science Friday talk about the ENCODE project and “junk DNA”, aired June 15. If not, you can access the podcast here: http://www.sciencefriday.com/pages/2007/Jun/hour1_061507.html

  10. #10 Gabriel Haro
    July 4, 2007

    I agree that it’s easy to misunderstand Caruso’s arguments, especially if one takes a linear perspective towards them. I would have definitely preferred more background on the ENCODE project, but given the nuance of the science involved, perhaps her tactic of stating that the landscape (pun intended) is more complex than public policy reflects is more appropriate for the Times.

    For one thing, Caruso’s argument doesn’t seem to be based at the molecular level at all. Caruso takes a population level perspective that’s needed to recognize and understand what is relevant to ownership and commodification of genetic processes. Genomics is only the vector; what she’s really talking about is capitalism.

    Though it’s not clear if she is using “network effect” to refer to intragenomic interactions or intergenomic ones, her examples of bacterial resistance and malaria suggest that she’s referring to intergenomic interactions. My take on what she is reacting to is the observation that genetic interactions have many epistatic or non-linear effects, while the prevalent assumption for biotech and policy-makers is that genes are predominantly additive and that a predictable relationship exists between gene identity and outcome.

    Starting with popular science is enough. The way genes are portrayed in popular culture suggests that there are genes for heart disease and genes for aggression and so on. That’s how biotech gets funded, no? By stipulating that specific genes have appreciable effects on health, the value of those genes can be measured, built, and sold as a product. This is misleading. Yes, we can associate disease variation with specific loci, but it is never the case that genes cause anything. Genetic material is one component of a very non-linear system that includes developmental timing and environmental interactions. As every evolutionary geneticist knows, selecting on a single trait often results in correlated responses across many other traits. Given that traits are based on the interactions of many genes, moving genes among individuals doesn’t bring the whole system along in the manner that, say, artificial selection does.

    By stating that a gene has a distinct function, we are in essence naming it and categorizing it according to that function. For genes to be patented, a recognizable function has to be ascribed to them. We can say that the “terminator gene” has a protein binding function, but can’t we also say that the “terminator gene” has a social unrest function if we expand our observations beyond the lab? Though we can’t directly establish a cause and effect relationship between large-scale social interactions and political protests and the gene, we know intuitively that the effect of this transgene isn’t limited to corn or cotton.

    It’s true that the science isn’t particularly new. What’s new is that people are starting to ask relevant questions about how the ownership and practices of industry take into account the mechanistic possibilities for creating value as well as the relevant downstream biological and social processes. This is not something that any single individual or profession can either validate or invalidate.

    I’ll agree that “the economic and regulatory foundation on which the entire biotechnology industry is built” is probably a mix of a lot of different factors. But why isn’t it based on “the presumption that genes operate independently”? When was the last time you heard a company say that a disease was attributed to many genes and only in certain contexts and we’re not entirely sure how and when, but please still buy our product? People want certainty and hope, especially when their health is involved. Reducing the message down to single genes does that. All that Caruso is saying is that this may mean bigger problems down the line if we don’t actually revise our policy and language to match what we actually know about the world.

    BTW, what does empirical testing of pharmaceuticals (GxE) have to do with empirical testing of gene interaction (GxG) effects, and even if it did how do these protocols take into account non-human community/ecosystem interactions?
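
The two explanations given in comment #5 (a binding motif that moves around within a window, and compensatory changes on both sides of a hairpin) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the motif `TGACTC`, the 1000-base window, the sequences, and the helper names are hypothetical, and simple DNA base pairing stands in for any real structure prediction.

```python
# Toy illustration only: all sequences, the motif, and the helper names
# below are hypothetical and not taken from the ENCODE papers.

MOTIF = "TGACTC"   # assumed 6-base transcription factor binding motif
WINDOW = 1000      # assumed window upstream of the transcription start site


def has_motif(upstream: str, motif: str = MOTIF) -> bool:
    """Function is retained if the motif occurs anywhere in the window."""
    return motif in upstream[-WINDOW:]


def conserved_at_position(seq_a: str, seq_b: str, pos: int, k: int = len(MOTIF)) -> bool:
    """Naive sequence-level conservation: identical k-mer at the same offset."""
    return seq_a[pos:pos + k] == seq_b[pos:pos + k]


# Two "species" with the same function but the motif at different offsets.
species_a = "A" * 100 + MOTIF + "A" * (WINDOW - 100 - len(MOTIF))
species_b = "A" * 700 + MOTIF + "A" * (WINDOW - 700 - len(MOTIF))

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def forms_hairpin(stem5: str, stem3: str) -> bool:
    """A stem pairs perfectly if one arm is the reverse complement of the other."""
    return stem3 == reverse_complement(stem5)


if __name__ == "__main__":
    # Motif turnover: function kept in both, sequence not conserved at either site.
    print(has_motif(species_a), has_motif(species_b))
    print(conserved_at_position(species_a, species_b, 100))
    print(conserved_at_position(species_a, species_b, 700))

    # Hairpin: one mutation breaks the stem, a matching change restores it.
    print(forms_hairpin("GGCAC", "GTGCC"))   # original stem
    print(forms_hairpin("GGTAC", "GTGCC"))   # single mutation in one arm
    print(forms_hairpin("GGTAC", "GTACC"))   # compensatory change in the other arm
```

In both toy cases a position-by-position comparison reports a difference even though the property that matters (motif present in the window, stem still pairing) is unchanged, which is the distinction between sequence conservation and functional conservation drawn in that comment.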