You would think that geneticists would have a good definition of “gene”. After all, genes are what we study. In introductory biology courses, you may have been introduced to the concept of the gene as the unit of heredity. That’s all well and good, but when you begin to study genes at a molecular level (i.e., looking at DNA sequences), that definition ceases to be practical. The advent of DNA sequencing led to the concept of the gene as an open reading frame, and the post-genomic era has challenged the very idea of the gene.
I’ve previously discussed the definition of a gene (What is a gene? and What is a gene?; yes, those are two different posts with the same title), but I didn’t get into many details. Alas, I don’t feel like spending much time laying out my opinion here; suffice it to say that I think “gene” is an obsolete, overly generic term that should be replaced by a more specific term whenever possible. Luckily, the New York Times has published an article by Carl Zimmer sketching out some of the possible interpretations (Now: The Rest of the Genome). This lets me pick and choose my favorite meaning from the variety of opinions represented in Carl’s piece.
Much of Zimmer’s article deals with results from the pilot phase of the ENCODE project. One part of the project was a careful examination of which DNA sequences are transcribed into RNA. This led to some remarkable findings, including the discovery that many transcripts consist of sequences encoded in different parts of the genome:
Encode’s results reveal the genome to be full of genes that are deeply weird, at least by the traditional standard of what a gene is supposed to be. “These are not oddities — these are the rule,” said Thomas R. Gingeras of Cold Spring Harbor Laboratory and one of the leaders of Encode.
A single so-called gene, for example, can make more than one protein. In a process known as alternative splicing, a cell can select different combinations of exons to make different transcripts. Scientists identified the first cases of alternative splicing almost 30 years ago, but they were not sure how common it was. Several studies now show that almost all genes are being spliced. The Encode team estimates that the average protein-coding region produces 5.7 different transcripts. Different kinds of cells appear to produce different transcripts from the same gene.
Even weirder, cells often toss exons into transcripts from other genes. Those exons may come from distant locations, even from different chromosomes.
So, Dr. Gingeras argues, we can no longer think of genes as being single stretches of DNA at one physical location.
“I think it’s a paradigm shift in how we think the genome is organized,” Dr. Gingeras said.
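To make the combinatorics of alternative splicing a little more concrete, here is a minimal Python sketch. It models only one mechanism, cassette-exon skipping, and assumes a hypothetical four-exon gene whose first and last exons are always retained while each internal exon is either included or skipped. The exon names, the toy sequences, and the function `exon_skipping_isoforms` are made up for illustration. Real cells do not produce every possible combination, and splicing also involves alternative splice sites, mutually exclusive exons, and exons borrowed from elsewhere in the genome, none of which this sketch covers.

```python
from itertools import combinations

# Hypothetical gene: (exon name, toy sequence). Not real data.
EXONS = [
    ("E1", "ATGGCC"),
    ("E2", "GATTAC"),
    ("E3", "CCGTTA"),
    ("E4", "GGCTAA"),
]


def exon_skipping_isoforms(exons):
    """Enumerate transcripts produced by cassette-exon skipping.

    Assumes the first and last exons are always kept and that each
    internal exon is independently included or skipped. Exon order is
    preserved, because combinations() yields items in input order.
    """
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            chosen = [first, *kept, last]
            name = "-".join(exon_name for exon_name, _ in chosen)
            seq = "".join(exon_seq for _, exon_seq in chosen)
            isoforms.append((name, seq))
    return isoforms


if __name__ == "__main__":
    for name, seq in exon_skipping_isoforms(EXONS):
        print(f"{name:12s} {seq}")
```

With n optional internal exons there are 2^n possible isoforms, so even a handful of cassette exons leaves plenty of room for the multiple transcripts per locus that ENCODE reports.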
Another highly touted finding from ENCODE was that the majority of the genome is transcribed. This led some people to conclude that much of the genome consists of undescribed functional elements.
These discoveries left scientists wondering just how much noncoding RNA our cells make. The early results of Encode suggest the answer is a lot. Although only 1.2 percent of the human genome encodes proteins, the Encode scientists estimate that a staggering 93 percent of the genome produces RNA transcripts.
John Mattick, an Encode team member at the University of Queensland in Australia, is confident that a lot of those transcripts do important things that scientists have yet to understand. “My bet is the vast majority of it — I don’t know whether that’s 80 or 90 percent,” he said.
That would mean the human genome is chock full of genes. However, just because something is transcribed does not necessarily mean that it is functional. Many sequences may be aberrantly transcribed, representing merely background noise. That is, a lot of the potential “genes” aren’t really genes at all. Of all the people quoted in the article, I find myself agreeing with Ewan Birney and David Haussler the most:
Despite the importance of noncoding RNA, Dr. Birney suspects that most of the transcripts discovered by the Encode project do not actually do much of anything. “I think it’s a hypothesis that has to be on the table,” he said.
David Haussler, another Encode team member at the University of California, Santa Cruz, agrees with Dr. Birney. “The cell will make RNA and simply throw it away,” he said.
Dr. Haussler bases his argument on evolution. If a segment of DNA encodes some essential molecule, mutations will tend to produce catastrophic damage. Natural selection will weed out most mutants. If a segment of DNA does not do much, however, it can mutate without causing any harm. Over millions of years, an essential piece of DNA will gather few mutations compared with less important ones.
Only about 4 percent of the noncoding DNA in the human genome shows signs of having experienced strong natural selection. Some of those segments may encode RNA molecules that have an important job in the cell. Some of them may contain stretches of DNA that control neighboring genes. Dr. Haussler suspects that most of the rest serve no function.
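Haussler’s argument boils down to comparing how many differences accumulate in a stretch of DNA relative to others. A crude pairwise version can be sketched in Python as follows. The segment names, the toy “aligned” sequences, the helper `p_distance`, and the 0.1 cutoff are all hypothetical choices for illustration. Real conservation analyses (for example, tools such as phastCons or GERP) model substitution rates across many species on a phylogeny rather than counting mismatches between two sequences, and a short window can look conserved purely by chance.

```python
def p_distance(a, b):
    """Fraction of aligned sites that differ; gap columns are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    diffs = sum(1 for x, y in sites if x != y)
    return diffs / len(sites)


# Hypothetical alignments of two segments between two species (toy data).
segments = {
    "segment_A": ("ATGCGTACGTTAGC", "ATGCGTACGTTAGC"),
    "segment_B": ("ATGCGTACGTTAGC", "ATCCGAACGTCAGT"),
}

THRESHOLD = 0.1  # arbitrary cutoff, chosen for illustration only

if __name__ == "__main__":
    for name, (seq1, seq2) in segments.items():
        d = p_distance(seq1, seq2)
        label = "candidate constrained" if d < THRESHOLD else "little sign of constraint"
        print(f"{name}: divergence={d:.2f} -> {label}")
```

Here segment_A shows no differences and would be flagged as a candidate for selection, while segment_B differs at 4 of 14 sites. On its own, though, low divergence says nothing about what a sequence does, which is exactly why conservation is only circumstantial evidence of function.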
We’re still left without a concrete definition of a gene, which leads me back to my original conclusion: we should simply abandon the term when dealing with anything beyond simple classical genetics. The term is far too general, and more specific terminology is warranted in most cases. And I haven’t even touched on the importance of epigenetics (e.g., heritable chromatin modifications and DNA methylation) and how it affects our definitions.