One of the greatest developments of the post-genomic era has been the refinement of the concept of the ‘gene’. The central dogma states that genes encode RNA transcripts which are translated into the amino acid sequence that makes up a protein. But protein coding genes make up a small fraction of many genomes, so what does the rest of the genome do? Some say it’s junk. Others say that it’s involved in regulating the transcription of the other regions. And even others say that it’s transcribed, but not translated. (Note: most think it’s some combination of the three.)
We’re now discovering that lots of those non-protein-coding regions are actually transcribed into RNAs. Those RNAs may be transcriptional mistakes or they can be functional non-protein-coding RNA. A recent study in Drosophila melanogaster revealed that nearly 30% of the transcribed sequence has not yet been annotated as a functional transcript. Only 29% of those unannotated sequences can be explained as alternative exons of known genes. The other 71% are either unknown protein coding genes or a bunch of untranslated RNAs that need to be characterized (I’d put my money on the latter).
In classical genetics, a gene is a region of the genome that, when mutated, produces an observable phenotype. What would you say is the post-genomic definition of a gene?