Once upon a time, I was one of those nerds who hung around Radio Shack and played about with LEDs and resistors and capacitors; I know how to solder and I took my first old 8-bit computer apart and put it back together again with “improvements.” In grad school I was in a neuroscience department, so I know about electrodes and ground wires and FETs and amplifiers and stimulators. Here’s something else I know: those generic components in this picture don’t do much on their own. You can work out the electrical properties of each piece, but a radio or computer or stereo is much, much more than a catalog of components or a parts list.
Electronics geeks know the really fun stuff starts to happen when you assemble those components into circuits. That’s where the significant work lies and where the actual function of the device is generated—take apart your computer, your PDA, your cell phone, your digital camera and you’ll see similar elements everywhere, and the same familiar components you can find in your Mouser catalog. As miniaturization progresses, of course, more and more of that functionality is hidden away in tiny integrated circuits…but peel away the black plastic of those chips, and you again find resistors and transistors and capacitors all strung together in specific arrangements to generate specific functions.
We’re discovering the same thing about genomes.
The various genome projects have basically produced for us a complete parts list—a catalog of bits in our toolbox. That list is incredibly useful, of course, and represents an essential starting point, but how a genome produces an organism is actually a product of the interactions between genes and gene products and the cytoplasm and environment, and what we need next is an understanding of the circuitry: how Gene X expression is connected to Gene Y expression and what the two together do to Gene Z. Some scientists are suggesting that an understanding of the circuitry of the genome is going to explain some significant evolutionary phenomena, such as the Cambrian explosion and the conservation of core genetic processes.
First, though, a little caveat. As in my introduction above, Davidson and Erwin use the language of electronic circuitry to explain what they are seeing. Metaphors are very dangerous things, and I have a feeling that they are pushing the metaphor a little too hard, at the risk of obscuring substantial differences between fluid gene interactions in a cytoplasm and a layout of wires and widgets on a circuit board. Metaphors are also powerful communication tools, an effective way to get an idea across, so I’m indulging in it here…just be warned, at some point we’ve also got to leave this metaphor behind and treat the epigenetic activity of the cell on its own terms. Just not today.
The idea of genetic circuitry refers to interactions between genes: how genes communicate with one another to turn on one pattern of gene expression in one cell, and turn on a different pattern in a different cell. For instance, while you want the keratin gene turned on in your skin cells (it makes the tough fibrous protein that makes our skin both supple and leathery), you probably want that gene inactivated in your red blood cells. The way this is done involves transcription factors and cis regulatory regions. The diagram below will help explain what these are.
There are certain kinds of genes in the genome that are, like other genes, transcribed into messenger RNA and translated into proteins, like gene X above, and their protein products are called transcription factors. Transcription factors are proteins that enter the nucleus of the cell and bind to DNA at specific sites; they typically recognize certain short sequences of DNA, for example, GCGTGGGCG. In the diagram, gene X produces a transcription factor, the red ball, that can bind to sequences near the genes A, B, C, D, and E, within the cis regulatory regions of those genes. The cis regulatory regions are simply sequences that aren’t actually part of the coding region of the gene, but are near it on the same strand of DNA (that’s what the “cis” part means), and control (the “regulatory” part) whether the transcriptional machinery of the cell will make messenger RNA copies of the gene. Gene X in this case is responsible for activating expression of genes A, B, C, and E, and turning off gene D.
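If you like to think in code, here is a minimal sketch of what "recognizing a short sequence" amounts to: scanning a stretch of DNA for a motif on either strand. The function names and the upstream sequence are made up for illustration, and real transcription factors tolerate some mismatches, so exact matching is a deliberate simplification.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_binding_sites(dna: str, motif: str) -> list[tuple[int, str]]:
    """Return (position, strand) for every exact match of motif in dna.

    Positions are 0-based offsets on the given strand; "-" means the
    motif was found on the opposite strand.
    """
    dna, motif = dna.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for start in range(len(dna) - len(motif) + 1):
        window = dna[start:start + len(motif)]
        if window == motif:
            hits.append((start, "+"))
        elif window == rc_motif:
            hits.append((start, "-"))
    return hits


# Made-up stretch of upstream sequence, for illustration only.
upstream = "TTAGCGTGGGCGATCCAGTCGCCCACGCAA"
print(find_binding_sites(upstream, "GCGTGGGCG"))  # [(3, '+'), (19, '-')]
```

The point of searching both strands is that a binding site is a property of the double helix, so the same factor can sit down on either orientation.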
The logic of regulating genes can get very tangled. Gene X will also have a cis regulatory region which is sensitive to the presence of other transcription factors, so its expression is also controlled. Some of its targets may also be transcription factors; gene C, for instance, could also code for a transcription factor, which could affect the expression of genes F, G, H, and I. Cis regulatory regions are rarely as simple as portrayed here, with one factor binding to them and simply controlling whether the gene is off or on. There will be many potential binding sites for many different transcription factors, and they will interact in complex and dosage-dependent ways. Gene C might actually be turned on only if transcription factor X and transcription factor Y are present, or if transcription factor X or transcription factor Y is present, or if transcription factor X and transcription factor Y but not transcription factor Z are present. The interactions are typically wonderfully intricate examples of the same kind of Boolean logic you will find in computer programs and chip designs (I recommend the book Endless Forms Most Beautiful if you find this subject interesting; it has an excellent section introducing the basic ideas of regulatory logic).
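The three alternatives for gene C above translate directly into Boolean functions. This is only a sketch: the function names are mine, and treating each factor as simply present or absent throws away the dosage dependence that real cis regulatory regions show.

```python
from itertools import product


def c_needs_both(x: bool, y: bool, z: bool) -> bool:
    """Gene C is on only if factors X and Y are both present."""
    return x and y


def c_needs_either(x: bool, y: bool, z: bool) -> bool:
    """Gene C is on if factor X or factor Y is present."""
    return x or y


def c_needs_both_unless_z(x: bool, y: bool, z: bool) -> bool:
    """Gene C is on if X and Y are present but Z is not."""
    return x and y and not z


# Print the truth table for all eight combinations of X, Y, and Z.
print("X     Y     Z     AND   OR    AND-NOT")
for x, y, z in product([False, True], repeat=3):
    row = [x, y, z,
           c_needs_both(x, y, z),
           c_needs_either(x, y, z),
           c_needs_both_unless_z(x, y, z)]
    print(" ".join(f"{str(value):<5}" for value in row))
```

Same inputs, three different regulatory regions, three different expression patterns: the logic lives in the wiring, not in the transcription factors themselves.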
Much of molecular genetics is involved in teasing apart these patterns of interactions between genes. It’s not information you can simply extract from the gene sequence, but instead requires careful observation and painstaking experiments, examining patterns of gene expression over development and in animals with mutations that knock out transcription factors or modify cis regulatory regions. The work ends up with nicely tangled spaghetti diagrams of networks of genes, like the classic one from work on nematode vulva induction to the right, which are suggestive of circuit diagrams. It’s a short jump from there to calling these genetic circuits.
Eric Davidson has been a major figure in studying these kinds of circuits, focusing most of his work on the developing echinoderm. The echinoderm exhibits a pattern of stereotyped divisions in its early development to create a ball of cells, and that ball will eventually turn into the more elaborate larval form, with a gut and external epidermis and simple internal skeleton. In order to do this, different cells have to turn on different sets of genes—some cells have to activate the genetic circuitry to make endomesoderm, the red cells of the diagram at the left, while others have to turn off the endomesoderm circuit. What exactly is the endomesoderm circuit?
In a key part of that one circuit, there are six genes: Delta, Blimp1/Krox, Otx, Bra, Foxa, and Gatae. All except Delta are genes for transcription factors; Delta is part of a signal transduction pathway, that is, it receives signals from the environment and triggers changes in gene activity in the cell. All of the transcription factors have multiple targets, and one thing you can readily see here is that many of their targets are each other: this is a highly recursive network. An incoming signal from Delta triggers a whole mutually synergistic pattern of activity within a whole bank of genes.
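To get a feel for why recursion matters, here is a toy simulation. It is emphatically not the endomesoderm circuit: the three genes g1, g2, and g3 and their wiring are hypothetical, chosen only to show how mutual activation can let a brief incoming signal lock a bank of genes into the on state.

```python
def step(state: dict[str, bool], signal: bool) -> dict[str, bool]:
    """Advance the toy network by one synchronous time step.

    Hypothetical wiring: the signal activates g1, g1 and g2 activate
    each other, and g2 and g3 activate each other.
    """
    return {
        "g1": signal or state["g2"],
        "g2": state["g1"] or state["g3"],
        "g3": state["g2"],
    }


def simulate(signal_steps: int, total_steps: int) -> list[dict[str, bool]]:
    """Run the network from all-off, with the signal on for the first signal_steps steps."""
    state = {"g1": False, "g2": False, "g3": False}
    history = []
    for t in range(total_steps):
        state = step(state, signal=t < signal_steps)
        history.append(state)
    return history


# Signal present for two steps, then withdrawn.
for t, state in enumerate(simulate(signal_steps=2, total_steps=6)):
    print(t, state)

# No signal at all: the network never switches on.
print(simulate(signal_steps=0, total_steps=6)[-1])
```

With the two-step signal, all three genes are still on at the end of the run even though the signal is long gone; with no signal, everything stays off. That kind of self-sustaining state is what the feedback loops buy you.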
Examples of putative GRN kernels. Networks were constructed and portrayed using BioTapestry software. Endomesoderm specification kernel, common to sea urchin and starfish, the last common ancestor of which lived about half a billion years ago. The relevant area of the sea urchin network is shown at the top; the corresponding starfish network is shown at the bottom. Horizontal lines denote cis-regulatory modules responsible for the pregastrular phase of expression considered, in endoderm (yellow), mesoderm (gray), or both endoderm and mesoderm (striped gray and yellow). The inputs into the cis-regulatory modules are denoted by vertical arrows and bars. The gray box surrounding the foxa input indicates that this repression occurs exclusively in mesoderm.
The diagram above illustrates two similar networks, one from sea urchin and one from starfish. There are differences, but most notable is their similarity—and these are two animals that have been separated evolutionarily for over a half billion years. Davidson and Erwin identify the common elements that have been conserved in both circuits, and note that the overall arrangement of elements is nearly identical. They call this kind of conserved circuit (note that what is being assessed is the pattern of interactions in addition to just the similarity in sequence of the genes) a Gene Regulatory Network kernel, or GRN kernel.
The network architecture, which has been exactly conserved since divergence—i.e., the kernel—for the endomesoderm specification kernel.
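One way to picture what "conserved architecture" means: treat each network as a set of linkages (regulator, target, and whether the effect is activating or repressing) and ask which linkages the two species share. The sketch below uses placeholder gene names and invented linkages, not the real urchin or starfish data.

```python
# A linkage is (regulator, target, sign), where sign is "+" or "-".
Edge = tuple[str, str, str]

# Invented networks with placeholder gene names, for illustration only.
species_one: set[Edge] = {
    ("a", "b", "+"), ("b", "c", "+"), ("c", "b", "+"),
    ("c", "d", "+"), ("d", "a", "-"),
}
species_two: set[Edge] = {
    ("a", "b", "+"), ("b", "c", "+"), ("c", "b", "+"),
    ("c", "d", "+"), ("b", "e", "+"),
}


def shared_linkages(first: set[Edge], second: set[Edge]) -> set[Edge]:
    """Linkages present in both networks: the candidate kernel."""
    return first & second


def lineage_specific(first: set[Edge], second: set[Edge]) -> set[Edge]:
    """Linkages found in the first network but not in the second."""
    return first - second


print(sorted(shared_linkages(species_one, species_two)))
print(sorted(lineage_specific(species_one, species_two)))
print(sorted(lineage_specific(species_two, species_one)))
```

Notice that the comparison is on linkages, not on gene sequences. Two species could have quite divergent sequences for gene b and still share every connection in and out of it.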
They identify several GRN kernels. Some of the properties of these kernels are that they are highly recursive and highly conserved—they are core molecular/genetic pathways that set up specific early domains of gene expression. They are conserved because they define basic ontogenetic processes: a mutation that disrupted the circuitry of the endomesoderm kernel, for instance, would mean that the population of cells that make endomesoderm could never be activated, and a whole broad set of tissue derivatives in the embryo would never form. Obviously, mutations can occur—the urchin and starfish circuits have differences—but mutations that perturb the general layout of the circuit are selected against.
Here’s another example of a kernel from Drosophila and a vertebrate. This is one that specifies where the heart will form in an animal; again, its function is very basic, but it is conserved because major changes would prevent the heart from forming at all.
Possible heart specification kernels; assembled from many literature sources. Dashed lines show possible interactions. Some aspects of the GRN that may underlie heart specification in Drosophila are shown at the top; the approximately corresponding vertebrate relationships are shown at the bottom. Absence of a linkage simply means that this linkage is not known to exist, not that it is known not to exist. Many regulatory genes participate in vertebrate heart formation for which orthologous Drosophila functions have not been discovered, and the hearts themselves are of very different structure. However, as pointed out by many authors, a core set of regulatory genes are used in common and are now known to be linked in a similar way in a conserved subcircuit of the gene network architecture, as shown.
This is an even more tangled circuit, and is also less well characterized than the endomesoderm pathways, but even so some common patterns fall out of it. The major players, like the fly tinman (tin) gene, have homologs in the vertebrate (Nkx2.5). The two circuits are activated by homologous inputs (Dpp/BMP), have similar outputs (recruiting contractile proteins, for instance), and exhibit that familiar recursive topology.
The shared linkages of the heart specification pathway for vertebrates and invertebrates. The gray boxes represent in each case different ways that the same two nodes of the network are linked in Drosophila and vertebrates.
There are differences, though, that are highlighted in the grey boxes—the system still evolves. For instance, look at the central grey box showing a linkage from Tin/Nkx2.5 to Pnr/Gata4. In Drosophila, it’s simple: Tin activates Pnr directly. In the vertebrate, Nkx2.5 also activates a new intermediate, Gata6, which activates Gata4. This is clearly a case of gene duplication in a pathway, something I’ve described before.
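The Tin/Nkx2.5 to Pnr/Gata4 example can be sketched as a question about paths: does activity at the upstream gene still reach the downstream gene, even if the route has changed? The code below encodes only the linkages named in the paragraph above (any direct Nkx2.5 to Gata4 contact is left out), and the function name is my own.

```python
from collections import deque


def activates(network: dict[str, list[str]], source: str, target: str) -> bool:
    """Return True if target is reachable from source by following activating links."""
    seen = {source}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for downstream in network.get(gene, []):
            if downstream == target:
                return True
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return False


# Only the linkages described in the text; a simplification of the real circuits.
fly = {"Tin": ["Pnr"]}
vertebrate = {"Nkx2.5": ["Gata6"], "Gata6": ["Gata4"]}

print(activates(fly, "Tin", "Pnr"))              # True: a direct link
print(activates(vertebrate, "Nkx2.5", "Gata4"))  # True: via the Gata6 intermediate
```

The route differs, one step in the fly and two in the vertebrate, but the relationship between the two homologous nodes is preserved, which is the kind of minor elaboration the grey boxes highlight.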
What does it all mean? Davidson and Erwin propose that there are these highly conserved GRN kernels, which obtain their functionality from the particular linkages between the genes that make them up. It’s those connections that are important and maintained over evolutionary history, even as gene sequences may change and minor elaborations on the linkages may occur, and while the outputs of the circuit (which they call differentiation gene batteries) may be much more labile. GRN kernels are modular circuits associated with elements of the body plan, such as the specification of the heart, determination of the anterior-posterior and dorsal-ventral body axes, eyefield localization, and gut regionalization…and almost certainly many more. The authors predict that there will be a GRN kernel to specify the domain and initial developmental steps for each phylum-specific body part in the embryo. These circuits would have evolved prior to the Cambrian, and represent the formation of new modules that fueled Cambrian diversity but also subsequently imposed developmental constraints that suppressed the evolution of novel body plans thereafter. The price of success in the Cambrian radiation was reliance on a fixed set of proven modules. As the authors put it,
Critically, these kernels would have formed through the same processes of evolution as affect the other components, but once formed and operating to specify particular body parts, they would have become refractory to subsequent change. Molecular phylogeny places this evolutionary stage in the late Neoproterozoic when Bilateria begin to appear in the fossil record (47–51), between the end of the Marinoan glaciation at about 630 million years ago and the beginning of the Cambrian. Therefore the mechanistic explanation for the surprising fact that essentially no major new phylum-level body parts have evolved since the Cambrian may lie in the internal structural and functional properties of GRN kernels: Once they were assembled, they could not be disassembled or basically rewired, only built on to.
One significant thing about this explanation is that it is based solidly on molecular and developmental genetics; since these are fields that have only really taken off in the last 25 or 30 years, it represents a class of explanation that is not represented in classical neo-Darwinian theory. That’s good and novel and expected—trust me, biology has not been sitting still since Fisher and Wright and Simpson and Dobzhansky—but while it represents another example of an ongoing revolution in how development informs our understanding of evolution, it is no consolation to the creationists. Paul Nelson seems to think the exciting news here is that it highlights a deficiency in a theory formulated 60 years ago, in a post titled Neo-Darwinism Doesn’t Work for the Cambrian Explosion (ably knocked down at Deinonychus antirrhopus, so I won’t need to waste time on it), but that’s putting a phony spin on it. What it is is a mechanistic explanation and testable hypothesis for the patterns of animal morphology that we see in the Cambrian and afterwards…one soundly based on the data, and unfortunately for the Designists, one that is fully natural and requiring no assist from a designer of any kind. That’s how the authors conclude the paper:
We believe that experimental examination of the conserved kernels of extant developmental GRNs will illuminate the widely discussed but poorly understood problem of the origination of animal body plans in the late Neoproterozoic and Cambrian and their remarkable subsequent stability.
It’s a productive hypothesis that will fuel further research and analysis. It fits in perfectly with modern evolutionary biology, in which we emphasize genes as actors in a dynamic process…and that you can’t understand what’s going on by studying genes in isolation, but must follow how they interact with one another.
Davidson EH, Erwin DH (2006) Gene regulatory networks and the evolution of animal body plans. Science 311:796-800.