We now have a draft of the sea anemone genome, and it is revealing tantalizing details of metazoan evolution. The subject is the starlet anemone, Nematostella vectensis, a beautiful little animal that is also an up-and-coming star of developmental biology research.


Nematostella development. a. unfertilized egg (~200 micron diameter) with sperm head; b. early cleavage stage; c. blastula; d. gastrula; e. planula; f. juvenile polyp; g. adult stained with DAPI to show nematocysts, with a zoom-in on the tentacle in the inset; h, i. confocal images of a tentacle bud stage and a gastrula, respectively, showing nuclei (red) and actin (green); j. a gastrula showing snail mRNA (purple) in the endoderm and forkhead mRNA (red) in the pharynx and endoderm; k. a gastrula showing Anthox8 mRNA expression; l. an adult Nematostella.

A most important reason for this work is that the anemone Nematostella is a distant relative of many of the animals that have already been sequenced, and so provides an essential perspective on the evolutionary changes that we observe in those other organisms. Comparison of its genome with that of other metazoans is helping us decipher the likely genetic organization of the last common ancestor of all animals.

For instance, we have sequenced the Drosophila genome and the human genome, and we can compare them and identify commonalities — many commonalities. It’s actually been a pleasant surprise to find so much unity. We can examine homologous genes and see, for example, that both flies and people have similar genes belonging to a family called Wnt. It’s highly unlikely that the two lineages would have independently come up with genes of such similar sequence and structure, so we can infer something about the last common ancestor of both flies and people: they had Wnt genes. By examining the features that are similar, we can sort out the least common denominator and figure out what pieces of the genomic puzzle had to be present in the last common ancestor.

What about the differences? That’s trickier. We have no direct way to examine the genetic complement of the last common ancestor, and the fly and human lineages have been evolving independently for exactly the same length of time, adding, losing, and modifying genes, so with only two genomes it’s difficult to figure out which evolutionary events created the differences. For instance, flies lack a particular member of the Wnt family, a gene called Wnt8, while humans have a copy. Does that mean that Wnt8 is a human innovation, that our line evolved this new gene, or does it mean that flies lost their copy? We can’t tell from just two genomes, so we examine a third. If an anemone has a copy of Wnt8, that would imply that the ancestral, or primitive (in a non-pejorative sense), condition was to have Wnt8, that flies are specialized or derived, and that their lineage has lost Wnt8. Conversely, if anemones also lack Wnt8, that may mean that Wnt8 is an evolutionary innovation in our lineage; but since anemones have not been standing still since they diverged from our line, it could also mean that they too have secondarily lost the ancestral Wnt8. In that last case, we’d want to look at many different organisms to discover the pattern of loss.
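
The three-genome reasoning above can be written down as a small decision rule. This is only a sketch that assumes simple presence/absence data for a single gene in two ingroup lineages plus one outgroup; the function name `explain_difference` is made up for illustration, and real analyses weigh many more taxa.

```python
def explain_difference(taxon_a: str, has_a: bool,
                       taxon_b: str, has_b: bool,
                       outgroup: str, has_outgroup: bool) -> str:
    """Outgroup reasoning for a gene found in one ingroup lineage but not the other.

    Hypothetical helper: presence/absence only, one outgroup, no branch lengths.
    """
    if has_a == has_b:
        raise ValueError("both ingroups agree; there is no difference to explain")
    have, lack = (taxon_a, taxon_b) if has_a else (taxon_b, taxon_a)
    if has_outgroup:
        # The outgroup shares the gene, so presence is the ancestral (primitive)
        # state and its absence in one ingroup is a derived loss.
        return f"ancestral; lost in the {lack} lineage"
    # The outgroup lacks it too: one gain in `have` is the simplest story, but a
    # second, independent loss in the outgroup cannot be excluded with 3 genomes.
    return (f"possibly new in the {have} lineage, or lost in both "
            f"{lack} and {outgroup}; sample more taxa")


print(explain_difference("fly", False, "human", True, "Nematostella", True))
# prints: ancestral; lost in the fly lineage
```

Flipping the last argument to `False` produces the ambiguous answer the paragraph warns about, which is exactly the situation where more genomes are needed.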

It’s all a kind of complicated logic puzzle, where the information about the ancestral form has been modified and degraded and expanded upon over hundreds of millions of years of evolution, but we have many different lineages that have modified that information in different ways. By combining the information from these various lineages, we can partially reconstruct the ancestral pattern. As you might guess, having genomic information from two closely related species of flies does not tell you as much as having information from two distantly related animals, which is why the genome of Nematostella is going to be so useful: this is a very distant relative of most of the animals with which you are familiar, having branched off the family tree between 600 and 700 million years ago. Anything in its genome that is shared with other phyla of the eumetazoa is likely to have been present in that dim, distant pre-Cambrian ancestor of us all.

In the case of Wnt8, the situation is that both anemones and vertebrates have a copy, and the fruit fly is the odd man out, so we infer that the common ancestor had Wnt8. That makes our possession of that gene a primitive trait, and the absence of the gene in the fly a derived trait. Every species is going to be a mixture of primitive and derived characters; we also have attributes that are obviously very different from what was present in our pre-Cambrian ancestor, so in other measures we would be considered highly derived. One of the general conclusions of the work with Nematostella is that, in a slap to our egos, humans are actually fairly primitive in gene structure and organization, and retain many more genetic attributes of the last common ancestor of the metazoa than do flies — flies are looking ever more radical and weird, the fast innovators of the multicellular world.

So what have the authors learned about anemone genomes?

This is a preliminary analysis of the Nematostella genome. The authors have neither a physical nor a genetic map yet; what they have is the 357 Mb draft genome in chunks that have been individually sequenced but not yet assembled into a complete sequence. They do have data on the genes that are present (they estimate approximately 18,000 protein-coding genes, a number comparable to our own), and the chunks, called scaffolds, are large enough that they can get a picture of the relationships of genes to one another and can also analyze synteny, the organization of groups of genes, with respect to other species. Half the genes are on scaffolds containing at least 48 genes, so that kind of analysis is reasonable.

I mentioned that one goal of this kind of analysis is to assemble a picture of the genes shared in common by members of the metazoa; these would represent part of the ancestral set of genes found in the last common ancestor. The authors pulled out members of gene families that were found in Nematostella and were also found in one or more of the fly, nematode, human, frog, or pufferfish genomes (that list is unfortunately heavy on the vertebrates, but that’s what we’ve got to work with right now; we need more diverse genomes in the databases!). They identified a total of 7,766 ancestral gene families. Many of those ancestral genes have since expanded by duplication in each lineage, so the families are represented by 12,319 genes in modern Nematostella and 13,380 genes in modern humans. In other words, about two thirds of our genes are straight out of the ancient metazoan toolbox, and less than one third, in both the anemone and us, are later additions.
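
The “two thirds” figure is simple division. Here is a quick check for the anemone, using only the numbers quoted above (12,319 genes in ancestral families out of roughly 18,000 protein-coding genes). The human share depends on which total human gene count you assume, so the helper below takes the total as a parameter rather than hard-coding one.

```python
def ancestral_share(genes_in_ancestral_families: int, total_genes: int) -> float:
    """Fraction of a genome's protein-coding genes that belong to ancestral families."""
    return genes_in_ancestral_families / total_genes


share = ancestral_share(12_319, 18_000)  # Nematostella figures quoted in the text
print(f"ancestral: {share:.0%}, later additions: {1 - share:.0%}")
# prints: ancestral: 68%, later additions: 32%
```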

Individual lineages lost genes during evolution, so this estimate of the ancestral metazoan genome is very rough, and it is an underestimate. If an ancestral gene had been lost in the anemones, for instance, but retained in vertebrates, it would not appear in the tally. The ecdysozoan lineage, represented by flies and nematodes, seems to have been particularly prone to losing genes over its history. Of those 7,766 ancestral gene families, 1,292 have been lost from both flies and nematodes: they are shared by anemones and vertebrates but absent from both ecdysozoans. In contrast, the vertebrate lineage has lost only 33 of the 7,766. We’ve been relatively conservative in retaining genes, while the ecdysozoa have been paring their genomes down.
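
The bookkeeping behind these tallies is a presence/absence table. Below is a toy sketch, assuming each gene family is recorded as the set of genomes it was found in. The family names are hypothetical, and the rules are simplified versions of the criteria described above, not the authors’ exact pipeline. Note that `fam4`, a family missing from the anemone, never enters the ancestral count at all, which is why the tally is an underestimate.

```python
from typing import Dict, Set

VERTEBRATES = {"human", "frog", "pufferfish"}
ECDYSOZOANS = {"fly", "nematode"}
BILATERIANS = VERTEBRATES | ECDYSOZOANS


def is_ancestral(present: Set[str]) -> bool:
    """Family found in Nematostella and in at least one bilaterian genome."""
    return "Nematostella" in present and bool(present & BILATERIANS)


def lost_in_both_ecdysozoans(present: Set[str]) -> bool:
    """Ancestral family kept by vertebrates but absent from both fly and nematode."""
    return is_ancestral(present) and bool(present & VERTEBRATES) and not (present & ECDYSOZOANS)


def lost_in_vertebrates(present: Set[str]) -> bool:
    """Ancestral family kept by an ecdysozoan but absent from all three vertebrates."""
    return is_ancestral(present) and bool(present & ECDYSOZOANS) and not (present & VERTEBRATES)


# Hypothetical table: family -> genomes in which it was detected.
families: Dict[str, Set[str]] = {
    "fam1": {"Nematostella", "human", "frog", "fly", "nematode"},  # kept everywhere
    "fam2": {"Nematostella", "human", "pufferfish"},               # gone from fly and nematode
    "fam3": {"Nematostella", "fly"},                               # gone from all vertebrates
    "fam4": {"human", "fly"},                                      # no anemone copy: invisible
}

ancestral = [f for f, p in families.items() if is_ancestral(p)]
ecdysozoan_losses = [f for f, p in families.items() if lost_in_both_ecdysozoans(p)]
vertebrate_losses = [f for f, p in families.items() if lost_in_vertebrates(p)]
print(ancestral, ecdysozoan_losses, vertebrate_losses)
# prints: ['fam1', 'fam2', 'fam3'] ['fam2'] ['fam3']
```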

In another measure of change, the diagram below illustrates two things: the branching pattern will remind you of the phylogenetic relationships among these various organisms in case you’d forgotten, and the length of each line represents the relative number of amino acid substitutions in that lineage across a shared set of 337 single-copy genes. Flies and nematodes have been busy little critters, with many more substitutions; anemones and vertebrates have been relatively pokey and conservative.

Bayesian phylogeny of Metazoa. Bayesian analysis infers metazoan phylogeny and rate of amino acid substitution from sequenced genomes based on 337 single-copy genes in Ciona intestinalis (sea squirt), Takifugu rubripes (fish), Xenopus tropicalis (frog), human, Lottia gigantea (snail), Drosophila melanogaster (fly), C. elegans (nematode), Hydra magnipapillata (hydra), Nematostella, Amphimedon queenslandica (sponge), Monosiga brevicollis (choanoflagellate), and Saccharomyces cerevisiae (yeast). All nodes were resolved as shown in 100% of sampled topologies in Bayesian analysis. The scale bar indicates the expected number of amino acid substitutions per aligned amino acid position. E, the eumetazoan (cnidarian-bilaterian) ancestor; B, the bilaterian (protostome-deuterostome) ancestor. The number of new genes (+), genes created by gene duplication (d), and the total number of reconstructed ancestral genes of the recent common ancestor (N) are labeled for S1 and S2, the eumetazoan and bilaterian stems, respectively.
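
The scale bar is in expected substitutions per site, which is not the same as the fraction of sites that differ: once a position has changed, later changes at that same position are hidden. The published tree comes from a Bayesian analysis under a more elaborate substitution model. The sketch below swaps in the simplest textbook correction, the Poisson model d = -ln(1 - p), just to show why corrected branch lengths run longer than raw differences. The toy sequences are made up.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned, ungapped positions at which two protein sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Expected substitutions per site under an equal-rates Poisson model."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p)


p = p_distance("MKV-LA", "MRVALA")  # toy alignment: 1 difference in 5 ungapped sites
print(round(p, 3), round(poisson_distance(p), 3))
# prints: 0.2 0.223
```

For small p the two measures nearly agree; as p grows toward saturation the correction grows quickly, which matters most for the fast-evolving lineages.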

The message so far is that at the genomic level, people are more like anemones than they are like flies. That’s counterintuitive, and it seems to contradict the observation that flies and people are phylogenetically closer to one another than either is to anemones. What seems to be going on is that people and anemones have been evolving at a slow, steady pace, diverging from one another for the past 600 to 700 million years, while arthropods and worms have been modifying their genomes at a more hectic rate. The fly and human lineages have been separated from each other for the same length of time, but flies have been changing more quickly. Several of the analyses reinforce this observation: vertebrates and anemones have conserved more of the ancestral genome, while flies and nematodes have shed more.

The diagram below illustrates the position of introns in a few selected genes. Introns are segments of noncoding DNA that interrupt the sequence of a gene and are spliced out of the RNA transcript before it is translated; they don’t really affect the function or sequence of the final gene product, and can be thought of as arbitrary, non-functional intrusions of useless DNA. They do have one use to us, though: they are handy markers of evolutionary accidents, because they are conserved to a degree. Since they don’t seem to do much and are snipped out by the cell when the gene is expressed, they aren’t strongly selected against in most cases, so an intron that arrived in an ancestor can persist at the same position in its descendants for a very long time.

On the left, for instance, the authors have diagrammed a gene called Rab1 as it is found in anemones, humans, sea squirts, flies, nematodes, a fungus, and a plant. Rab1 is a GTPase that regulates vesicle traffic in the cell; it’s an old, old gene that all of these organisms share (as is the case for all of the genes in this illustration), and as you can see, even the structure of its gene is shared: all of them have the same intron at the same place. The second gene, GLT28D1, also has an intron, but this intron has been lost in flies and nematodes.

I have to make an important aside here: assessment of overall homology cannot be made on the basis of a single detail of a single gene. We can find lots of individual instances, like with GLT28D1, that if all you did was look at intron structure in this one gene, you’d say that people are more like mustard plants than they are like another animal, a fly. Each species has unique attributes and it is a mistake to focus on one minor difference as a measure of relationships.

Another example is in the third gene, SRP54. This one has an intron in the same place in anemones, humans, and mustard plants; the fourth gene, SAP155, has a homologous intron in only humans and anemones, and all others have lost it!


Patterns of intron evolution in eukaryotes. Examples of different patterns of intron gain and loss. Bars of the same color represent conserved regions across all species. Chevrons indicate introns and the number below the chevron shows the phase of the intron.
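
Intron phase is bookkeeping on the reading frame: it records how many bases of the interrupted codon lie upstream of the intron. Here is a minimal sketch, assuming you already have the coding length of each exon in order (untranslated regions excluded). `intron_phases` is a hypothetical helper, not a function from any particular package.

```python
from itertools import accumulate
from typing import List


def intron_phases(coding_exon_lengths: List[int]) -> List[int]:
    """Phase of each intron, given the coding length (nt) of every exon in order.

    Phase = coding bases upstream of the intron, mod 3:
    0 = between codons, 1 = after a codon's 1st base, 2 = after its 2nd base.
    Assumes all listed exon lengths are protein-coding (UTRs excluded).
    """
    upstream = list(accumulate(coding_exon_lengths))[:-1]  # no intron after the last exon
    return [bases % 3 for bases in upstream]


print(intron_phases([120, 100, 83]))  # hypothetical three-exon gene
# prints: [0, 1]
```

Introns in different species are then treated as the same (homologous) intron when they sit at the same position in the aligned protein and have the same phase, which is the criterion the Dollo analysis in the next figure relies on.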

Again, you can’t make a whole family tree with any accuracy from a single attribute; it would be like trying to put together a human genealogy from hair color alone, setting aside blondes as one related group. We have to look at the whole pattern and multiple characters. Here, for instance, is a big-picture diagram of the pattern of intron gains and losses in these different lineages, and what we see is that some organisms, especially the fly and nematode, have shown predominantly intron loss over their history. By comparison, humans and Nematostella are packrats who rarely throw away introns, and show relatively more intron gain.

Patterns of intron evolution in eukaryotes. Branch lengths proportional to the number of inferred intron gains (left), and intron losses (right) under the Dollo parsimony assumption that introns with conserved position and phase were gained only once in evolution. The bottom scale indicates the change in intron number for gains (left) and losses (right), relative to the inferred introns of the eumetazoan ancestor. Based on a sample of 5175 introns at highly conserved protein sequence positions from Arabidopsis thaliana (plant), Cryptococcus neoformans (fungus), C. elegans (nematode), D. melanogaster (fly), C. intestinalis (sea squirt), Homo sapiens (human), and Nematostella.
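
The Dollo assumption in this caption turns gain/loss counting into a small tree walk: each intron is gained once, at the most recent common ancestor of every species that has it, and every absence inside that clade is explained by loss. The toy sketch below assumes a simplified rooted tree of the seven species with the plant as outgroup, using made-up short labels. The example intron (present only in human and anemone) mirrors the SAP155 pattern described above. The real analysis works across thousands of introns and also handles missing data, which this does not.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a species name (leaf) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]

# Assumed rooted topology with short hypothetical labels.
TREE: Tree = ("plant", ("fungus", ("anemone", (("fly", "nematode"), ("sea_squirt", "human")))))


def leaves(tree: Tree) -> FrozenSet[str]:
    """All species names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def gain_node(tree: Tree, present: Set[str]) -> Tree:
    """Smallest clade containing every species that has the intron (the single gain)."""
    if isinstance(tree, tuple):
        for child in tree:
            if present <= leaves(child):
                return gain_node(child, present)
    return tree


def count_losses(clade: Tree, present: Set[str]) -> int:
    """Minimum losses: one per maximal subclade where the intron is absent everywhere."""
    if not (leaves(clade) & present):
        return 1
    if isinstance(clade, str):
        return 0
    return sum(count_losses(child, present) for child in clade)


def dollo(tree: Tree, present: Set[str]) -> Tuple[Tree, int]:
    """Place the single gain, then count the losses it implies."""
    if not present:
        raise ValueError("intron absent from every species")
    node = gain_node(tree, present)
    return node, count_losses(node, present)


node, losses = dollo(TREE, {"human", "anemone"})
print(sorted(leaves(node)), losses)
# prints: ['anemone', 'fly', 'human', 'nematode', 'sea_squirt'] 2
```

The two losses land on the fly+nematode stem and on the sea squirt branch. Summing such counts per branch over many introns is, in spirit, how branch lengths like those in the figure are built up.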

The fact that anemones and vertebrates have been slower than flies to modify the structure of their genomes explains this next dramatic observation: the conservation of synteny between anemones and humans. Synteny refers to the preservation of small neighborhoods of genes within the genome: we can look in organism after organism and always find genes X, Y, and Z next to one another, in order, even though they may have no functional relationship to one another in the cell, and even though the overall chromosome structure may have been spectacularly scrambled between two species by translocations, inversions, duplications, and deletions.

We can map synteny onto existing chromosomes. For instance, we can find the X-Y-Z genes on chromosome 6 of a mouse and X-Y-Z on chromosome 11 of a human, and surmise that there was a translocation at some point in the divergence of those two species. In good cases, we can even reconstruct ancestral genomes, right down to estimates of the number of chromosomes and the location of genes on those chromosomes, for an ancestral species that doesn’t exist anymore. Being able to do that depends on how much shuffling and scrambling of the genome has been going on. We can see lots of conserved synteny between vertebrates, for example, but insects and vertebrates have diverged too much: their genes have been rearranged so extensively that no pattern remains visible.
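
The X-Y-Z example can be made concrete with a simple adjacency comparison. Here is a minimal sketch, assuming each genome is given as chromosomes holding ordered lists of ortholog identifiers (the labels and chromosome names are hypothetical). It only looks at immediate neighbors, which is far stricter than the linkage-based comparison used for the anemone, but it shows how a neighborhood can survive a translocation onto a different chromosome.

```python
from typing import Dict, FrozenSet, List, Set

Genome = Dict[str, List[str]]  # chromosome name -> ordered list of ortholog IDs


def adjacencies(genome: Genome) -> Set[FrozenSet[str]]:
    """Unordered pairs of genes that sit next to each other on some chromosome."""
    pairs = set()
    for genes in genome.values():
        for left, right in zip(genes, genes[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def conserved_adjacencies(a: Genome, b: Genome) -> Set[FrozenSet[str]]:
    """Adjacencies present in both genomes, whichever chromosome they are on."""
    return adjacencies(a) & adjacencies(b)


mouse = {"chr6": ["A", "X", "Y", "Z", "B"]}
human = {"chr11": ["C", "X", "Y", "Z"], "chr2": ["B", "A"]}
shared = conserved_adjacencies(mouse, human)
print(sorted(sorted(pair) for pair in shared))
# prints: [['X', 'Y'], ['Y', 'Z']]
```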

Humans and anemones, as has just been shown, have not been modifying their genomes as rapidly as flies. If the genome is a deck of cards, we’ve been slowly making a few cuts now and then, while flies have been doing a brisk and efficient and frequent riffle shuffle. That means we might still be able to see traces of the ancestral order in a comparison of anemone and human genes.
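
The deck-of-cards picture can be played out in a few lines. This sketch uses random segment inversions as a stand-in for all rearrangements (an assumption; real genomes also see translocations, fusions, and fissions) on a single made-up chromosome, and counts how many of the original neighbor pairs survive. Each inversion breaks at most two adjacencies, so a handful of “cuts” leaves most of the ancestral order readable, while many of them erase it.

```python
import random


def reverse_segment(order: list, i: int, j: int) -> list:
    """Invert the block order[i..j] (inclusive), like a chromosomal inversion."""
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def neighbor_pairs(order: list) -> set:
    """Unordered pairs of adjacent genes."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def preserved_fraction(n_genes: int, n_inversions: int, seed: int = 0) -> float:
    """Fraction of the starting adjacencies still present after random inversions."""
    rng = random.Random(seed)
    order = list(range(n_genes))
    original = neighbor_pairs(order)
    for _ in range(n_inversions):
        i, j = sorted(rng.sample(range(n_genes), 2))
        order = reverse_segment(order, i, j)
    return len(original & neighbor_pairs(order)) / len(original)


for k in (5, 50, 500, 5000):
    print(k, round(preserved_fraction(1000, k), 3))
```

A few inversions leave nearly every adjacency intact; thousands of them should leave very little of the original order, the “riffle shuffle” regime.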

And that’s exactly what we find. The chunks of Nematostella genome were compared against the human genome and grouped into classes called Putative Ancestral Linkage groups, or PALs: neighborhoods of genes that are likely to have descended from a single chromosome in the last common ancestor. The authors identified 13 PALs; the twelve most compact are lettered A to L and colored red in the diagram below, while the thirteenth is more diffuse and colored green (white bars show no significant conserved synteny). These PALs are aligned with a diagram of the human chromosome set so you can see how all the bits and pieces line up.


Conserved synteny between the human and anemone genomes. The human genome, segmented into 98 regions whose linkage has not been broken during chordate evolution. Colored segments indicate statistically significant conservation of linkage between human and Nematostella. Red segments are members of the 12 compact PALs labeled A to L. Green segments fall into the diffuse 13th PAL. White segments do not show significant conservation of linkage.
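
Calling conserved linkage “statistically significant” means asking whether a scaffold and a human segment share more orthologs than chance predicts. The paper’s exact statistical procedure isn’t described here, so treat this as an assumed, standard stand-in: a hypergeometric tail probability, with made-up counts.

```python
from math import comb


def hypergeom_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N total, K successes, n draws).

    Here: N orthologs mapped in total, K of them lying in one human segment,
    n orthologs on one Nematostella scaffold, k of those in that segment.
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical counts: 10,000 mapped orthologs, 200 in one human segment,
# a scaffold carrying 20 orthologs, 6 of which fall in that segment.
expected = 20 * 200 / 10_000  # 0.4 expected by chance
print(expected, hypergeom_tail(6, 10_000, 200, 20))
```

Because every scaffold gets tested against every segment, a real analysis also has to correct for multiple comparisons before calling a linkage significant.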

This is an amazing amount of conservation of genome structure. 30% of the Nematostella genes that are on scaffolds large enough to be used in this analysis fall into one of the conserved linkage groups.

In another example of the conservative nature of evolution, the authors categorized the origin of eumetazoan genes. It is no surprise that 80% were ancient in origin, appearing before the origin of the metazoa. We are just glorified bacteria, after all, and most of what our individual cells have to do is identical to what yeast have to do.

15% are unique to animals. That just means that no homologs have been found in plants or fungi or ciliates or bacteria … but most of those genes are probably also very old, and evolved in the single-celled ancestors of the metazoan line.

2% are modified versions of ancient genes that have an added, novel domain. 3% are constructed by fusions and domain shuffling of parts of ancient genes to make a new hybrid.

The diagram below illustrates one example of a collection of genes that make up a functional part of one signaling pathway. What we see is that novelty isn’t discrete: you won’t typically find a new gene doing something entirely, radically new in cell function. Instead, new genes tweak existing networks of genes to refine the capabilities of the organism.

Origins of eumetazoan genes. (A) Pie chart showing the percentages of genes in the eumetazoan ancestors according to their origin: type I novelties with no homology to proteins in nonanimal outgroups (blue), type II novelties with novel animal domains paired with ancient domains (orange), type III novelties with new pairings of ancient domains (pink), and ancient genes (green). (B) A schematic representation of the FAK and Shc/Fyn pathways in integrin signaling. The proteins are color-coded to reflect their ancestry, as in (A). JNK, c-Jun N-terminal kinase.
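
The four categories in panel (A) amount to a rule about domain content. Below is a toy classifier, assuming each protein is summarized as a set of domain labels and that we already know which domains, and which domain combinations, occur outside animals. The labels are placeholders, and the authors’ actual assignments rested on homology searches, not on a lookup like this.

```python
from typing import FrozenSet, Set


def classify(domains: FrozenSet[str],
             ancient_domains: Set[str],
             ancient_architectures: Set[FrozenSet[str]]) -> str:
    """Toy version of the four origin categories in panel (A).

    domains: domain labels found in one eumetazoan protein
    ancient_domains: labels also found in nonanimal outgroups
    ancient_architectures: domain combinations also found in nonanimal outgroups
    """
    if not domains:
        raise ValueError("protein has no annotated domains")
    novel = domains - ancient_domains
    if not novel:
        # Only ancient domains: ancient if alone or already combined this way
        # outside animals, otherwise a new pairing (type III).
        if len(domains) == 1 or domains in ancient_architectures:
            return "ancient"
        return "type III"
    if novel == domains:
        return "type I"   # nothing recognizably pre-animal
    return "type II"      # novel animal domain(s) paired with ancient ones


ANCIENT = {"A", "B", "C"}                 # hypothetical domain labels
ARCHS = {frozenset({"A", "B"})}           # hypothetical pre-animal combination
for protein in ({"A", "B"}, {"A", "C"}, {"A", "X"}, {"X"}):
    print(sorted(protein), classify(frozenset(protein), ANCIENT, ARCHS))
```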

And just what do the eumetazoan genes that fall into the Type I or completely novel category do? It’s no surprise there at all: that category is enriched for genes involved in signal transduction (like the example cartooned above), cell communication and adhesion, and a catch-all category of developmental processes, which includes genes involved in the emergence of the nervous system and in differentiation of mesoderm. Like I’ve been telling you all along, evolution is also telling us that development is important. It’s where the action has been in the evolution of multicellular animals.

I think it’s also where the interesting work of evo-devo lies. There are some competing complaints about evo-devo: that it is the domain of narrow proponents of the relative merits of changes in cis- and trans-regulatory elements, or that it is a colossal irrelevancy that offers no new principles to elucidate. I take a broader view. I think it’s true that, as this work shows, multicellular animals are the product of fundamental genetic structures that were almost entirely pioneered by our single-celled precursors—we are bacteria writ large and sloppy. We also possess an amazing degree of unity within the various metazoan lineages; when we examine a sea anemone and a human at the genetic, biochemical, and molecular level, it’s easy to get overwhelmed with the commonalities at our foundation. We’re all the same in so many ways!

But at the same time, an anemone and a person are obviously very different at the tissue and organismal level, and more different still from a yeast cell. We also do have real genetic novelties that may be a small fraction of the genome but have been crucial in the evolution of form and function in multicellular organisms. To me, that’s what evo-devo is really all about: what are the clever little adjustments to the integration of networks of genes that allow complex variations in phenotype to emerge from the genome? Whether the answer is an accumulation of changes in cis-regulatory elements or something else, and whether it’s in plants or animals, there is clearly incredible potential in the complexity of interactions in genetic networks that we understand only poorly, and that represents a promising and powerful line of scientific questions. If you want to understand how the differences in the genes of Nematostella and Homo sapiens produce a tiny tentacled marine predator of microorganisms versus a bipedal terrestrial philosopher, I think you’re going to have to study evo-devo.

Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, Salamov A, Terry A, Shapiro H, Lindquist E, Kapitonov VV, Jurka J, Genikhovich G, Grigoriev IV, Lucas SM, Steele RE, Finnerty JR, Technau U, Martindale MQ, Rokhsar DS. (2007) Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317(5834):86-94.


  #1 Torbjörn Larsson, OM
    July 10, 2007

    I was hoping that release of the draft would get a post on Pharyngula, but it turned out even better than I imagined. And it was fascinating to know that human genome nearly balances on intron gain/loss even at our humble evolutionary rate. I had no idea.

    Btw, that is up with our Y-chromosome not having conserved synteny at all. Psh, its repair system must be working overtime. Messy little bugger.

    OT, our precambrian ancestors had at least 80 % of our genome figured out, and a comparative genome size? This will freak the predarwinists, ehrm, I mean paleyists out.