Pufferfish and ancestral genomes



The fugu is a famous fish, at least as a Japanese sushi dish that contains a potentially lethal neurotoxin and was featured on an episode of The Simpsons. Fugu is a member of the pufferfish group, which has another claim to fame: an extremely small genome, roughly a tenth the size of that of other vertebrates. The genomes of several species of pufferfish are being sequenced, and the latest issue of Nature announces the completion of a draft sequence for the green spotted pufferfish, Tetraodon nigroviridis, a small freshwater species.

Tetraodon has about the same number of genes as we do, 20,000-25,000, but they are contained in a total genome length of 340 Mb vs. our huge 3.1 Gb. One major difference is that in Tetraodon, transposable elements are rare: it has 73 types, present in fewer than 4,000 copies, whereas humans have about 20 different types present in millions of copies. Transposable elements may be sequences encoding reverse transcriptases that blindly copy RNA sequences back into the DNA (called LINEs), or shorter sequences that are copied by the LINE machinery (called SINEs). These really are parasitic bits of selfish DNA, and somehow, pufferfish seem to be largely free of them.

One of the interesting things one can do with a pair of genome sequences is to start mapping synteny. Synteny represents the preservation of small regions of order within a chromosome; while the overall organization may have been scrambled by millions of years of chromosome breaks and fusions and duplications and deletions, we can still identify smaller blocks that maintain the same series of genes within them. For example, if we look on a chromosome of one organism and we see the series of genes A-B-C-D-E-F, and we look in another organism and find a chromosome with the genes W-X-C-D-E-Y-Z, we can see that the C-D-E chunk can be mapped directly to one region of that second organism's chromosome.
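To make the A-B-C-D-E-F example concrete, here is a minimal sketch of how one might pull out the shared block. It assumes each gene appears only once per chromosome and that order and orientation are preserved; the function name `shared_blocks` and the `min_len` parameter are made up for illustration and are not the method used in the paper.

```python
def shared_blocks(order_a, order_b, min_len=2):
    """Return maximal runs of genes that are contiguous, and in the same
    order, on both chromosomes. Assumes each gene occurs once per list."""
    position_in_b = {gene: i for i, gene in enumerate(order_b)}
    blocks = []
    i = 0
    while i < len(order_a):
        if order_a[i] not in position_in_b:
            i += 1
            continue
        j = position_in_b[order_a[i]]
        length = 1
        # Extend the run while the next gene matches on both chromosomes.
        while (i + length < len(order_a)
               and j + length < len(order_b)
               and order_a[i + length] == order_b[j + length]):
            length += 1
        if length >= min_len:
            blocks.append(order_a[i:i + length])
        i += length
    return blocks


chrom_1 = list("ABCDEF")   # genes A-B-C-D-E-F
chrom_2 = list("WXCDEYZ")  # genes W-X-C-D-E-Y-Z
print(shared_blocks(chrom_1, chrom_2))  # [['C', 'D', 'E']]
```

Real genomes are messier (inversions, duplicates, missing orthologues), so treat this only as a picture of the idea.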

A way to diagram this is to color code all the syntenic regions from each chromosome in one organism, and see how those regions are distributed in a second organism. For instance, in the first diagram below, imagine that human chromosome 15 has been colored yellow (the key along the bottom of the diagram tells you how the chromosomes have been color coded). Then, we take each region of the Tetraodon genome that is syntenic with chromosome 15 and color it yellow; you can think of it as if we've taken human chromosome 15, and all the other chromosomes, and broken them apart and reassembled them into the Tetraodon genome, and the colors allow us to see where each fragment came from. For instance, you can see that large pieces of human chromosome 15 are found in Tetraodon chromosomes 5 and 13.

You can also do this in the other direction: take each Tetraodon chromosome, color code it, break it apart, and reassemble the pieces in the order they would have in the human genome, as in the second diagram.


Synteny maps. a, For each Tetraodon chromosome, coloured segments represent conserved synteny with a particular human chromosome. Synteny is defined as groups of two or more Tetraodon genes that possess an orthologue on the same human chromosome, irrespective of orientation or order. Tetraodon chromosomes are not in descending order by size because of unequal sequence coverage. The entire map includes 5,518 orthologues in 900 syntenic segments. b, On the human genome the map is composed of 905 syntenic segments.

What you are seeing in the scattered colors in these diagrams is a rendering of the history of chromosome reorganization that occurred during evolution. Hundreds of millions of years of juggling, and the order is still not completely randomized—there are still recognizably preserved blocks of local structure.

Of course, don't be misled by the synteny diagrams into thinking that what happened in evolution was the reorganization of the human genomic organization into the Tetraodon pattern (or vice versa). There was a common ancestor with some arrangement of genes on its chromosomes, and those chromosomes got broken apart and juggled around in our history, and those chromosomes were independently scrambled in the Tetraodon lineage.

Here's something else we can learn from synteny. Looking more closely at the map, below, reveals a curious pattern: some of the Tetraodon chromosomes can be mapped in an alternating or interleaved arrangement on the human chromosomes.

Duplicate mapping of human chromosomes reveals a whole-genome duplication in Tetraodon. Blocks of synteny along human chromosomes map to two (or three) Tetraodon chromosomes in an interleaving pattern. Small boxes represent groups of syntenic orthologous genes enclosed in larger boxes that define the boundaries of 110 DCS blocks. Black circles indicate human centromeres. A region of human chromosomes Xq and 16q are shown in detail with individual Tetraodon orthologous genes depicted on either side.

For instance, look at human chromosome 16 (Hsa16). Blocks of genes from Tetraodon chromosome 5 (Tni5) and Tetraodon chromosome 13 (Tni13) are both found in a larger syntenic region here…and this defies probability. If chromosome reorganizations were random, we shouldn't be seeing only Tni5 and Tni13 alternating exclusively here, and we shouldn't be seeing these other associations all over the place. You can also see Tni5 and Tni13 alternating on Hsa15, and Tni1 and Tni7 alternating on HsaX.

Why are Tni5 and Tni13 found together, and Tni1 and Tni7, and Tni2 and Tni3, and so forth? The simplest answer is that there was a whole genome duplication in the Tetraodon lineage. In the last common ancestor of humans and Tetraodon, there was a single chromosome that mapped onto that region of Hsa16. After Tetraodon diverged, that chromosome was duplicated, forming Tni5 and Tni13, and over time, many of the duplicated genes were secondarily lost, but which copy was lost, whether the one on Tni5 or the one on Tni13, was random. The regions of doubly-conserved synteny (DCS), where we've got pairs of Tetraodon chromosomes clustering together in lockstep with regions of the human chromosome, represent ghosts of a single ancestral chromosome. In this diagram, those hypothetical ancestral chromosomes have been outlined and colored in the pastel colors surrounding the interleaved regions, and given the names AncA, AncB, AncC, etc. There are 110 of these DCS blocks scattered through the genome, and they can be grouped into 12 ancestral chromosomes.
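If the logic of duplication followed by random loss seems abstract, a toy simulation may help. This is only a sketch under loud assumptions: the gene names, the `keep_both` probability, and the function name `duplicate_and_lose` are all invented for illustration, and real gene loss is surely not this tidy.

```python
import random


def duplicate_and_lose(ancestral_genes, names=("Tni5", "Tni13"),
                       keep_both=0.05, seed=0):
    """Duplicate one ancestral chromosome into two copies, then for each
    gene keep either both copies (rarely) or one copy chosen at random.
    keep_both is an arbitrary illustrative value, not an estimate."""
    rng = random.Random(seed)
    retained = {}
    for gene in ancestral_genes:
        if rng.random() < keep_both:
            retained[gene] = list(names)
        else:
            retained[gene] = [rng.choice(names)]
    return retained


# A hypothetical ancestral chromosome with 20 genes in order.
ancestral = [f"g{i}" for i in range(20)]
fate = duplicate_and_lose(ancestral)

# Walk along the unduplicated (human-like) gene order and report which
# duplicated chromosome still carries each gene.
print(" ".join("/".join(fate[gene]) for gene in ancestral))
```

Reading along the unduplicated order, the labels flip back and forth between the two duplicated chromosomes and nothing else, which is the interleaving pattern seen on Hsa16.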

And here you are: those ancestral syntenic fragments can then be reordered into their simplest arrangement, and we have an ancestral genome.

Composition of the ancestral osteichthyan genome. The 110 DCS blocks identified on the human genome are grouped according to their composition in terms of Tetraodon chromosomes, thus delineating 12 ancestral chromosomes containing 90 DCS blocks. The order of DCSs within an ancestral chromosome is arbitrary. The 20 blocks denoted by the letters U, V, W and Z could not be assigned to an ancestral chromosome because each has a unique composition, probably due to rearrangements in the human or Tetraodon genome.
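The grouping step described in that caption can be sketched in a few lines. The block names and the `group_by_composition` function below are hypothetical, and the toy data only borrows the chromosome pairings mentioned above (Tni5/Tni13, Tni1/Tni7, Tni2/Tni3); the real analysis worked on 110 blocks and was certainly more careful.

```python
from collections import defaultdict


def group_by_composition(dcs_blocks):
    """Group DCS blocks that share the same set of Tetraodon chromosomes.
    Each group stands in for one candidate ancestral chromosome."""
    groups = defaultdict(list)
    for block_id, tni_chromosomes in dcs_blocks.items():
        groups[frozenset(tni_chromosomes)].append(block_id)
    return dict(groups)


# Toy input: invented block names mapped to the Tetraodon chromosomes
# that interleave within each block.
toy_blocks = {
    "block1": {"Tni5", "Tni13"},
    "block2": {"Tni1", "Tni7"},
    "block3": {"Tni13", "Tni5"},
    "block4": {"Tni2", "Tni3"},
    "block5": {"Tni1", "Tni7"},
}

groups = group_by_composition(toy_blocks)
for composition, blocks in groups.items():
    print(sorted(composition), blocks)
```

Blocks whose composition nobody else shares end up in a group of one, which is roughly the situation of the unassigned U, V, W and Z blocks.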

I don't know about you, but I find this use of genomic evidence and logic to identify the long-lost chromosomes of a 350 million-year-old organism to be extremely cool. Hella cool. Kick-ass, far-out, mega-cool.

Jebus, but I am so happy to be a member of the reality-based community.

Anyway, comparative genomics now lets us put together a comprehensible story of the evolution of vertebrates. In the diagram below, we start with an ancestral bony fish with 12 chromosomes, and then split into two lineages that lead to us tetrapods on the left, and modern fish on the right.

Proposed model for the distribution of ancestral chromosome segments in the human and the Tetraodon genomes. The composition of Tetraodon chromosomes is based on their duplication pattern, whereas the composition of human chromosomes is based on the distribution of orthologues of Tetraodon genes. A vertical line in Tetraodon chromosomes denotes regions where sequence has not yet been assigned. With 90 blocks in human compared with 44 in Tetraodon, the complexity of the mosaic of ancestral segments in human chromosomes underlines the higher frequency of rearrangements to which they were submitted during the same evolutionary period.

Our lineage was marked by a period of intense amplification of transposable elements, LINEs and SINEs and repeated junk that copied itself over and over, leading to our current grossly bloated genome, containing 20-25 thousand genes swimming in 3 Gb of mostly useless dead DNA. Meanwhile, the fish went through a whole genome duplication (WGD), followed by a thorough pruning back of many of the excess copies, leaving them with a similar 20-25 thousand genes. In the Tetraodon lineage, at least, there was no amplification of junk, so their protein-coding genes reside in a lean, trim 340 Mb.

Comparative genomics is a powerful tool that is going to be telling us much, much more about our evolutionary history.

…the remarkable preservation of the Tetraodon genome after WGD makes it possible to infer the history of vertebrate chromosome evolution. The model suggests that the ancestral vertebrate genome was comprised of 12 chromosomes, was compact, and contained not significantly fewer genes than modern vertebrates (in as much as the WGD and subsequent massive gene loss resulted in only a tiny fraction of duplicate genes being retained).

The explosion of transposable elements in the mammalian lineage, subsequent to divergence from the teleost lineage, may have provided the conditions for increased interchromosomal rearrangements in mammals; in contrast, the Tetraodon genome underwent much less interchromosomal rearrangement. With the availability of additional vertebrate genomes (dog, marsupial, chicken, medaka, zebrafish and frog are underway), it will be possible to explore intermediate nodes such as the last common ancestor of amniotes, of sarcopterygians and of actinopterygians, and to gain an increasingly clearer picture of the early vertebrate ancestor. Because the early vertebrate genome is 'closer' to current invertebrates, this should in turn facilitate comparison between vertebrate and invertebrate evolution.

Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, Nicaud S, Jaffe D, Fisher S, Lutfalla G, Dossat C, Segurens B, Dasilva C, Salanoubat M, Levy M, Boudet N, Castellano S, Anthouard V, Jubin C, Castelli V, Katinka M, Vacherie B, Biemont C, Skalli Z, Cattolico L, Poulain J, De Berardinis V, Cruaud C, Duprat S, Brottier P, Coutanceau JP, Gouzy J, Parra G, Lardier G, Chapple C, McKernan KJ, McEwan P, Bosak S, Kellis M, Volff JN, Guigo R, Zody MC, Mesirov J, Lindblad-Toh K, Birren B, Nusbaum C, Kahn D, Robinson-Rechavi M, Laudet V, Schachter V, Quetier F, Saurin W, Scarpelli C, Wincker P, Lander ES, Weissenbach J, Roest Crollius H (2004) Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431:946-957.

I think the more important issue that this paper raises is:

How many authors is too many authors for a study?

By Miguelito (not verified) on 08 Jun 2006

I had the same sense of megacool when I did my first molecular phylogenetic tree, and realized I was looking at the genetic sequence of a long dead organism, one whose physical characteristics I could only guess at, inferred only from the genes of its offspring.

There's a freshwater species of pufferfish? I'll never feel safe in the bathtub again.

PZ, I've got a bone to pick with you about terminology. You're probably just using the same words that the authors used, but they're vertebrate genomics folks, and they don't know jack.

Synteny represents the preservation of small regions of order within a chromosome...

Actually, synteny refers to blocks of genes found on the same chromosome. It is independent of gene order/orientation. The vertebrate comparative genomics community has hijacked the term to refer to blocks of genes with conserved order. They never learned classical genetics, and I will never forgive them.

PZ: Wonderful research, wonderful graphics, wonderful commentary.

With respect to the difference in the amount of transposable elements within the respective lineages, is it plausible to infer that Tetraodon's lineage has far less "junk" due to the relative homogeneity of the environments experienced by that lineage? To put it another way, is it reasonable to expect that this difference is the product of selection?


By Scott Hatfield (not verified) on 08 Jun 2006

Miguelito: If you think that is a lot of authors, you should take a look at a paper from one of the large particle physics experiments like D0 or CDF.

Thanks for the link, RPM.

My M.Sc. (just completed) was primarily a comparison of linkage relationships between markers on both the Guppy (Poecilia reticulata) and Swordtail (Xiphophorus spp) linkage maps. The word "synteny" got used and abused a great deal. I got hammered on it more than once in lab presentations and committee meetings. Even though I'm a vertebrate-person (I hesitate to call myself a genomicist), I HAD to learn that particular word.

By TheBrummell (not verified) on 08 Jun 2006

That was really fantastic, and awe-inspiring.

What would need to be done to guess at the structure of an ancestor from its inferred DNA? Is it, in theory, possible, or is there more information that can't really be inferred using this method that is required to determine that sort of thing? I'm afraid I have no biology knowledge. :-(

S. Hatfield wrote: With respect to the difference in the amount of transposable elements within the respective lineages, is it plausible to infer that Tetraodon's lineage has far less "junk" due to the relative homogeneity of the environments experienced by that lineage?

No. If you mean simply that they are aquatic, several aquatic vertebrates have loads of "junk". And pufferfish do not live in a more homogeneous environment than other fish, either.

Absolutely fascinating, and a really helpful explanation!

Erm, err .....

No "Jurassic Park" stuff, but....

Is it ever, or even within the next 20 years (say) going to be possible to really reconstruct long-extinct organisms, by recreating the DNA sequence, and then "loading" it into a cell, or cells, and letting the organism develop?

By G. Tingey (not verified) on 08 Jun 2006

To put it another way, is it reasonable to expect that this difference is the product of selection?

If it is due to selection, we would expect the lineage leading to pufferfish to have a large effective population size. If this does not make sense to you, google "nearly neutral theory" and start reading. Or hop over to evolgen or gene expression and search for posts about the nearly neutral theory.

Is it ever, or even within the next 20 years (say) going to be possible to really reconstruct long-extinct organisms, by recreating the DNA sequence, and then "loading" it into a cell, or cells, and letting the organism develop?

Researchers studying ancestral reconstruction often get "Jurassic Park" questions. Even if they were able to reconstruct the order of genes and the sequence of all non-coding and coding DNA, they still probably would not be able to implant that nucleus into the egg of an extant organism and develop the ancestor. As PZ can tell you, the proteins and RNAs a mother puts into her eggs (and, as we're finding out, the stuff a father puts into the head of a sperm) are very important for development. Figuring out how to create the proper ancestral egg environment is a problem that I don't think any developmental biologist will be able to solve. Yeah, that's a challenge.

To rephrase RPM more simply, an organism, or even a cell, is not completely defined by its DNA!

There are persistent structures and patterns, and even small datastores, scattered around individual cells throughout the body. Many of these are created and/or copied more-or-less independently of chromosomal DNA, or even cell division, but their functions are deeply intertwined with the DNA-based business of the cell. Thus the extragenetic material not only can, but must, evolve in parallel with the DNA. Unfortunately, it's much harder to reconstruct, and we'd need it for a proper species recreation.

By David Harmon (not verified) on 09 Jun 2006

Tetraodon nigroviridis?
The Polka-dot Puffer of Tropical Africa?
Have they sequenced the Mbu Puffer, Tetraodon mbu?

T. nigroviridis is South Asian, if you meant him by "Polka-dot puffer".

I don't think T. mbu has been sequenced, and obviously Carinotetraodon travancoricus should be sequenced next, since they are much cuter and more interesting. ;)

The authors of that paper mentioned that sequencing additional vertebrate genomes will be very helpful in this work. And given what's sequenced so far, we may now be able to get a clearer picture. I've checked sites like PubMed, The Genomes Online Database, and The International Sequencing Consortium. An especially useful one is PubMed's eukaryotic-genomes site, which I've sorted by "Group" as primary and "Subgroup" as secondary.

So far, several mammalian ones have been sequenced, like the rat, mouse, and dog genomes, making it possible to recover the ancestral-mammalian genome arrangement. However, the only non-mammalian amniote (and non-mammalian tetrapod!) sequenced so far has been the chicken.

One other fish has been sequenced, the medaka (another teleost), zebrafish, and a ray (Leucoraja) are on the way. We already have sequences of a sea squirt (Ciona), and a hemichordate worm and a sea urchin are on the way. These invertebrate-deuterostome sequences should help resolve the question of early-vertebrate genome duplications: a common hypothesis is 4x for ancestral vertebrates and 2x for ancestral teleosts (total of 8x!).

Sequencing has been even more spotty in the rest of Metazoa; there are about as many Drosophila species sequenced or being sequenced as other insect species combined.

And outside the animal kingdom, most of the eukaryote sequencing has been done on fungi; "spotty" is an understatement.

It may be possible to reconstruct the ancestral eukaryote genome, but I think that doing so will require reconstructing a lot of intermediate genomes, like for ancestral plants, ancestral fungi, etc.

By Loren Petrich (not verified) on 09 Jun 2006

One other fish has been sequenced, the medaka (another teleost), zebrafish, and a ray (Leucoraja) are on the way.

The medaka seems to already have reached the assembly stage, along with Takifugu and Tetraodon, so three fishes have been sequenced.

There used to be a nice page at TIGR with pictures of the organisms linking to the various genome projects, but they changed the site layout, so I can't find it anymore.

First, I concede on Tetraodon. And yes, that TIGR page still exists, though under a different name: The Genome News Network List of Sequenced Organisms. And its links A - B, C - G, H - N, O - S, T - Z still contain those organism pictures.

And getting to overall eukaryote phylogeny, Patrick Keeling has a nice diagram showing a recent view of it. It shows five major clades:

Opisthokonts (one flagellum in the rear) -- includes animals, choanoflagellates (collar flagellates), and fungi. Amoebas branched off early from the rest of them. This group has had the largest fraction of the genome-sequencing effort.

Plants, in a very broad sense, include green algae and land plants, red algae, and glaucophytes. A few have been sequenced, like Arabidopsis.

Chromalveolates, chromists + alveolates (diatoms, brown algae, ciliates, apicomplexans, etc.). A bit more sequencing effort.

Excavates (euglenids, kinetoplastids, etc.). Hardly any sequencing effort.

Cercozoa (foraminiferans and radiolarians). No sequencing effort that I've been able to find.

To summarize, we may be able to get a good picture of the animal-fungus ancestor, but not many of the other ancestors.

By Loren Petrich (not verified) on 10 Jun 2006

I just want to point out that Jaillon et omnes keep confusing Vertebrata and Osteichthyes. No, they have not reconstructed the ancestral vertebrate karyotype. They have reconstructed the ancestral "bony fish" karyotype.

Molecular phylogenetics papers are commonly as careless as that about nomenclature.

By David Marjanović (not verified) on 10 Jul 2007
