Sorting the Pancakes that Make up a Genome

Genome rearrangements are fast becoming one of the most interesting aspects of comparative genomics (I may be slightly biased in my perspective). We have known for quite some time that the genomes of different species (and even of individuals within a species) differ by inversions of their chromosomes (this was first studied in Drosophila). In fact, some of the early work on the evolutionary relationships of species was done using chromosomal rearrangements. Additionally, rearrangements have a whole lot of important biological implications, including speciation, human disease, and the function of genes contained within inversions.

Now that we have whole genome sequences from multiple closely related species, we can identify differences in gene order. Some species are so closely related that there are not many differences (e.g., humans and chimps). Other species differ by many rearrangements (e.g., distantly related drosophilids). For the closely related species, it's pretty easy to reconstruct the rearrangement events. For the distantly related species, however, we need complex algorithms to solve these problems -- problems that exist at the intersection of biology and computer science.

The problem of reconstructing the inversion events that differentiate two genomes is analogous to sorting a stack of pancakes (each with a different diameter) from largest to smallest using only a spatula to flip over the pancakes at the top of the stack -- an analogy presented in this article from American Scientist (via 3QD). The article itself is an interesting read, tracing the history of studying genome rearrangements from the early work on Drosophila to the early computational work on pancake stacking and flipping (some of which was done by Harvard dropout William Gates) to the merger of the computer science algorithms with the biology.
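To make the pancake analogy concrete, here is a minimal sketch of the simple textbook pancake sort: repeatedly flip the largest unsorted pancake to the top, then flip it down into place. The function names (`flip`, `pancake_sort`) and the list representation (index 0 is the top of the stack) are my own assumptions for illustration, and this is not the algorithm from the article or from the genome rearrangement literature. Keep in mind that a spatula can only reverse the top of the stack, whereas a chromosomal inversion can reverse any internal segment.

```python
def flip(stack, k):
    """Reverse the top k pancakes; stack[0] is the top of the stack."""
    return stack[:k][::-1] + stack[k:]


def pancake_sort(stack):
    """Sort pancake diameters so the largest ends up at the bottom.

    Returns the sorted stack and the list of flip sizes that were used.
    This is the simple approach, not a minimum-flip solution.
    """
    stack = list(stack)
    flips = []
    # Place the largest remaining pancake at the bottom of the unsorted part.
    for size in range(len(stack), 1, -1):
        biggest = max(range(size), key=lambda i: stack[i])
        if biggest == size - 1:
            continue  # already in place
        if biggest != 0:
            # Bring the largest unsorted pancake to the top.
            stack = flip(stack, biggest + 1)
            flips.append(biggest + 1)
        # Flip it down to the bottom of the unsorted part.
        stack = flip(stack, size)
        flips.append(size)
    return stack, flips


if __name__ == "__main__":
    ordered, flips = pancake_sort([3, 1, 4, 2])
    print(ordered, flips)
```

The returned list of flip sizes plays the role of the reconstructed rearrangement history: replaying those flips on the starting stack gives the sorted one.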

From a naive reading of the article, it would appear that many of the big questions have been solved. This is far from the truth, as the algorithms described don't work very well for species that differ by a boatload of inversions. Sure, they work alright for human and mouse -- these guys love to test out their work in mammalian genomes -- but they fall flat when it comes to dealing with Drosophila, which is ironic considering that's where this problem first presented itself. The problem with these algorithms lies in how they approach the problem at hand; they try to solve it by looking at the entire sequence of genes at once, shuffling them around until they end up with the same gene order in both species. This approach does not work when the number of inversion events gets too large (and breakpoints get reused often).
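To give a feel for why breakpoint reuse matters, here is a rough sketch that counts breakpoints between two gene orders: places where two genes are neighbors in one genome but not in the other. It assumes each genome is a single linear chromosome with the same genes, each appearing once, and it ignores gene orientation; the function names are hypothetical. A single inversion can create or remove at most two breakpoints, so half the breakpoint count is a lower bound on the number of inversions. When breakpoints are reused, the true number of inversions climbs above that bound and the gene order alone tells you less about what actually happened.

```python
def count_breakpoints(order_a, order_b):
    """Count adjacencies in order_a that are not adjacencies in order_b.

    Assumes both orders hold the same genes, each exactly once, on one
    linear chromosome, and ignores gene orientation (unsigned genes).
    """
    position = {gene: i for i, gene in enumerate(order_b)}
    # Add sentinel ends so a change at either chromosome end is counted too.
    framed = [-1] + [position[gene] for gene in order_a] + [len(order_b)]
    return sum(1 for x, y in zip(framed, framed[1:]) if abs(x - y) != 1)


def min_inversions_lower_bound(order_a, order_b):
    """Each inversion changes the breakpoint count by at most two."""
    breakpoints = count_breakpoints(order_a, order_b)
    return (breakpoints + 1) // 2


if __name__ == "__main__":
    reference = ["a", "b", "c", "d"]
    rearranged = ["a", "c", "b", "d"]  # one inversion of the b-c segment
    print(count_breakpoints(rearranged, reference))
    print(min_inversions_lower_bound(rearranged, reference))
```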

If the American Scientist article had been published a few months from now, it would have described analysis done on the 12 Drosophila genomes that approaches the problem of genome rearrangement from a different perspective. I can't say much else about this unpublished research, but it will give us a new way to study genome rearrangements. Additionally, this new approach captures the biological realities of the rearrangements much more accurately than the work described in the article I've linked.
