evolgen

Previous entries:

This post is part of a series exploring the evolution of a duplicated gene in the genus Drosophila. Links to the previous posts are above. Part 4 of this series (Obtaining More Sequences) can be found below.

Obtaining More Sequences

Last time we downloaded sequences for both aldolase genes from Drosophila melanogaster (see the previous entry). But if we want to study the evolution of these genes, we need sequences from a few more species. Complete genome sequences are now available for 12 Drosophila species, but the annotations for those genomes aren’t easily accessible to most people. In this entry we’ll use the University of California, Santa Cruz (UCSC) Genome Browser to obtain preliminary annotations of the two genes from several species.

We’ll start by visiting the UCSC Genome Browser Gateway and selecting one of the Drosophila genomes from the dropdown menus (see below): choose “Insect” from the “clade” menu and then “D. simulans” from the “genome” menu. Finally, enter “aldolase” under “position or search term” and click the “submit” button.
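If you’d rather skip the menus, the same search can be written as a Genome Browser URL. This is a minimal sketch, not something the browser requires. It assumes the browser page is served by UCSC’s hgTracks script, which takes the assembly name in a db parameter and the search term in a position parameter, and it assumes droSim1 is UCSC’s name for the D. simulans assembly (check the Gateway’s assembly menu if in doubt). The browser_url function is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed address of the UCSC Genome Browser display page (hgTracks).
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db, position):
    """Build a browser URL for an assembly (db) and a search term or coordinate range.

    Hypothetical helper: urlencode escapes characters such as ':' in coordinates.
    """
    return UCSC_HGTRACKS + "?" + urlencode({"db": db, "position": position})


# "droSim1" is an assumption: the UCSC assembly name for D. simulans.
print(browser_url("droSim1", "aldolase"))
```

Pasting the printed address into a web browser should land you on the same search results as the Gateway form.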

[Screenshot: the UCSC Genome Browser Gateway]

The search returns a long list of links, most of which lead to the same place. Clicking the first link takes you to the genome browser view of the Aldolase gene region in the D. simulans genome. The browser is cluttered with information we don’t need, so we’ll adjust the display settings:

[Screenshot: Genome Browser track settings]

Under “Other RefSeq” select “hide” (this will get rid of the alignments with genes from every species in the UCSC database) and click “refresh”. That makes the browser look a lot cleaner:

[Screenshot: Genome Browser view of the D. simulans Aldolase region]

To ensure we’re looking at the correct region, click where it says “D. mel. proteins” in the browser (see above). That will allow us to see which D. melanogaster genes are homologous to the D. simulans region we’re looking at. When we expand the D. melanogaster annotations, we see that this region does indeed match the D. melanogaster Aldolase gene:

[Screenshot: expanded D. mel. proteins track]

In addition to the D. melanogaster Aldolase gene, this region also matches CG5432, the other member of the Drosophila aldolase gene family. That confirms we’re in the right place, but we still need the sequence of the gene. To get it, we’ll use the gene prediction from Genscan (Burge and Karlin 1997). This program predicts gene structures in genomic DNA by scoring open reading frames and intron-exon splice sites with a probabilistic model of gene architecture. To download the sequence of the gene predicted by Genscan, click on the brown predicted gene. You’ll be taken to a new webpage where you can download the predicted mRNA.
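Genscan’s model is far more sophisticated than anything that fits in a blog post, but the basic idea of an open reading frame is easy to illustrate. The Python sketch below is a naive ORF scan, not Genscan: it looks only at the forward strand, ignores introns entirely, and reports stretches that begin with ATG and end at the first in-frame stop codon. The function name find_orfs and its min_codons threshold are hypothetical choices for this example.

```python
# Standard stop codons in the nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Naive forward-strand ORF scan (illustration only; not Genscan).

    Returns (start, end, frame) tuples, longest first, where start is the
    0-based index of the ATG and end is the index just past the stop codon.
    Length in codons includes the stop codon. ORFs that run off the end of
    the sequence without a stop codon are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through every complete codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # the first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG downstream
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [(2, 14, 2)]
```

Because Drosophila genes are interrupted by introns, a scan like this on raw genomic DNA will usually recover only fragments of a gene, which is why a gene finder also has to model splice sites.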

The same approach can be used to download the D. simulans copy of CG5432. Start by searching for CG5432 in the D. simulans Genome Browser, then follow the directions above to download the predicted sequence for the D. simulans copy of CG5432.

You can download the two genes from the other species by following the directions above. Also, when we downloaded the D. melanogaster genes, we retrieved three different versions of Aldolase (in addition to the one copy of CG5432). Because the annotation of the genes from the other species isn’t as good as that for D. melanogaster, we’ll only be getting one version of Aldolase from each of those species. It is possible, if we’re willing to go through the trouble, to annotate the alternatively spliced forms of Aldolase from the other species, but I’m sacrificing thoroughness for the sake of accessibility.

Additionally, the D. pseudoobscura annotation is further along than that of the other ten non-D. melanogaster genomes (Richards et al. 2005), and the UCSC Genome Browser has a D. pseudoobscura track containing the predicted genes for Aldolase and CG5432. Unfortunately, the UCSC Genome Browser does not include D. willistoni, so we won’t have sequences from that species. And while we’re collecting these two genes from the Drosophila species, we may as well download the aldolase genes from the mosquito Anopheles gambiae and the honey bee Apis mellifera. These will be useful as outgroups in phylogenetic analyses. If you do this yourself, you’ll find that each of those genomes contains only one copy of an aldolase gene.

To save everyone the trouble of clicking and browsing, I’ve provided the FASTA files for all the species in a Zip file. To use them, download the *.zip file, unzip it, and save the *.fas files to your hard drive. In the next entry, we’ll begin using the sequences of these genes to study the evolution of the aldolase gene family in insects.
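If you’d like to work with the sequences in a script rather than by hand, the files can be read straight out of the archive without unzipping it first. This is a minimal sketch that assumes each *.fas file is plain-text FASTA (a header line beginning with >, followed by one or more sequence lines). The function names parse_fasta and read_fas_from_zip are hypothetical, and the archive’s actual filename isn’t given here, so substitute whatever you saved it as.

```python
import io
import zipfile


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def read_fas_from_zip(zip_source):
    """Return {member name: [(header, sequence), ...]} for every *.fas file.

    zip_source may be a path or a binary file-like object.
    """
    records = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".fas"):
                with io.TextIOWrapper(archive.open(name), encoding="utf-8") as text:
                    records[name] = list(parse_fasta(text))
    return records
```

For example, read_fas_from_zip("aldolase_sequences.zip") (a hypothetical filename) would return one list of records per *.fas file, keyed by the file’s name inside the archive.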


References Cited:

Burge C and Karlin S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268: 78-94.

Richards S et al. 2005. Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res. 15: 1-18.