Previous entries:

This post is part of a series exploring the evolution of a duplicated gene in the genus Drosophila. Links to the previous posts are above. Part 5 of this series (Examining the Outgroups) can be found below.

Examining the Outgroups

In the previous post I mentioned that the two outgroup species in our analysis, the mosquito Anopheles gambiae and the honeybee Apis mellifera, have only one copy of the aldolase gene. If that’s the case, then it’s likely that the gene was duplicated after the divergence of Drosophila from Anopheles. How can we confirm this? In this entry we will use the NCBI’s BLAST webpage to search for aldolase genes in the outgroup species.

The Basic Local Alignment Search Tool (BLAST) allows you to search a query sequence against a database of other sequences. We’ll use BLAST to search the two aldolase genes from D. melanogaster against the Anopheles gambiae and Apis mellifera genomes. To do so, the BLAST program will translate each gene’s DNA sequence into protein in all six reading frames (three on each strand). Each genome will also be translated in all six reading frames, which covers every protein it could possibly encode. This type of search is called TBLASTX, meaning a translated database of nucleotide sequences (each genome) is searched with a translated nucleotide query (each D. melanogaster aldolase gene). The first step is to select TBLASTX from the main BLAST page.

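To make the six-frame translation concrete, here is a minimal Python sketch of the idea. It is not how BLAST itself is implemented; it only shows what "translated in every reading frame" means, using the standard genetic code. The function names and the short example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Translate a sequence in three frames on each strand."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# A made-up nine-base sequence, not a real aldolase sequence.
frames = six_frame_translations("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

TBLASTX compares every frame of the query against every frame of each database sequence, which is why it can find a protein-coding match regardless of strand or frame.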

Selecting TBLASTX takes you to the TBLASTX search page. We’ll start by pasting the sequence of the D. melanogaster Aldolase gene (available in this zip file) into the input box:

Next, we’ll select the database for TBLASTX. In the “Organism” box, enter “Anopheles gambiae”. As you type, a list of matching options will appear.

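The choices made so far (program, query, organism) can also be written down as request parameters. NCBI offers a URL-based interface to BLAST, and the sketch below assumes its documented `CMD=Put` request with `PROGRAM`, `DATABASE`, `QUERY`, and `ENTREZ_QUERY` fields. The `nt` database, the function name, and the placeholder query are assumptions for illustration, not settings taken from the web form. The function only builds the request body and does not contact NCBI.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST URL interface (the request is not sent here).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_tblastx_request(query_sequence, organism, database="nt"):
    """Encode a TBLASTX submission restricted to one organism.

    The default database is an assumption; adjust it to match the
    database you actually select on the web form.
    """
    params = {
        "CMD": "Put",                  # submit a new search
        "PROGRAM": "tblastx",          # translated query vs. translated database
        "DATABASE": database,
        "QUERY": query_sequence,
        "ENTREZ_QUERY": f"{organism}[Organism]",  # same idea as the Organism box
    }
    return urlencode(params)


# Placeholder query, not the real Aldolase sequence.
body = build_tblastx_request("ATGGCCTAA", "Anopheles gambiae")
print(BLAST_URL)
print(body)
```

For the handful of searches in this post the web form is simpler, but a scripted request like this becomes handy if you want to repeat the same search for many genes.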

Finally, click “BLAST” to submit your query. The search may take a few minutes to complete. When it finishes, this is what you will see:

Each horizontal line indicates a sequence from A. gambiae that matches our query sequence. The coordinates of the match against the query sequence are given along the top, and the color of the line indicates how good the match is (red is excellent, purple is very good, green is good, blue is fair, and black is poor). The first matching sequence comes from the A. gambiae genome sequence, while the rest of the sequences are from cDNA clones. What are cDNA clones? They are DNA copies of mRNA transcripts, which means they represent expressed genes rather than sequence from the genome project. Only a single hit comes from the A. gambiae genome sequence. That means there’s probably one copy of an aldolase gene in the A. gambiae genome.
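The color key amounts to a lookup on the alignment score. Here is a small Python sketch of that lookup. The score thresholds (40, 50, 80, and 200) are what I recall from the key shown above the graphic on the NCBI results page, so treat them as an assumption and check them against your own results. The function name is made up.

```python
def score_color(alignment_score):
    """Map an alignment score to the color used in the graphical summary.

    Thresholds are assumed from the color key on the results page.
    """
    if alignment_score >= 200:
        return "red"      # excellent
    if alignment_score >= 80:
        return "purple"   # very good
    if alignment_score >= 50:
        return "green"    # good
    if alignment_score >= 40:
        return "blue"     # fair
    return "black"        # poor


# Hypothetical scores, not values from the search in this post.
for score in (250, 120, 60, 45, 10):
    print(score, score_color(score))
```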

We’ll also search the A. gambiae genome using CG5432, the second of the two aldolase genes from D. melanogaster. That search gives the following result:

There’s also only a single match to CG5432 in the A. gambiae genome, and it’s to the same region that matches Aldolase. This further supports our hypothesis that the mosquito has only one copy of an aldolase gene. Next, we can search the same two D. melanogaster genes against the A. mellifera (honeybee) sequences. Here is how to format the search:

If you perform that search, you’ll see that there are two A. mellifera sequences that match Aldolase:

And if you search CG5432 against the A. mellifera database, you also get two sequences with excellent matches:

Both searches match the same two A. mellifera sequences (XM_623339.2 and XM_001121298.1), both of which are predicted homologs of the D. melanogaster Aldolase gene. We can download the coding sequence (CDS) for each of those mRNAs from GenBank. To do so, click on the line in the BLAST result corresponding to the sequence. That will take you to another area of the page containing an alignment of the D. melanogaster sequence with the one from A. mellifera. If you click on the name of the A. mellifera sequence above the alignment, you’ll be taken to the GenBank page for that particular sequence. Alternatively, you can click on the accession numbers given above, which I’ve linked to their GenBank pages.
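If you save the hit lists from both searches, you can confirm that they point at the same sequences with a few lines of Python. GenBank identifiers come in an accession.version form (for example, XM_623339.2), so this sketch compares the accession part only. The function names are made up, and the two lists simply repeat the accessions mentioned above.

```python
def split_accession(identifier):
    """Split 'XM_623339.2' into ('XM_623339', 2); version is None if absent."""
    accession, dot, version = identifier.partition(".")
    if dot:
        return accession, int(version)
    return accession, None


def shared_accessions(hits_a, hits_b):
    """Return accessions found in both hit lists, ignoring version numbers."""
    accessions_a = {split_accession(hit)[0] for hit in hits_a}
    accessions_b = {split_accession(hit)[0] for hit in hits_b}
    return sorted(accessions_a & accessions_b)


# Hits reported for each query in this post.
aldolase_hits = ["XM_623339.2", "XM_001121298.1"]
cg5432_hits = ["XM_623339.2", "XM_001121298.1"]

print(shared_accessions(aldolase_hits, cg5432_hits))
```

Ignoring the version number is a deliberate choice here: the same record can appear with different versions as GenBank updates it.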

That’s a portion of the GenBank page for XM_623339.1. To make sure you download the CDS rather than the full mRNA, click on the CDS link on the GenBank page (shown above). Then, select “FASTA” from the “Display” dropdown:

You can either copy and paste the FASTA sequence into a text editor or select “Send to: File” and save the FASTA sequence to your hard drive. Now, do the same for XM_001121298.1. If you’re having trouble downloading the CDS for these two genes, I have uploaded the FASTA files. You can download them from the following links: XM_623339, XM_001121298.
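Once the files are saved, a quick sanity check can tell you whether you grabbed a CDS or something else. The Python sketch below reads FASTA-formatted lines and tests whether a sequence looks like a complete coding sequence: a length divisible by three, an ATG start, and a stop codon at the end. The function names and the tiny example record are made up, and the check is only a rough heuristic (a partial CDS would fail it).

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def looks_like_cds(sequence):
    """Rough check: whole codons, ATG start, and a terminal stop codon."""
    sequence = sequence.upper()
    return (
        len(sequence) % 3 == 0
        and sequence.startswith("ATG")
        and sequence[-3:] in {"TAA", "TAG", "TGA"}
    )


# Placeholder record, not one of the A. mellifera sequences.
example = [">example_cds placeholder", "ATGGCC", "TAA"]
for name, sequence in read_fasta(example).items():
    print(name, len(sequence), looks_like_cds(sequence))
```

To run it on a downloaded file, pass an open file handle instead of the list, for example `read_fasta(open("XM_623339.fasta"))`, where the filename is whatever you chose when saving.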

Why is there one match to the aldolase genes in the A. gambiae database, but two matches in the A. mellifera database? We’ll try to figure that out in a subsequent post. But before we do that, we’ll take a look at the evolutionary relationship of Drosophila, Anopheles (mosquito), and Apis (honeybee). That will allow us to formulate a hypothesis about why A. mellifera has two matches to aldolase, while A. gambiae has only one.