Digital Biology Friday: Animal Mitochondria and Evolution Revisited

Last year I wrote about an experiment where I compared a human mitochondrial DNA sequence to primate sequences in GenBank. Since I wanted to know about the differences between humans, gorillas, and chimps, I used the Entrez query 'Great Apes' to limit my search to a set of sequences in the PopSet database that contained gorilla, bonobo, chimp, and human DNA.

A week ago, I tried to repeat this experiment and...

It didn't work.

All I saw were human mitochondrial sequences.  I know the other sequences match, but I didn't see them since there are so many human sequences that match better.

This is, I think, one of the things that makes bioinformatics activities a challenge for teachers. Public databases change all the time, and that makes it difficult to repeat experiments from year to year. Not impossible, mind you, but difficult.

So, I decided it would be a good time to update this activity and modify it a bit, so that we might be able to get consistent results from year to year.

The overview: We are going to use a human mitochondrial genome sequence as a query and compare it to a set of mitochondrial genome sequences from other animals. Our results from comparing mitochondrial DNA sequences should be similar to the results that we would get if we were comparing bones, teeth, or other body parts.

Before you begin: Go to the NCBI and make yourself an account. Having an account is a good thing because you can save your search strategies and even review your past search results (for 36 hours).

The instructions:

  1. Go to the NCBI.
  2. Choose BLAST from the menu bar. (If you're getting lost trying to find items on the pages, I have a tutorial that shows where to find them.)
  3. Choose nucleotide blast from the selections on the BLAST home page.
  4. Copy NC_001807 and paste it in the large empty box towards the top of the page.

NC_001807 is the accession number for a reference sequence for a mitochondrial chromosome from a human.

  5. Change the database to Reference genomic sequences.
  6. Copy this set of accession numbers:

NC_001807, NC_001644, NC_001643, NC_001645, NC_002083, NC_002082, NC_005943, NC_004025, NC_006853, NC_007704, NC_001941, NC_009849, NC_000891, NC_001779, NC_001788, NC_001640, NC_005044, NC_002333, NC_002008, NC_001700, NC_001573

Note: These accession numbers correspond to mitochondrial genome sequences from several different organisms. The first accession number is your human query sequence. I included that one as a positive control.

  7. Paste the list of accession numbers in the Entrez Query text box in the BLAST form.
  8. Click BLAST.
  9. Wait patiently.
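
A long accession list is easy to mangle when you copy and paste it. The following is a minimal Python sketch, not part of the original activity, that tidies the list before you paste it into the Entrez Query box and confirms that the human query sequence is in there as the positive control. The pattern "NC_ plus six digits" is an assumption based only on the accessions shown above, and names such as `clean_accessions` are made up for illustration.

```python
import re

# Accession list copied from the instructions above
# (including a missing space after one comma, to show that it gets tidied).
ACCESSIONS_TEXT = """
NC_001807, NC_001644, NC_001643, NC_001645, NC_002083, NC_002082,
NC_005943, NC_004025, NC_006853, NC_007704, NC_001941,NC_009849,
NC_000891, NC_001779, NC_001788, NC_001640, NC_005044, NC_002333,
NC_002008, NC_001700, NC_001573
"""

# The human mitochondrial reference sequence used as the BLAST query.
QUERY = "NC_001807"

# Assumption: every accession in this list looks like "NC_" plus six digits.
ACCESSION_PATTERN = re.compile(r"^NC_\d{6}$")


def clean_accessions(text):
    """Split on commas and whitespace, reject odd tokens, drop duplicates."""
    cleaned = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        if not ACCESSION_PATTERN.match(token):
            raise ValueError(f"does not look like an accession: {token!r}")
        if token not in cleaned:
            cleaned.append(token)  # keep the original order
    return cleaned


accessions = clean_accessions(ACCESSIONS_TEXT)
entrez_box_text = ", ".join(accessions)

print(len(accessions), "accessions")
print("Positive control included:", QUERY in accessions)
print(entrez_box_text)
```

The last line printed is the text to paste into the Entrez Query box. If a typo sneaks into the list, the function raises an error instead of quietly passing it along.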

Your results will show where these other mitochondrial genomes differ from the one human mitochondrial sequence and how large those differences are.
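
One common way to put a number on "how different" is percent identity: the share of positions in an alignment where the two sequences have the same base. BLAST reports a figure like this for each alignment it shows. The sketch below is a toy illustration of that arithmetic only, not BLAST's algorithm; the two short sequences are invented, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same base.

    Both strings must already be aligned (same length, '-' marks a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Invented example: two ten-base sequences that differ at one position.
seq_one = "ACGTACGTAC"
seq_two = "ACGTTCGTAC"
print(percent_identity(seq_one, seq_two))
```

A sequence compared to itself scores 100, which is what you should expect to see for the positive control.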

Saving your strategy

We're going to interpret the results in multiple steps, so you may wish to save this search strategy and revisit it in later weeks. To do this, make sure you're logged in to your account at the NCBI. Then, when you view your Recent Results, click the Save link. This won't save your results, but it will save your query and the list of accession numbers that you entered in the Entrez box. That way, you can repeat the search easily at a later time.

Interpreting your results, part I.

Don't worry about all your BLAST results just yet. First, let's think a bit more about the experiment and the materials that we used.

A.  What were those sequences anyway?

The first step is to determine what we looked at.

I'm aware that you might not recognize all the animals by their scientific names.  I sure don't. Names like Ornithorhynchus anatinus just leave me scratching my head.

Still, the names are important and they help us identify organisms with more precision. In biology, we give all creatures two names that correspond to their genus and their species, respectively. Unfortunately, those names are usually quite different from the names we would use in everyday life, but we can still find out what they mean.

B.  How do we find out what the names mean?

For each sequence, find the accession number in the table and click it to see the GenBank record for that sequence.  If we're lucky, the record will contain the common name of the creature that served as the source of the DNA. 

Here is an example:

You should be able to find the common names for all of the organisms that were represented in that list of mitochondria. But, in the future, if you're looking at something else and you can't find the answer on this page, you might be able to find it by clicking the link next to the word ORGANISM and looking in the Taxonomy database.
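
If you save a record as plain text, the same lookup can be sketched in a few lines of Python. This is only an illustration under some assumptions: the sample below is a hand-abbreviated imitation of the top of a GenBank record rather than a verbatim copy, it assumes the common name appears in parentheses at the end of the SOURCE line, and `names_from_record` is a made-up helper name.

```python
import re

# Hand-abbreviated illustration of the top of a GenBank text record.
# Real records contain many more lines than this.
SAMPLE_RECORD = """\
DEFINITION  Homo sapiens mitochondrion, complete genome.
ACCESSION   NC_001807
SOURCE      mitochondrion Homo sapiens (human)
  ORGANISM  Homo sapiens
"""


def names_from_record(text):
    """Return (scientific_name, common_name) found in a record's text.

    The common name is None when the SOURCE line has no parentheses.
    """
    scientific = None
    common = None
    for line in text.splitlines():
        if line.startswith("SOURCE"):
            # Assumption: the common name sits in parentheses at the line end.
            match = re.search(r"\(([^)]+)\)\s*$", line)
            if match:
                common = match.group(1)
        elif line.strip().startswith("ORGANISM"):
            scientific = line.strip()[len("ORGANISM"):].strip()
    return scientific, common


scientific_name, common_name = names_from_record(SAMPLE_RECORD)
print(scientific_name, "=", common_name)
```

When the common name comes back empty, that is the cue to follow the ORGANISM link and look in the Taxonomy database instead.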


C.  Draw your own diagram

Once you figure out what you looked at, draw a diagram for yourself to show which of these creatures are the most alike. Don't worry about making a fancy tree or other kind of graph; we'll do that a bit later. For now, just group together the creatures that you think should be most similar.


