Digital Biology Friday: Those BLASTed results!

Last week, we embarked on an adventure with BLAST.

BLAST, short for Basic Local Alignment Search Tool, is a collection of programs written by scientists at the NCBI (1) that are used to compare sequences of proteins or nucleic acids. BLAST is used in multiple ways, but last week my challenge to you, dear readers, was to pick a sequence, any sequence, from a set of 16 unknown sequences and use BLAST to identify that sequence.

This week, we'll examine the results.

I did the experiment, too, with a completely different unknown sequence that's pasted below. This sequence is not part of the data set that I put at the Geospiza Education site.

>unknown_seq

ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCC
TTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATC
AGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCC
TTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCT
GCTATGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCG
CCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAA
GCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCAT
GAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATC

Looking at the letters, of course, doesn't really help me at all. All I see are A's, G's, C's, and T's.
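
If staring isn't enough, a few lines of Python can at least count the letters. This is only a minimal sketch and not part of the BLAST tutorial. It uses the first line of the unknown sequence as sample input, and the helper name `base_counts` is made up for this illustration.

```python
from collections import Counter

# First line of the unknown sequence from this post (sample input only).
first_line = "ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCC"

def base_counts(seq):
    """Count how often each letter occurs in a DNA sequence (hypothetical helper)."""
    return Counter(seq.upper())

counts = base_counts(first_line)
for base in "ACGT":
    print(base, counts[base])
print("length:", len(first_line))
```

Counting confirms that the letters are all A's, C's, G's, and T's, but it still says nothing about what the sequence is. For that, it has to be compared with sequences that are already known.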

To solve the problem and identify the sequence, I have to compare my unidentified sequence to a collection of sequences that have already been identified by other people and see if my sequence matches any of them.
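
The idea of comparing one sequence against a collection can be sketched in a few lines of Python. To be clear, this is a toy and not how BLAST itself is implemented: it only counts the short words that two sequences share. The two "database" entries are made up for the example, the function names are hypothetical, and the word length of 11 is just a value chosen for this sketch.

```python
def kmers(seq, k):
    """Return the set of all length-k words found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_words(query, subject, k=11):
    """Count the length-k words that the query and the subject have in common."""
    return len(kmers(query, k) & kmers(subject, k))

# Start of the unknown sequence from this post.
query = "ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCC"

# Made-up miniature "database": one entry contains the query, the other does not.
toy_database = {
    "toy_entry_1": "GGGG" + query + "AAAA",
    "toy_entry_2": "ACGT" * 12,
}

hits = {name: shared_words(query, seq) for name, seq in toy_database.items()}

# List the entries from most shared words to fewest.
for name, n in sorted(hits.items(), key=lambda item: item[1], reverse=True):
    print(name, n)
```

The entry that contains the query shares every one of its words, while the unrelated entry shares none. A real search does far more work than this, but the ranking idea is the same: the best matches come out on top.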

First, I copy my unknown sequence, then I follow the steps that are outlined in the BLAST for Beginners tutorial at the Geospiza Education web site. In the tutorial, I click the bright green arrows to move from page to page and see what to do.

My favorite way to use the tutorials is to open two web browser windows and resize the windows so they fit side by side on a computer screen. Then, I go through the tutorial in one window and do the steps myself in the other window.

(FYI: I started making these tutorials because I thought I would go crazy if I had to teach classes by spending fifty minutes saying "Click here" then "Click here" then "Click here".)

Eventually, I get to a page with results.

BLAST has looked into its crystal ball and we get:

Hmm, I see......

A graph with lots of red lines.

[Figure: BLAST graphical summary of the matches, drawn as red lines]

What does this mean?


To put it simply, the graph shows me that at least one hundred sequences in GenBank match my entire sequence.

If I look farther down the page, I come to more curious results.

[Figure: list of BLAST results with descriptions, scores, and E values]

To summarize what I see, I have a list of fifty results (only some of them are shown in this image). All the results have a score of 833 and an E value of 0.0, but the descriptions look like different things. C'mon, what do Dengue virus, SIV, and E. coli have in common?
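
For a feel for why an E value can show up as 0.0, here is a rough sketch. It assumes that 833 is a bit score and uses the usual relationship E = m × n × 2^(-S), where m is the query length, n is the database length, and S is the bit score. The query length and database length below are placeholder values picked for illustration, not numbers read from the results page, and `expect_value` is a hypothetical helper name.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score."""
    return query_length * database_length * 2.0 ** (-bit_score)

# Assumed values, for illustration only.
query_length = 420            # roughly the length of the unknown sequence
database_length = 10 ** 10    # hypothetical database size, in letters

e = expect_value(833, query_length, database_length)
print(e)            # a tiny positive number
print(f"{e:.1f}")   # shows as 0.0 with one decimal place
```

With a score that high, the number of matches expected by chance is so small that it is effectively zero. Whatever these fifty results are, they are not matching my sequence by accident.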

(at least if we don't read carefully, wink, wink, nudge, nudge)

Strange....

Why would my sequence match (at least) 50 different sequences in the nucleotide database?

Can you solve the mystery?

Copy the sequence at the beginning of this post and give it a try. Feel free to submit comments with your answer.

Or wait until next week, for more of the story.

References:

1. Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.


Copyright Geospiza, Inc.


Hmm, I'm reaching here ... could this sequence be the origin of replication for plasmids as well as some viruses? Curious.

It is a beta-lactamase, an enzyme related to antibiotic resistance. BLAST it against a protein DB, and/or run it against PFAM

Coleen,
Good guess. It is a gene that's found in many plasmids.

Diego,
You are right but you're solving the problem the hard way. I'll show you an easier way to find the answer next week.