Seed Media Group

Discovering Biology in a Digital World

My thoughts on biology, teaching, life, and exploring the living world via the digital one. These postings represent only my opinions; they do not represent the viewpoints of any funding agency or Geospiza, Inc.

Profile

Sandra Porter I am a microbiologist and molecular biologist turned tenured biotech faculty turned bioinformatics scientist turned entrepreneur. My passion is developing instructional materials for 21st century biology (Geospiza Education).


    Digital Biology Friday: Those BLASTed results!

    Category: Bioinformatics, Biotechnology, Digital Biology Fridays, Science education, sequence analysis, web resources
    Posted on: July 14, 2006 11:35 AM, by Sandra Porter

    Last week, we embarked on an adventure with BLAST.

    BLAST, short for Basic Local Alignment Search Tool, is a collection of programs, written by scientists at the NCBI (1), that are used to compare sequences of proteins or nucleic acids. BLAST is used in multiple ways, but last week my challenge to you, dear readers, was to pick a sequence, any sequence, from a set of 16 unknown sequences and use BLAST to identify that sequence.

    This week, we'll examine the results.

    I did the experiment, too, with a completely different unknown sequence that's pasted below. This sequence is not part of the data set that I put at the Geospiza Education site.

    >unknown_seq
    ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCC TTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATC AGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCC TTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCT GCTATGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCG CCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAA GCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCAT GAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATC

    Looking at the letters, of course, doesn't really help me at all. All I see are A's, G's, C's, and T's.
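    If you'd like to see just how uninformative the raw letters are, here is a minimal Python sketch that tidies a pasted FASTA record (removing the stray spaces that come along with copy and paste) and counts each base. The function names and the short example record are mine, made up for illustration; the example uses only the first 30 bases of the unknown sequence above.

```python
from collections import Counter


def read_fasta_record(text):
    """Return (name, sequence) from a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines()]
    name = lines[0].lstrip(">")
    # Join the sequence lines and drop any whitespace left over from pasting.
    sequence = "".join("".join(line.split()) for line in lines[1:]).upper()
    return name, sequence


def base_counts(sequence):
    """Count how many times each letter appears in the sequence."""
    return Counter(sequence)


# Hypothetical record: the first 30 bases of the unknown sequence.
example = """>example_seq
ATGAGTATTC AACATTTCCG
TGTCGCCCTT"""

name, seq = read_fasta_record(example)
counts = base_counts(seq)
print(name, len(seq), dict(counts))
```

    The counts tell you what the sequence is made of, but nothing about what it is. For that, it has to be compared with sequences that someone has already identified.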

    To solve the problem and identify the sequence, I have to compare my unidentified sequence to a collection of sequences that have already been identified by other people and see if my sequence matches any sequences that are already known.
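    The idea of comparing an unknown sequence against a collection of known ones can be sketched in a few lines of Python. This is a toy, not BLAST itself: it only counts short words (k-mers) shared between the query and each database entry, which is loosely inspired by the word-matching step that BLAST-style searches begin with. The database entries, their names, and the word size of 8 are all invented for illustration.

```python
def kmers(sequence, k):
    """Return the set of all words of length k found in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def rank_matches(query, database, k=8):
    """Rank database entries by the number of k-mers shared with the query."""
    query_words = kmers(query, k)
    hits = []
    for name, subject in database.items():
        shared = len(query_words & kmers(subject, k))
        if shared:  # keep only entries that share at least one word
            hits.append((name, shared))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical query: the first 30 bases of the unknown sequence.
query = "ATGAGTATTCAACATTTCCGTGTCGCCCTT"

# Made-up "database" of three entries, for illustration only.
toy_database = {
    "toy_vector_A": "GGGG" + query + "AAAA",        # contains the whole query
    "toy_vector_B": "CCCATGAGTATTCAACATTTTTTTTT",  # shares only the start
    "toy_unrelated": "GCGCGCGCGCGCGCGCGCGCGCGCGC",  # shares nothing
}

for name, shared in rank_matches(query, toy_database):
    print(name, shared)
```

    The entry that contains the whole query ranks first, the partial match ranks second, and the unrelated entry drops out. The real programs go much further, extending word hits into scored alignments and attaching statistics to each one.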

    First, I copy my unknown sequence, then I follow the steps that are outlined in the BLAST for Beginners tutorial at the Geospiza Education web site. In the tutorial, I click the bright green arrows to move from page to page and see what to do.

    My favorite way to use the tutorials is to open two web browser windows and resize the windows so they fit side by side on a computer screen. Then, I go through the tutorial in one window and do the steps myself in the other window.

    (FYI: I started making these tutorials because I thought I would go crazy if I had to teach classes by spending fifty minutes saying "Click here" then "Click here" then "Click here".)

    Eventually, I get to a page with results.

    BLAST has looked into its crystal ball and we get:


    Hmm, I see......


    A graph with lots of red lines.

    [Image: BLAST graphical overview with many red lines (red_small.gif)]

    What does this mean?


    To put it simply, the graph shows me that at least one hundred sequences in GenBank match my entire sequence.

    If I look farther down the page, I come to more curious results.


    [Image: list of BLAST results with scores and E values (results_small.gif)]

    To summarize what I see, I have a list of fifty results (only some of them are shown in this image). All the results have a score of 833 and an E value of 0.0, but the descriptions look like different things. C'mon, what do Dengue virus, SIV, and E. coli have in common?
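    Why would the E value show up as 0.0? In the standard BLAST statistics, the expected number of chance hits with a bit score of at least S' is E = m × n × 2^(−S'), where m is the query length and n is the total length of the database. Here is a rough sketch of that arithmetic in Python. I'm assuming the 833 is a bit score, and the query length and database size below are placeholder values I picked for illustration, not numbers from the actual search.

```python
import math


def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score."""
    return query_length * database_length * 2.0 ** (-bit_score)


def log10_expect_value(bit_score, query_length, database_length):
    """Same quantity on a log10 scale, which is easier to read when tiny."""
    return (math.log10(query_length)
            + math.log10(database_length)
            - bit_score * math.log10(2))


# Assumed values for illustration only.
query_length = 420         # hypothetical query length in bases
database_length = 2e10     # hypothetical database size in bases

e_value = expect_value(833, query_length, database_length)
print(e_value)
print(log10_expect_value(833, query_length, database_length))
```

    With inputs like these, the result is hundreds of orders of magnitude below one, so small that a report rounding it to 0.0 is not surprising. In other words, a match this good is essentially never expected to happen by chance in a database of this size.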

    (at least if we don't read carefully, wink, wink, nudge, nudge)


    Strange....

    Why would my sequence match (at least) 50 different sequences in the nucleotide database?

    Can you solve the mystery?


    Copy the sequence at the beginning of this post and give it a try. Feel free to submit comments with your answer.

    Or wait until next week, for more of the story.


    References:

    1. Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.



    Copyright Geospiza, Inc.

    Comments

    #1

    Hmm, I'm reaching here ... could this sequence be the origin of replication for plasmids as well as some viruses? Curious.

    Posted by: Coleen | July 15, 2006 4:16 PM

    #2

    It is a beta-lactamase, an enzyme related to antibiotic resistance. BLAST it against a protein DB, and/or run it against PFAM

    Posted by: Diego | July 15, 2006 5:09 PM

    #3

    Coleen,
    Good guess. It is a gene that's found in many plasmids.


    Diego,
    You are right, but you're solving the problem the hard way. I'll show you an easier way to find the answer next week.

    Posted by: Sandra Porter | July 15, 2006 8:34 PM
