Digital Biology Friday: A microbiology blast puzzler

Here's a fun puzzler for you to figure out.

The BLAST graph is here:

[Image: BLAST graph (graph.png)]

The table with scores is here:

[Image: table of scores (smaller_table.png)]

And here is the puzzling part: Why is the total score so high?

If you want to repeat this for yourself, go here.

You can use this sequence as a query (it's the same one that I used).

>301.ab1
CTAGCTCTTGGGTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAG
GGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGTGGGGGA
CCTTCGGGCCTCACACCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGG
CTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGA
CACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTG
ATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGAGGAGG
AAGGCATTGTGGTTAATAACCGCAGTGATTGACGTTACTCGCAGAAGAAGCACCGGCTAAC
TCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAA
AGCGCACGCAGGCGGTCTGTCAAGTCGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCA
TTCGAAACTGGCAGGCTAGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAAT
GCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTC
AGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGAT
GTCTATTTGAAGGTTGTTCCCTTGAGGAGTGGCTTTCGGAGCTAACGCGTTAAATAGACCGCC
TGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTG
GAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTACTCTTGACATCCAC

Then change the database to "Reference genomic sequences".

I'll post the answer next Friday.


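The total score is the sum of the scores of all the HSPs (high-scoring segment pairs, that is, the individual alignments) found for a given subject sequence. In this case there are many HSPs per genome, because the query sequence appears many times in each target genome, so the total score is much higher than the score of any single alignment.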
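To make the arithmetic concrete, here is a minimal Python sketch of how a per-genome max score and total score could be derived from a list of HSPs. The genome names, the number of HSPs, and the score values are all made up for illustration, and `summarize` is a hypothetical helper rather than anything BLAST itself provides.

```python
from collections import defaultdict

# Hypothetical HSPs as (subject_id, score) pairs. The values are invented:
# "genome_A" matches the query in seven places, "genome_B" in only one.
hsps = [
    ("genome_A", 1500.0),
    ("genome_A", 1500.0),
    ("genome_A", 1500.0),
    ("genome_A", 1500.0),
    ("genome_A", 1500.0),
    ("genome_A", 1490.0),
    ("genome_A", 1490.0),
    ("genome_B", 1200.0),
]


def summarize(hsps):
    """Return {subject_id: (max_score, total_score)} for a list of HSPs."""
    max_score = {}
    total_score = defaultdict(float)
    for subject, score in hsps:
        # Total score: add up every HSP that hits this subject.
        total_score[subject] += score
        # Max score: keep only the single best HSP for this subject.
        max_score[subject] = max(score, max_score.get(subject, score))
    return {s: (max_score[s], total_score[s]) for s in total_score}


summary = summarize(hsps)
for subject, (best, total) in summary.items():
    print(f"{subject}: max score = {best}, total score = {total}")
```

With these invented numbers, genome_A has a max score of 1500 but a total score of 10480, while genome_B, with a single HSP, has the same value for both. That gap between max score and total score is the pattern to look for in the table.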