Digital Biology Friday: What was that gene anyway?

Welcome back!

If you've just joined us, we're in the middle of a quest to find the identity of an unknown nucleotide sequence. To summarize our results so far, we used this sequence to do a blastn search of GenBank, using all the default settings at the NCBI. You can see the beginning of the project here.

And we had some rather curious results.

It appeared that our sequence matched sequences from very diverse organisms, like Dengue virus, E. coli, and Simian Immunodeficiency virus. Very strange!

There was another curious word, too, that appeared in the descriptions for each of the results.

That word was VECTOR. "Vector" is a word that I imagine Sherlock Holmes would have used if he wanted to interrogate a scientist or mathematician and find out what they did without having them realize that he was trying to do so.

To a mathematician or a physicist, a vector is a quantity with both a magnitude and a direction, usually drawn as an arrow. To a public health official, a vector is a rat, mouse, louse, or insect: anything capable of carrying a disease.

And, to a molecular biologist, a vector can be a plasmid, phage, or eucaryotic virus that is used to move genes around from place to place. This information can help us make some good guesses about the function of our unknown bit of DNA, because vectors have been engineered to have some common features. Some of these are special DNA sequences (origins of replication) that allow plasmids to be copied. Others are genes that encode enzymes that make bacteria resistant to different antibiotics. If a bacterial cell contains a plasmid with one of these antibiotic resistance genes, it produces a protein that allows it to live in the presence of that antibiotic. These features are helpful for biologists because we can add the drug and keep only the bacteria that carry the plasmid, while all the rest are killed off.

Okay, where were we?

Back to our results:

Here is our list of matching sequences from the blastn search. We had some good guesses last week about how to find the answer, and one was right, but it involved far too much work.

I think it's far easier to look at the data.

Here's how.


We click the link to the alignment score.


This shows us where our sequences match each other. Pay attention to the positions of the subject sequence that match our query! We need to remember this. Our sequence starts matching at 44,246 and ends matching at 44,665.
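As a quick sanity check on those coordinates, here is a minimal Python sketch that works out how many bases of the subject sequence the match covers. It assumes the positions are 1-based and inclusive, which is how BLAST alignments and GenBank records are numbered. The function name `match_length` is my own, for illustration only.

```python
def match_length(start: int, end: int) -> int:
    """Count the bases in an inclusive, 1-based coordinate range."""
    if end < start:
        # Minus-strand hits are reported with the larger coordinate first.
        start, end = end, start
    return end - start + 1


# Subject coordinates taken from the alignment described above.
subject_start, subject_end = 44246, 44665
print(match_length(subject_start, subject_end))  # prints 420
```

So the match spans 420 bases of the subject sequence. Those two numbers are what we look for when we scroll through the record.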

Then we click the link to the matching sequence, and scroll down the page.



Eventually, we reach numbers. These numbers represent positions in the DNA sequence.

Here's the region where our sequence matches:


And our answer is the beta-lactamase gene. This gene codes for an enzyme that breaks the beta-lactam ring, thus disabling antibiotics like penicillin.
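If you'd rather not scroll, the lookup we just did by eye can be sketched in a few lines of Python. Treat this as an illustration only: the feature table is made up (the names and coordinates are hypothetical, not copied from the GenBank record), and `Feature` and `overlapping` are names I've invented for this example. The idea is simply to ask which annotated features overlap the range where our query matched.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    kind: str   # feature type, e.g. "CDS"
    name: str   # label for the feature
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# HYPOTHETICAL feature table: coordinates invented for illustration,
# not taken from the actual GenBank record.
FEATURES = [
    Feature("rep_origin", "ori", 43100, 43700),
    Feature("CDS", "bla (beta-lactamase)", 44100, 44960),
    Feature("CDS", "some other gene", 45200, 46000),
]


def overlapping(features: List[Feature], start: int, end: int) -> List[Feature]:
    """Return the features that share at least one base with start..end."""
    return [f for f in features if f.start <= end and f.end >= start]


# The subject positions where our query matched.
for feature in overlapping(FEATURES, 44246, 44665):
    print(feature.kind, feature.name, feature.start, feature.end)
```

With a real record, you would fill the table from the FEATURES section of the GenBank entry instead of typing in invented numbers.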


Copyright Geospiza, Inc.


Well it is easier, but still you do not know exactly what part of your DNA sequence is matching to the annotated protein.

To find that out, it is much better to do a BLAST search against a protein database. Then you will also have information about the conservation of your sequence, which can be useful.

And after that you can use PFAM to be sure that the protein has a "functional" conserved domain.

As you have it, it would be like the very first step, but then you have to carry on, and verify your initial findings using more specific tools.

Hi Diego,

Actually, you can look at the GenBank record and see how the DNA sequence corresponds to the encoded protein. I show it here.

I agree, PFAM is helpful if you're trying to understand the function of a truly unknown protein, or if your match isn't as good as it was in this case (100%). I also really like the Conserved Domain Database.

Hey, thanks for the really informative posts. I've been trying to get a handle on this stuff for a while, and seeing these tasks done in context just made it all click.