If you’ve just joined us, we’re in the middle of a quest to find the identity of an unknown nucleotide sequence. To summarize our results so far, we used this sequence to do a blastn search of GenBank, using all the default settings at the NCBI. You can see the beginning of the project here.
And we had some rather curious results.
It appeared that our sequence matched sequences from very diverse organisms, like Dengue virus, E. coli, and Simian Immunodeficiency virus. Very strange!
There was another curious word, too, that appeared in the descriptions for each of the results.
That word was VECTOR. “Vector” is a word that I imagine Sherlock Holmes would have used if he wanted to interrogate a scientist or mathematician and find out what they did without having them realize that he was trying to do so.
To a mathematician or a physicist, a vector is a quantity with both a magnitude and a direction, often drawn as an arrow. To a public health official, a vector is a rat, mouse, louse, or insect: anything capable of carrying a disease.
And, to a molecular biologist, a vector can be a plasmid, phage, or eukaryotic virus that is used to move genes around from place to place. This information can help us make some good guesses about the function of our unknown bit of DNA, because vectors have been engineered to have some common features. Some of these are special DNA sequences that allow plasmids to be copied. Others are genes that encode enzymes that make bacteria resistant to different antibiotics. If a bacterial cell contains a plasmid with one of these antibiotic resistance genes, it produces a protein that allows it to live in the presence of that antibiotic. These features are helpful for biologists because we can add the drug, keep the bacteria that are resistant to it, and kill off all the rest.
Okay, where were we?
Back to our results:
Here is our list of matching sequences from the blastn search. We had some good guesses last week about the answer, and one was right, but it involved far too much work.
I think it’s far easier to look at the data.
We click the link to the alignment score.
This shows us where our sequences match each other. Pay attention to the positions of the subject sequence that match our query! We need to remember this. Our sequence starts matching at position 44,246 and ends matching at position 44,665.
Then we click the link to the matching sequence, and scroll down the page.
Eventually, we reach numbers. These numbers represent positions in the DNA sequence.
Here’s the region where our sequence matches:
And our answer is the beta-lactamase gene. This gene codes for an enzyme that breaks the beta-lactam ring, thus disabling antibiotics like penicillin.
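The lookup we just did by eye, comparing the matching positions against the numbered features in the record, can also be sketched in a few lines of Python. This is only a rough illustration: the match positions (44,246 and 44,665) come from our alignment, but the `Feature` type, the `overlapping_features` helper, and every feature coordinate in the list are made up for the example and are not taken from the actual GenBank record.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive, as in a GenBank feature table
    end: int    # 1-based, inclusive


def overlapping_features(features, match_start, match_end):
    """Return the features whose range overlaps the matched region."""
    return [f for f in features
            if f.start <= match_end and f.end >= match_start]


# Hypothetical annotations: these names and coordinates are placeholders,
# NOT the real features of the matching record.
features = [
    Feature("hypothetical viral insert", 100, 40000),
    Feature("hypothetical origin of replication", 43000, 43600),
    Feature("beta-lactamase gene (placeholder coordinates)", 44000, 44860),
]

# Subject positions from the alignment described above.
match_start, match_end = 44246, 44665

# Both ends are included, so add 1 when computing the length.
match_length = match_end - match_start + 1

hits = overlapping_features(features, match_start, match_end)
for feature in hits:
    print(f"{feature.name}: {feature.start}..{feature.end}")
```

With real coordinates copied from the record’s feature table, the same overlap test would point to whichever annotated feature contains our matching region.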