Digital Biology Fridays
Let's play anomaly!
Most of this week, I've written about the fun I had playing around with NCBI's BLink database and finding evidence that at least one mosquito, Aedes aegypti, seems to have been infected at some point with a plant paramyxovirus, and that the paramyxovirus left one of its genes behind, stuck in the mosquito genome.
During this process, I realized that the method I used works with other viruses, too. I tried it with a few random viruses and sure enough, I found some interesting things.
You've got a week to give it a try. Let's see what you find! The method is…
Here's a fun puzzler for you to figure out.
The blast graph is here:
The table with scores is here (click the table to see a bigger image):
And here is the puzzling part: Why is the total score so high?
If you want to repeat this for yourself, go here.
You can use this sequence as a query (it's the same one that I used).
>301.ab1
CTAGCTCTTGGGTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAG
GGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGTGGGGGA
CCTTCGGGCCTCACACCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGG
CTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGA…
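If you'd rather handle the query in a script before pasting it into the search form, here's a minimal Python sketch for reading a FASTA record. The function name `read_fasta` is my own, made up for illustration, and the example uses only the first sequence line shown above, since the rest of the sequence is cut off here.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Assumption: only the first sequence line from the post is used here.
example = """>301.ab1
CTAGCTCTTGGGTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAG
"""

for name, seq in read_fasta(example):
    print(name, len(seq))
```

Joining the lines into one string matters because FASTA files wrap the sequence at an arbitrary width.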
If you look below the fold, you can see two molecules locked in a tight embrace. These molecules or their closely related cousins can be found in any cell because their ability to evolve is slowed by their need to interact with each other in the right way.
In an earlier post, I asked:
Who are they?
One partner is a small bit of 16S ribosomal RNA, about 56 nucleotides to be precise. The other partner is S15, one of the proteins in the ribosome.
If we could look inside the bacteria that made these, we would see lots of other proteins binding to these two partners within a molecular machine.…
Last week I posted an image with two molecules (below the fold), one protein and one nucleic acid, and asked you about the probability of finding similar molecules in different species.
You gave me some interesting answers.
DAG made me clarify my question by asking what I meant by "similarity." I was wondering whether I would be likely to find a statistically relevant match by doing a BLAST search, and I hadn't really thought about the cutoff values. I decided to guess and say that the protein would be about 30% similar and the nucleic acid about 60%.
Paul gave me some answers…
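To make a number like "30% similar" concrete, here's a minimal sketch that computes percent identity over two sequences that have already been aligned to the same length. It counts identical positions only, which is a simpler measure than the scores BLAST reports. The function name and the toy sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns where both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy sequences, not real data: 6 of 8 positions agree.
print(percent_identity("ACGTACGT", "ACGTTCGA"))
```

A column with a gap in only one sequence counts as a mismatch here; other conventions leave those columns out, so the percentage depends on which one you pick.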
This is a fun puzzle. The pink molecule is a protein and the other molecule is a nucleic acid.
If I gave you the amino acid sequence of this protein, or the nucleotide sequence of this nucleic acid, what is the probability of finding a similar sequence in a different species (picked at random)?
A. High
B. Medium
C. Low
D. It depends on the database that you're searching.
You can have more than one answer.
Now, here's the hard part. Explain why you think your answer is correct.
As many of you know, I'm a big fan of do-it-yourself biology. Digital biology, the field that I write about, is particularly well-suited to this kind of fun and exploration.
Last week, I wrote some instructions for making a phylogenetic tree from mitochondrial genomes. This week, we'll continue our analysis.
I wrote this activity, in part, because of this awful handout that my oldest daughter brought home last year. She presented me with an overly photocopied paper that showed several protein sequences from cytochrome C in several creatures. She said she was supposed to count the…
Last year I wrote about an experiment where I compared a human mitochondrial DNA sequence to primate sequences in GenBank. Since I wanted to know about the differences between humans, gorillas, and chimps, I used the Entrez query 'Great Apes' to limit my search to a set of sequences in the PopSet database that contained gorilla, bonobo, chimp, and human DNA.
A week ago, I tried to repeat this experiment and...
It didn't work.
All I saw were human mitochondrial sequences. I know the other sequences match, but I didn't see them since there are so many human sequences that match…
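One way to picture the problem: when one species dominates the hit list, keeping only the best hit per organism brings the others back into view. Here's a minimal sketch of that idea. It assumes you've already pulled (organism, accession, score) tuples out of your results, and it isn't necessarily how I solved the problem. The hits below are invented placeholders, not real search output.

```python
def best_hit_per_organism(hits):
    """Keep the highest-scoring (accession, score) for each organism."""
    best = {}
    for organism, accession, score in hits:
        if organism not in best or score > best[organism][1]:
            best[organism] = (accession, score)
    return best


# Placeholder hits for illustration only (made-up names and scores).
toy_hits = [
    ("human", "FAKE_H1", 900),
    ("human", "FAKE_H2", 880),
    ("human", "FAKE_H3", 870),
    ("chimp", "FAKE_C1", 700),
    ("gorilla", "FAKE_G1", 650),
]

for organism, (accession, score) in best_hit_per_organism(toy_hits).items():
    print(organism, accession, score)
```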
During the past few Fridays (or at least here and here), we've been looking at a paper that was published from China with some β-lactamase sequences that were supposedly from Streptococcus pneumoniae. The amazing thing about these particular sequences is that β-lactamase has never been seen in S. pneumoniae before, making this a rather significant (and possibly scary) discovery.
If it's correct.
tags: DNA sequence analysis, antibiotic resistance, microbiology, blastn
The way this sequence was identified as β-lactamase was through a blastn search at the NCBI. And in fact, it was correct to…
I began this series last week with a question about a DNA sequence that was published and reported to be one of the first beta-lactamases to be found in Streptococcus pneumoniae. Mike has a great post about one of the problems with this paper.
I think the data themselves are awfully suspicious.
So, last week I suggested that you, dear readers, go and find out why. I gave you a link to the abstract and a place to get started.
Perhaps that was too hard.
Sigh.
Okay, here's a little more help and another clue.
I highlighted the accession numbers. Post your guesses in the comments.
Last Friday, we had another in the series of weird DNA structures. (You can see the first here).
I asked the audience to identify the unusual feature in this molecule. Here's the first picture:
tags: DNA structure, DNA, molecular structure, biochemistry
Here's the answer:
Steve L. guessed it correctly. This is not just DNA, it's a DNA:RNA hybrid. I circled a 2' hydroxyl group here to make it easier to see the difference. (Remember - the "D" in DNA stands for "deoxy.") The oxygens are red and it's easiest to tell the difference between the strands if you count them.
For extra credit -…
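Counting oxygens by eye works, but the same check can be scripted. In PDB-style coordinate files, the 2' oxygen of a ribonucleotide is named O2' (older files use O2*), so a strand whose residues carry that atom is RNA. Here's a minimal sketch, assuming the atoms have already been parsed into (chain, atom name) pairs. The toy records are invented and are not taken from the structure in the picture.

```python
def classify_strands(atoms):
    """Label each chain 'RNA' if any of its atoms is a 2' oxygen, else 'DNA'."""
    has_o2prime = {}
    for chain, atom_name in atoms:
        is_o2 = atom_name in ("O2'", "O2*")
        has_o2prime[chain] = has_o2prime.get(chain, False) or is_o2
    return {chain: ("RNA" if flag else "DNA") for chain, flag in has_o2prime.items()}


# Invented records for illustration: chain A has a 2' oxygen, chain B does not.
toy_atoms = [
    ("A", "C1'"), ("A", "C2'"), ("A", "O2'"), ("A", "O3'"),
    ("B", "C1'"), ("B", "C2'"), ("B", "O3'"),
]

print(classify_strands(toy_atoms))
```

This simple test only looks at the sugar; it assumes a chain is entirely DNA or entirely RNA.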
I've had some requests for some more molecular puzzles since the last one that I posted (see A DNA puzzle). One person liked it so much he even blogged about it.
So, here's one for you to chew on over the weekend.
This puzzle is a variation of an activity in Exploring DNA Structure, a CD/lab book that I made (with funding from the NSF) and used for some educational research.
tags: DNA structure, DNA, molecular structure, biochemistry
Any ideas?
How does grass grow in the extremely hot soils of Yellowstone National Park? Could a protein from a virus help plants handle global warming? Okay, that second sentence is wild speculation, but we will try to find the answer to our mystery by aligning our protein sequence to a sequence from a related structure.
tags: plants, bioinformatics, sequence analysis, viruses, fungi, global warming
Read part I, part II, part III, part IV, and part V, to see how we got here.
This week, in our last installment, we will seek the answers in a related structure.
Last week, I found that my…
tags: plants, bioinformatics, sequence analysis, viruses, fungi
How does grass grow in the extremely hot soils of Yellowstone National Park? The quest continues.
Read part I, part II, part III, and part IV to see how we got here.
And read onward to see where we will go.
In our last episode, I discovered a new tab in the protein database (well, new to me anyway).
Related structures
If you select this tab, you get a list of protein sequences that are similar, by blastp, to the amino acid sequences in protein structures.
Naturally, I clicked the tab, and then the Links link, to see…
I found it in the MeSH database.
Really!
Looking for a quick answer? Don't ask a scientist
It doesn't take long to realize that scientists can spend countless hours debating the meaning of words. Our very own ScienceBlogs is a great example; just look at the many ways we can define (and debate) the meaning of a small, four-letter word like "gene." We also like to qualify our answers with a thousand conditions: "usually, it's like this, but...."
This habit can be very frustrating if all you want is a quick concise answer.
On your marks, define that term!
So, many people turn to Google and…
In last week's episode, your assignment was to think of an interesting plant trait and find a description about a gene, related to that trait, by searching PubMed.
Since coming up with an interesting trait might be a challenge for some people, let's think about how to approach this step.
Picking your trait.
If you're having a hard time thinking of a trait, it might be helpful to think about where plants grow, why we grow plants, and why it might be hard or easy for plants to grow.
Some of the environmental factors that affect plant growth are: climate, soil composition, nutrient…
Many of you might take this for granted, and I know it seems amazing today, but when I first started teaching, our access to scientific literature was pretty limited. I could go to the UW and use Grateful Med to search Medline, but we didn't have anything like it at my college, and web browsers, like Mosaic, had yet to be invented. So, when I first started giving workshops for teachers on biotechnology and the world of the web, many were quite surprised to find out about the PubMed database.
Since PubMed is (to me) one of the best resources to ever come along, I think we should explore it a…
and what is the volume of the sea?
This sounds a bit like the beginning of a poem but it's really the answer to the question we posed last week on a Digital Biology Friday.
We can see, in the sequence window, that two strands are both labeled 5' on the left side and 3' on the right. We call this direction "five prime to three prime."
But, when we look in the structure window, we see that the two strands are oriented in the opposite direction relative to each other. The 5' end of one strand is located across from the 3' end of the other strand.
(Note: I added the arrow and labels, this…
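The antiparallel arrangement is why the partner of a strand, when you also read it 5' to 3', is the reverse complement and not just the complement. Here's a minimal sketch; the function name is mine and the input is a toy sequence.

```python
# Translation table for Watson-Crick pairs, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the opposite strand of seq, written 5' to 3'."""
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

Characters outside A, C, G, and T pass through unchanged, so ambiguity codes like N would need their own entries in the table.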
Today, we're going to look for rainbows in double-stranded DNA and see what they can tell us about DNA structure.
First, we're going to get a structure for a double-stranded molecule of DNA and open it in Cn3D.
1K9L
If you want to do this at home and you haven't already downloaded a copy of Cn3D, you may want to read these instructions and get a copy. These directions also show how to download and open the structure. It's pretty simple once you've given it a try.
Hide a strand
Next, we're going to hide one of the strands. To do this, look in the menu bar for the Show/Hide menu and open…
Why do I love Cn3D? Let me count the ways.
What does Cn3D do? (Hint: say "Cn3D" out loud).
Seriously, Cn3D is a program that draws lovely pictures of molecular structures by using experimental data from techniques like X-ray crystallography and nuclear magnetic resonance spectroscopy. Surprisingly (to some), and in contrast to many bioinformatics programs, Cn3D is really easy and fun to use.
Have you ever used programs like MS Office? Using Cn3D is at least 10 times easier.
An added benefit is that you don't have to try and find old copies of Netscape or other bits of obsolete software…
If we compare sections 1, 2, and 3, we see that section 2 matches very well in a number of different samples, and that there are differences between the sequences in sections 1 and 3.
We also learn something about the people who did the experiment.
At first it appears somewhat odd that there are many matching sequences that are all shorter than the genome and all the same length.
What's up with that?
It turns out that information doesn't have anything to do with the fraction of the genome that matches our query. These short segments are PCR products. They're the same size because the PCR…