Digital Biology Friday: hot plants and viruses, part IV

By sporte on May 18, 2007.

tags: plants, bioinformatics, sequence analysis, viruses, fungi

Quick synopsis: A type of grass grows in Yellowstone National Park in hot (65° C), unfriendly soil. How the plant manages this feat is a mystery. What we do know is that the grass can only tolerate high temperatures if it's been infected by a fungus, and the fungus has to be infected by an RNA virus. In the paper describing this discovery, the researchers provided the GenBank accession numbers for the viral sequences. I decided to see if I could find out more about the proteins and what they do. Read part I, part II, and part III.

And now, on with our story.

Down the rabbit hole we go.

We begin with a BLAST

I started the quest by using the accession numbers from the paper to get the GenBank records and the sequences. The authors of the paper had already found that one piece of viral RNA (RNA 1) codes for a protein that's likely to be a replicase (1). I confirmed this finding using blastp and found that the predicted proteins do contain a conserved replicase domain.

(Q: Why do I call this a "predicted" protein?

A: Because this is a conventional way of referring to a protein whose existence has yet to be supported by physical data.)

A replicase, by the way, is a perfectly reasonable thing for a viral RNA to encode. All viruses have to have a way to get their genomes copied if they're going to be able to go off and infect new cells. They would use a replicase to make new copies of the RNA genome.

That leaves the question of the other RNA (this virus has two pieces of RNA that were sequenced). I looked up the predicted protein sequences for the second RNA and used blastp to compare the predicted proteins to the sequences in the non-redundant database. I couldn't find anything for the smaller (168 aa) protein, but the larger protein matched lots of hypothetical proteins from fungi. Some of those results are described here.

Accession number	RNA sequence	Protein accession number	Result
EF120984	RNA 1	ABM92658	possible replicase; contains a conserved domain for an RNA-dependent RNA polymerase
		ABM92659.1	possible replicase? some matches to the catalytic region of an integrase
EF120985	RNA 2	ABM92660.1	331 amino acids; matches lots of hypothetical fungal proteins with unknown functions
		ABM92661.1	168 amino acids; no match to anything

Lost in translation?

The next thing that I tried was blastx. The blastx program takes a DNA (or RNA) sequence, figures out all of the amino acid sequences that could be produced from all six ways of reading the sequence (we call this "translation"), and then compares all the possible sequences to a database of protein sequences.
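To make the "six ways of reading" concrete, here is a minimal Python sketch of the translation step only. It is not NCBI's code: the function names are made up for illustration, it assumes the standard genetic code, and it skips the database comparison that blastx does afterwards.

```python
from itertools import product

# Standard genetic code, written in the compact NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq):
    """Translate three frames on each strand (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")  # treat RNA like DNA
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


for name, protein in six_frame_translation("AUGGCCUAA").items():
    print(name, protein)
```

For this tiny made-up sequence, frame +1 reads "MA*" (the asterisk marks a stop codon) and frame -1 reads "LGH". Each of the six strings would then be compared to the protein database.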

I tried blastx for two reasons.

First, many annotation errors are made because of DNA sequencing mistakes. If one or two bases are missing, the translation can be messed up. It would be like this sentence: "The fat cat sat on pat." If this sentence kept the same reading frame but had a letter missing (here, the "e"), it would read: "Thf atc ats ato npa t." Imagine now that I went to the library and tried to find a book with the phrase "Thf atc ats." If I had the right sentence, I would probably find Dr. Seuss. If I used the messed-up sentence, I'd be out of luck.
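The sentence analogy is easy to try out. This is a small Python sketch with made-up helper names: it strips the sentence down to its letters, reads them three at a time, and then does the same thing after one letter is deleted. It only illustrates the frame shift; it doesn't touch real sequence data.

```python
def letters_only(text):
    """Keep just the letters, the way a sequence has no spaces."""
    return "".join(ch for ch in text if ch.isalpha())


def read_in_threes(letters):
    """Group letters into triplets, keeping any leftover at the end."""
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


def drop_letter(letters, index):
    """Simulate a sequencing error that loses one letter."""
    return letters[:index] + letters[index + 1:]


sentence = letters_only("The fat cat sat on pat")
print(read_in_threes(sentence))                  # The fat cat sat onp at
print(read_in_threes(drop_letter(sentence, 2)))  # Thf atc ats ato npa t
```

Every triplet after the missing letter changes, which is why a single dropped base can wreck the rest of a predicted protein.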

The public databases have lots of these kinds of mistakes in translation. In a perfect world, I would be able to get the trace data from the DNA sequences, look at it myself with FinchTV, evaluate the quality, and possibly reassemble the sequences. In the real world, much of this data is not publicly available. NCBI, for example, stores trace data for only a small number of viruses, most of them influenza. But enough whining, let's move on.

A challenge with blastx is that different organisms use different versions of the genetic code, and it's not always possible to know which version is used by the organism that you're studying. NCBI offers a choice of 13 genetic codes, but I didn't have any luck trying to find which code would be used by my RNA virus or even the fungal host. After chewing on this for a while, I picked "yeast nuclear," reasoning that the virus infects a fungus and yeast is a fungus.
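Here is a rough Python sketch of why the choice matters. I'm assuming that the "yeast nuclear" option corresponds to NCBI translation table 12 (Alternative Yeast Nuclear), where the codon CUG is read as serine instead of leucine. The sequence is made up, and the variable and function names are only for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1) in the compact T, C, A, G order.
BASES = "TCAG"
STANDARD_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)
standard_code = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AMINO_ACIDS)
}

# Assumption: "yeast nuclear" means table 12, which differs from the
# standard code at one codon: CTG (CUG in RNA) codes for Ser, not Leu.
alt_yeast_code = dict(standard_code, CTG="S")


def translate(seq, table):
    """Translate complete codons in frame 1; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


rna = "AUGCUGGAAUAA"  # made-up example sequence
print(translate(rna, standard_code))   # MLE*
print(translate(rna, alt_yeast_code))  # MSE*
```

Under these assumptions, only the CUG codons change, so a wrong guess here would swap some leucines and serines but wouldn't scramble the whole protein the way a frame shift does.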

Here are the results:

[Figure: blastx results]

The top two matches (red bars) are to the predicted sequences that are deposited in GenBank. They serve as a positive control, since they should match themselves.

Scanning down the page, from top to bottom, I see that the next best matching sequences (naturally) are from hypothetical or putative proteins. They have good E values, too, and it is reassuring that they come from fungi (or possibly fungal viruses; I don't have enough data to know which).

Looking farther down, a couple of long sequences match both proteins. Both are from rice and one is a transposon sequence. They look like a good match and seem to fit my idea about a possible frame shift. But nothing is known about these proteins, so I decided on another path.

Taking a random walk?

I stumbled on the next path by accident. I was planning to look at some of the "hypothetical" and "putative" fungal sequences and see if they matched anything interesting, when I found something new.

I had called up the GenBank record for the 331 amino acid protein from RNA 2 and clicked "BLink." BLink is short for "BLAST link." BLink takes me to a database of pre-computed blastp results for my Curvularia protein.

I like to use BLink since it has lots of filters for viewing which sequences belong to which kingdom, which part of the protein aligns, which sequences have structures, and so on. I decided that I would get a list of these sequences and use them as queries for more searching. So, I clicked the GI list button to get a set of sequences and instead got a surprise!

I never saw that Related Structures tab before!

What could it mean?

Join us next Friday, when we go through the looking glass and see what we can find there.

Reference:

1. Márquez, L., et al. 2007. A Virus in a Fungus in a Plant: Three-Way Symbiosis Required for Thermal Tolerance. Science 315: 513-515.
