Checking out the new Ebola virus and playing some tricks with BLAST

By sporte on November 24, 2008.

Ebola virus has impressed me as creepy ever since I read Richard Preston's "The Hot Zone: A Terrifying True Story" some years back. (I guess he has a new book, too, "Panic in Level 4: Cannibals, Killer Viruses, and Other Journeys to the Edge of Science," but I haven't been in an airport for the past couple of weeks, so I haven't read it yet.)


Infectious agents that cause diseases with gruesome symptoms really excite those of us with an interest in microbiology. Tara has written about this paper, too, and summarized the details.

I thought I'd show you how to have fun re-analyzing the data, demonstrate a new and unexpected feature that I happened to find in NCBI BLAST, and see if we can reproduce the phylogenetic tree from the paper by using the tree algorithms at the NCBI. Making phylogenetic trees is often kind of painful in the classroom, for various reasons, and I wanted to see if we could find a more user-friendly method.

Bone picking

First, I want to point out that the authors of this paper (1) were a bit negligent concerning their materials and methods. This is irksome, although not uncommon where the bioinformatics methods are concerned. (You know, we did computer stuff, it's all magic anyway.)

If you look at the tree figure in the paper, you'll notice that 14 different genome sequences are listed. I would expect to find the accession numbers for all 14 sequences in the paper.

Did I find the accession numbers in the paper?

No. I found six out of the 14, less than half. I only found two of the Marburg sequences in GenBank, and I'm not positive that those were the same ones used in the paper. I think the reviewers were sloppy in this regard. How can an experiment be repeated if the materials aren't described, or even available? I would have thought the reviewers would at least look to see if all the accession numbers were in the paper (they're not).

The paper gives the impression that complete genomes were used to create the tree in their figure. If that's true, it's hard to see where the data lives.

Still, I found the six most important genomes and a couple from Marburg virus (2), so I had some material to work with.

Learning from our mistakes

The reason I wanted these genomes was that I wanted to see if I could reproduce the published tree by using the tree analysis algorithms at the NCBI.

My first attempt failed miserably. I entered my query in the usual place and then entered the accession numbers for the other viruses as an Entrez query.

The problem was that none of the BLAST databases I queried contained the full set of viral sequences that I wanted to see.

Actually, as it turned out, one database did contain some of the sequences, but the others were in a different database. Since NCBI BLAST only let me query one database at a time, I couldn't search the data set that I wanted.

(At least that's what I thought.)

This difficulty with comparing sequences from different databases in one BLAST experiment has long been a source of frustration for me. If I'm doing this for work, I just make my own database and use our Finch Software for running BLAST, or I run BLAST on my Mac.

When I'm teaching a class, though, I don't want to make students learn UNIX and install BLAST on their laptops, and we haven't put BLAST on the student version of our software.
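
For anyone who does want to go the make-your-own-database route, the first step is getting the sequences as FASTA. Here is a minimal sketch, assuming NCBI's E-utilities efetch service with the nuccore database; the function name fasta_download_url is made up for this example, and the two accession numbers are taken from the list used later in this post. It only builds the URL and does not contact the NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_download_url(accessions):
    """Build an efetch URL that asks for nucleotide records as FASTA text.

    fasta_download_url is a hypothetical helper name, not part of any
    NCBI library.
    """
    params = {
        "db": "nuccore",              # nucleotide database
        "id": ",".join(accessions),   # several IDs, comma separated
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


url = fasta_download_url(["FJ217161", "NC_002549"])
print(url)
```

Pasting the printed URL into a browser should return the records in FASTA format, which can then be saved to a file and turned into a local BLAST database.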

New tricks with BLAST

Luckily, I noticed that BLAST has something new.

I had ignored this new checkbox because I didn't want to compare two sequences.

However, that was a mistake. That checkbox is useful!

When I clicked it, another window opened up.

Now, I could enter the accession numbers for my new sequences!

Notice, too, the format. I tried some different ways for entering the numbers.

This method: Accession1, Accession2, Accession3..... did not work.

This method: Accession1 Accession2 Accession3..... did not work.

I could only get BLAST to work if I entered the sequences like this:

Accession1
Accession2
Accession3

I don't know why that is, but it really did work. I used these sequences:

NC_002549
AY354458
NC_006432
FJ217162
NC_004161
NC_001608
DQ447653

with the newly discovered virus, FJ217161, as the query.
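
Since the one-per-line format was the only one that worked for me, a small helper can save some retyping when the accession numbers come out of a paper as a comma- or space-separated list. This is just a sketch: the function name one_per_line is made up for this example, and it does nothing more than split the text and rejoin it with line breaks.

```python
import re


def one_per_line(raw):
    """Split accession numbers on commas and/or whitespace and
    return them as text with one accession per line."""
    tokens = [t for t in re.split(r"[,\s]+", raw.strip()) if t]
    return "\n".join(tokens)


# The same accessions listed above, written in a deliberately messy way.
messy = "NC_002549, AY354458 NC_006432,FJ217162\nNC_004161  NC_001608, DQ447653"
print(one_per_line(messy))
```

The printed text can be pasted straight into the subject sequence box.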

Here are my results:

When I click the Distance tree of results link in BLAST and use the default tree settings, I get a tree with the same shape and arrangement as the one in the paper (1).
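
If the idea of a "distance tree" is new to you, the starting point is a number that says how different each pair of sequences is. The sketch below shows one common way to get such a number from two aligned sequences: the fraction of differing positions (the p-distance), followed by the Jukes-Cantor correction. I'm not claiming this is exactly what the NCBI tool computes for these results; the function names and the two toy sequences are invented for illustration only.

```python
import math


def p_distance(a, b):
    """Fraction of aligned, ungapped positions where two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    diffs = sum(1 for x, y in pairs if x != y)
    return diffs / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy aligned sequences (hypothetical, not real viral data).
seq1 = "ACGTACGTAC"
seq2 = "ACGTTCGTAC"

p = p_distance(seq1, seq2)
print(p, jukes_cantor(p))
```

A tree-building method such as Neighbor Joining or Minimum Evolution then takes the full table of pairwise distances and finds a tree whose branch lengths fit it.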

Formatting the tree

Nothing is perfect, of course. It's impossible to make the font size in the NCBI tree large enough to read.

Luckily, you can download the tree in the Newick format and make a pretty picture with the combination of NJplot and a graphics program like Adobe Illustrator.

Here it is after some formatting. I highlighted the new virus.

[Image: blast_tree_newick.gif]
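
The Newick file is plain text, so you can also tidy up the labels before opening it in a tree viewer. Below is a minimal sketch that lists the leaf names in a Newick string and swaps in friendlier labels. The tree string, its branch lengths, and the function names are all made up for this example (only the accession numbers come from this post), and the regular expression assumes simple unquoted labels.

```python
import re

# A leaf label is whatever follows "(" or "," and is not itself a
# delimiter. Labels after ")" belong to internal nodes and are skipped.
LEAF = re.compile(r"[(,]\s*([^\s():,;][^():,;]*)")


def leaf_names(newick):
    """Return the leaf labels of a Newick string, in order of appearance."""
    return [m.group(1).strip() for m in LEAF.finditer(newick)]


def relabel(newick, names):
    """Replace leaf labels using the dict `names`; others are left alone."""
    def swap(m):
        label = m.group(1).strip()
        return m.group(0).replace(m.group(1), names.get(label, label), 1)
    return LEAF.sub(swap, newick)


# Hypothetical tree with invented branch lengths.
tree = "((FJ217161:0.10,NC_002549:0.12):0.05,NC_001608:0.40);"

print(leaf_names(tree))
print(relabel(tree, {"FJ217161": "FJ217161_new_virus"}))
```

Labels containing spaces or punctuation need to be quoted in Newick, which this sketch does not handle, so underscores are the safer choice.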

Conclusions:

1. This new BLAST feature, where you can BLAST against your own set of sequences, is really helpful.

2. You can make the correct trees with the Minimum Evolution algorithm at the NCBI, but you will need to format your trees (i.e. make them pretty) somewhere else if you need pretty pictures.

References:

1. Jonathan S. Towner, Tara K. Sealy, Marina L. Khristova, César G. Albariño, Sean Conlan, Serena A. Reeder, Phenix-Lan Quan, W. Ian Lipkin, Robert Downing, Jordan W. Tappero, Samuel Okware, Julius Lutwama, Barnabas Bakamutumaho, John Kayiwa, James A. Comer, Pierre E. Rollin, Thomas G. Ksiazek, Stuart T. Nichol (2008). Newly Discovered Ebola Virus Associated with Hemorrhagic Fever Outbreak in Uganda. PLoS Pathogens, 4 (11). DOI: 10.1371/journal.ppat.1000212
2. J. S. Towner (2006). Marburgvirus Genomics and Association with a Large Hemorrhagic Fever Outbreak in Angola. Journal of Virology, 80 (13), 6497-6516. DOI: 10.1128/JVI.00069-06

I feel your pain.
Lack of accession numbers in the literature is a constant problem for me as well. What are journal editors thinking, not checking for these before the paper's published? I work with the spotty zebrafish genome, where at NCBI exons and so forth are rarely mapped, so it's especially frustrating.

I hate hate hate the Hot Zone. It's so over the top.

Ed Regis's "Virus: Ground Zero" is a much better Ebola book.
http://www.amazon.com/Virus-Ground-Zero-Stalking-Viruses/dp/0671553615

Needed to vent that...
Awesome post. NCBI Blast tools are great.

How is this method different from other phylogenetic tree programs, e.g. Phylip?

Anon -

Phylip is a collection of at least 30 different programs. Some of these programs use different methods to make trees.

The NCBI offers many of the same methods for making these trees that are offered in Phylip along with some additional algorithms. They have more information at their site and descriptions of the programs if you're interested.

It's also nice that the work happens on the NCBI server and not on your computer. This takes care of some of the problems that I've run into when I've run Phylip on my computer.

can we transfer virus using radiant energy or light focusing ray?

No.
