Discovering Biology in a Digital World

My thoughts on biology, teaching, life, and exploring the living world via the digital one. Only my opinions are represented by these postings, they do not represent the viewpoints of any funding agency or Geospiza, Inc.

Checking out the new Ebola virus and playing some tricks with BLAST

Category: Bioinformatics, Evolution, classroom activities, phylogeny, sequence analysis, viruses
Posted on: November 24, 2008 8:30 AM, by Sandra Porter

Ebola virus has impressed me as creepy ever since I read Richard Preston's "The Hot Zone: A Terrifying True Story" some years back. (I guess he has a new book, too, "Panic in Level 4: Cannibals, Killer Viruses, and Other Journeys to the Edge of Science," but I haven't been in an airport for the past couple of weeks, so I haven't read it yet.)



Infectious agents that cause diseases with gruesome symptoms really excite those of us with an interest in microbiology. Tara has written about the paper describing this newly discovered Ebola virus (1), too, and summarized the details.

I thought I'd show you how to have fun re-analyzing the data, demonstrate a new and unexpected feature that I happened to find in NCBI BLAST, and see if we can reproduce the phylogenetic tree from the paper by using the tree algorithms at the NCBI. Making phylogenetic trees is often kind of painful in the classroom, for various reasons, and I wanted to see if we could find a more user-friendly method.

Bone picking

First, I want to point out that the authors of this paper (1) were a bit negligent concerning their materials and methods. This is irksome, although not uncommon where the bioinformatics methods are concerned. (You know, we did computer stuff, it's all magic anyway.)

If you notice here, in this figure from the paper, there are 14 different genome sequences listed in the tree. I would expect to find the accession numbers for all 14 sequences in the paper.

[Image: ebola_tree.gif]

Did I find the accession numbers in the paper?

No. I found six of the 14, fewer than half. I only found two of the Marburg sequences in GenBank, and I'm not positive that those were the same ones used in the paper. I think the reviewers were sloppy in this regard. How can an experiment be repeated if the materials aren't described, or even available? I would have thought the reviewers would at least check whether the accession numbers were in the paper (they're not).

The paper gives the impression that complete genomes were used to create the tree in their figure. If that's true, it's hard to see where the data lives.

Still, I found the six most important genomes and a couple from Marburg virus (2), so I had some material to work with.


Learning from our mistakes

The reason I wanted these genomes was that I wanted to see if I could reproduce the published tree by using the tree analysis algorithms at the NCBI.

My first attempt failed miserably. I entered my query in the usual place and then entered the accession numbers for the other viruses as an Entrez query.

The problem was that none of the BLAST databases I queried contained the full set of viral sequences that I wanted to see.

Actually, as it turned out, one database did contain some of the sequences, but the others were in a different database. Since NCBI BLAST only allowed me to query one database at a time, I couldn't search the data set that I wanted to see.

(At least that's what I thought.)

This difficulty with comparing sequences from different databases in one BLAST experiment has long been a source of frustration for me. If I'm doing this for work, I just make my own database and use our Finch Software for running BLAST or I run BLAST on my Mac.

When I'm teaching a class, though, I don't want to make students learn UNIX and install BLAST on their laptops, and we haven't put BLAST on the student version of our software.
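For readers who do want the do-it-yourself route mentioned above, here is a rough sketch of what "make my own database and run BLAST" can look like. It is not the commands I used: it assumes the current BLAST+ command-line tools (`makeblastdb` and `blastn`) are installed, and the file names and the helper function are made up for illustration. The code only builds the command lines; it doesn't run anything.

```python
def local_blast_commands(fasta, db_name, query, out):
    """Build (but do not run) the two BLAST+ commands for a local search.

    All four arguments are hypothetical file/database names.
    """
    # Step 1: turn a FASTA file of nucleotide sequences into a BLAST database.
    make_db = ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]
    # Step 2: search that database with the query; -outfmt 6 is tabular output.
    search = ["blastn", "-query", query, "-db", db_name,
              "-outfmt", "6", "-out", out]
    return make_db, search


make_db, search = local_blast_commands(
    "filoviruses.fasta", "filoviruses", "new_virus.fasta", "hits.tsv"
)
print(" ".join(make_db))
print(" ".join(search))
```

To actually run them, each list could be passed to `subprocess.run(..., check=True)`, which is exactly the kind of setup I'd rather not ask a classroom of students to do.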


New tricks with BLAST
Luckily, I noticed that BLAST has something new.

[Image: new-checkbox.gif]

I had ignored this new checkbox because I didn't want to compare two sequences.

However, that was a mistake. That checkbox is useful!

When I clicked it, another window opened up.

[Image: new_window.gif]

Now, I could enter the accession numbers for my new sequences!

Notice the format, too. I tried a few different ways of entering the accession numbers.

Separating them with commas (Accession1, Accession2, Accession3, ...) did not work.

Separating them with spaces (Accession1 Accession2 Accession3 ...) did not work.

I could only get BLAST to work if I entered the accession numbers one per line, like this:

Accession1
Accession2
Accession3

I don't know why that is, but it really did work. I used these sequences:

NC_002549
AY354458
NC_006432
FJ217162
NC_004161
NC_001608
DQ447653

with the newly discovered virus, FJ217161, as the query.
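If your accession numbers start out in a comma- or space-separated list (copied from a paper, say), a few lines of Python will put them in the one-per-line form that worked for me. This is only a convenience sketch based on what I observed above; the function name is made up, and I'm not claiming anything about how the BLAST form parses its input.

```python
import re


def one_per_line(raw):
    """Split on commas and/or whitespace, drop repeats, join with newlines."""
    seen = []
    for token in re.split(r"[,\s]+", raw.strip()):
        if token and token not in seen:
            seen.append(token)
    return "\n".join(seen)


# The seven subject sequences from this post, deliberately typed messily.
subjects = one_per_line(
    "NC_002549, AY354458 NC_006432,FJ217162\nNC_004161 NC_001608, DQ447653"
)
print(subjects)
```

The printed text can be pasted straight into the subject sequence box.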

Here are my results:

[Image: ebola_blast.gif]

When I clicked the Distance tree of results link in BLAST and used the default tree settings, I got a tree with the same shape and arrangement as the one in the paper (1).

[Image: tree_results.gif]
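A distance tree is built from pairwise distances between sequences. I haven't described how the NCBI computes those distances, so the sketch below is not the NCBI's method; it's a classroom-sized illustration of two standard measures, the proportion of differing sites (p-distance) and the Jukes-Cantor correction for multiple substitutions at the same site. The function names and the toy sequences are made up.

```python
import math


def p_distance(a, b):
    """Fraction of aligned, ungapped positions where two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned sequences, not real viral data.
p = p_distance("ACGTACGT", "ACGTACGA")
print(p, jukes_cantor(p))
```

The corrected distance comes out a little larger than the raw p-distance, and the gap grows as the sequences diverge. A tree-building method then works from a table of such distances for every pair of sequences.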


Formatting the tree

Nothing is perfect, of course. It's impossible to make the font size from the NCBI tree large enough to read.

Luckily, you can download the tree in the Newick format and make a pretty picture with the combination of NJplot and a graphics program like Adobe Illustrator.

Here it is after some formatting. I highlighted the new virus.

[Image: blast_tree_newick.gif]
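Newick, the format mentioned above, is plain text: nested parentheses with leaf names and branch lengths. That means you can tidy the labels before the file ever reaches a drawing program. Here's a minimal sketch that lists the leaves and swaps in friendlier names. The tree, its branch lengths, and the names are all invented for illustration (this is not the tree from BLAST), and the regular expression assumes simple labels with no quotes or spaces.

```python
import re

# A leaf label comes right after "(" or "," and runs up to ":", ",", ")" or ";".
# Labels that follow ")" belong to internal nodes, so they are left alone.
LEAF = re.compile(r"(?<=[(,])([^(),:;]+)")


def leaf_labels(newick):
    """Return the leaf names in the order they appear in the string."""
    return LEAF.findall(newick)


def relabel(newick, names):
    """Replace leaf names found in `names`; leave the others unchanged."""
    return LEAF.sub(lambda m: names.get(m.group(1), m.group(1)), newick)


# Made-up toy tree with made-up branch lengths.
toy = "((seqA:0.1,seqB:0.2):0.05,(seqC:0.3,seqD:0.4):0.07);"
print(leaf_labels(toy))
print(relabel(toy, {"seqA": "new_virus"}))
```

Renaming leaves this way doesn't touch the tree shape or the branch lengths, so the picture you draw afterwards is the same tree with readable labels.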


Conclusions:

1. This new BLAST feature, where you can BLAST against your own set of sequences, is really helpful.

2. You can make the correct trees with the Minimum Evolution algorithm at the NCBI, but you will need to format your trees (i.e. make them pretty) somewhere else if you need pretty pictures.


References:

  1. Jonathan S. Towner, Tara K. Sealy, Marina L. Khristova, César G. Albariño, Sean Conlan, Serena A. Reeder, Phenix-Lan Quan, W. Ian Lipkin, Robert Downing, Jordan W. Tappero, Samuel Okware, Julius Lutwama, Barnabas Bakamutumaho, John Kayiwa, James A. Comer, Pierre E. Rollin, Thomas G. Ksiazek, Stuart T. Nichol (2008). Newly Discovered Ebola Virus Associated with Hemorrhagic Fever Outbreak in Uganda. PLoS Pathogens, 4 (11). DOI: 10.1371/journal.ppat.1000212
  2. J. S. Towner et al. (2006). Marburgvirus Genomics and Association with a Large Hemorrhagic Fever Outbreak in Angola. Journal of Virology, 80 (13), 6497-6516. DOI: 10.1128/JVI.00069-06


Comments

1

I feel your pain.
Lack of accession numbers in the literature is a constant problem for me as well. What are journal editors thinking, not checking for these before the paper's published? I work with the spotty zebrafish genome, where at NCBI exons and so forth are rarely mapped, so it's especially frustrating.

Posted by: gillt | November 24, 2008 4:12 PM

2

I hate hate hate the Hot Zone. It's so over the top.

Ed Regis's "Virus: Ground Zero" is a much better Ebola book.
http://www.amazon.com/Virus-Ground-Zero-Stalking-Viruses/dp/0671553615

Needed to vent that...
Awesome post. NCBI Blast tools are great.

Posted by: zayzayem | December 6, 2008 8:35 AM

3

How is this method different from other phylogenetic tree programs, e.g. Phylip?

Posted by: anon | January 7, 2009 11:02 AM

4

Anon -

Phylip is a collection of at least 30 different programs. Some of these programs use different methods to make trees.

The NCBI offers many of the same methods for making these trees that are offered in Phylip along with some additional algorithms. They have more information at their site and descriptions of the programs if you're interested.

It's also nice that the work happens on the NCBI server and not on your computer. This takes care of some of the problems that I've run into when I've run Phylip on my computer.

Posted by: Sandra Porter | January 7, 2009 1:40 PM

© 2006-2009 Seed Media Group LLC. ScienceBlogs is a registered trademark of Seed Media Group. All rights reserved.