Seed Media Group

Discovering Biology in a Digital World

My thoughts on biology, teaching, life, and exploring the living world via the digital one. Only my opinions are represented by these postings, they do not represent the viewpoints of any funding agency or Geospiza, Inc.

Profile

Sandra Porter I am a microbiologist and molecular biologist turned tenured biotech faculty turned bioinformatics scientist turned entrepreneur. My passion is developing instructional materials for 21st century biology (Digital World Biology).


More flu follies: comparing sequences and making trees, activity 4

Category: Bioinformatics, Databases, Genetics & Molecular Biology, Influenza resources, Science education, classroom activities, phylogeny, sequence analysis
Posted on: April 29, 2009 11:21 AM, by Sandra Porter

What tells us that this new form of H1N1 is swine flu and not regular old human flu or avian flu?

If we had a lab, we might use antibodies, but when you're a digital biologist, you use a computer.

Activity 4. Picking influenza sequences and comparing them with phylogenetic trees

We can get the genome sequences, piece by piece, as I described earlier, but the NCBI has other tools that are useful, too.

The Influenza Virus Resource will let us pick sequences, align them, and make trees so we can quickly compare the sequences to each other.

This is how I got the sequences that I wrote about yesterday. I think the more people we have looking at sequences, the better off we are.

I'll show you how this works by getting hemagglutinin (HA) protein sequences from the recent cases of H1N1 swine flu and comparing them to the HA protein from other cases of H1N1 swine flu that happened last year.

1. Go to the NCBI Influenza Virus Resource (this will open a new window).

2. Start out by getting the sequences from the recent swine flu cases in California and Texas.

To do this, we will pick Influenza A as the virus species, human as the host, North America as the region, and HA as the segment. Protein sequences are selected by default and those are just fine.

Then, we set the date range from 2009, 03, 01, to 2009, 04, 29.

Last, we click the Add to Query Builder button to get the sequences.

I forgot to put this in the image, but I also used a filter to select for H1.  I typed "H1" in the really long text box.  Also, note, I was looking at the protein sequences.  (We should look at nucleotides, too, but that's a later experiment.)
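If you'd rather think about the form as a filter, the choices in step 2 can be sketched in a few lines of Python. This is only an illustration of the logic, not how the NCBI site works internally: the `FluRecord` class, the `matches` function, and the `EX...` records are all made up for the example, and I'm assuming the "H1" filter means "the hemagglutinin type is exactly H1."

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FluRecord:
    """One sequence record; the fields mirror the form's menus (hypothetical layout)."""
    accession: str
    species: str    # e.g. "Influenza A"
    host: str       # e.g. "Human" or "Swine"
    region: str     # e.g. "North America"
    segment: str    # e.g. "HA"
    subtype: str    # e.g. "H1N1"
    collected: date


def matches(rec, *, species, host, region, segment, ha_type, start, end):
    """Return True if a record satisfies every criterion chosen in step 2."""
    return (
        rec.species == species
        and rec.host == host
        and rec.region == region
        and rec.segment == segment
        # Compare the part before "N" so that "H1" does not also match "H10N7".
        and rec.subtype.split("N")[0] == ha_type
        and start <= rec.collected <= end
    )


# Made-up records purely for illustration; these are not real database entries.
records = [
    FluRecord("EX0001", "Influenza A", "Human", "North America", "HA", "H1N1", date(2009, 4, 1)),
    FluRecord("EX0002", "Influenza A", "Swine", "North America", "HA", "H1N1", date(2009, 3, 15)),
    FluRecord("EX0003", "Influenza A", "Human", "North America", "HA", "H3N2", date(2009, 4, 10)),
    FluRecord("EX0004", "Influenza A", "Human", "North America", "HA", "H1N1", date(2008, 12, 1)),
]

# The same settings as the form: Influenza A, human, North America, HA, H1,
# and a date range from 2009-03-01 to 2009-04-29.
query = dict(
    species="Influenza A", host="Human", region="North America", segment="HA",
    ha_type="H1", start=date(2009, 3, 1), end=date(2009, 4, 29),
)

hits = [r.accession for r in records if matches(r, **query)]
print(hits)
```

Changing `host` to "Swine" or widening the date range gives the other two queries described in step 4.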


flu_query1.png

3. This query finds 7 sequences. If we click the Get Sequences button, we can see that these are the California and Texas isolates.

h1n1seqs.png

Now, we have to decide which groups we'd like to compare. I decided to compare these to other H1N1 flu sequences and to some sequences from pigs.

4. To get other flu sequences for comparison, I used the same queries (1-2) with some changes. 

       a.  For one set of sequences, I changed the host to "Swine."

       b. For the other set of sequences, I changed the date range so that I could get older sequences.

       c.  Each time I changed the settings, I clicked the Add to Query Builder button.


Now, the Query Builder contains the H1 sequences from the seven US cases, 272 sequences from people who've been infected with H1N1 over the past year in North America, and 5 H1 sequences from pigs.
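The idea of stacking up several queries can be sketched as pooling result sets. This is a rough illustration only: the function name, the labels, and the `EX...` accessions are invented, and dropping duplicates is my assumption about what you'd want, not a description of what the Query Builder does behind the scenes.

```python
def add_to_query_builder(pool, label, accessions):
    """Add one query's results to the pool.

    Each accession is stored once, tagged with the first query that found it.
    """
    for acc in accessions:
        pool.setdefault(acc, label)
    return pool


pool = {}
# Made-up accession lists standing in for the three queries.
add_to_query_builder(pool, "human 2009", ["EX0001", "EX0002"])
add_to_query_builder(pool, "human, earlier dates", ["EX0002", "EX0010", "EX0011"])
add_to_query_builder(pool, "swine", ["EX0020"])

# Count how many pooled sequences came from each query.
counts = {}
for label in pool.values():
    counts[label] = counts.get(label, 0) + 1

print(len(pool), counts)
```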

query2.png

5. Then, I click the Get Sequences button.

This gives me a long list with far more sequences than I want to use. I click the check box at the top to deselect everything, then I use the check boxes to select the sequences I want to compare.

I sorted by year to make my 2009 cases easier to find. Then, it's time to decide which sequences to pick.

Hmmm, of course I picked the seven swine flu cases, then I picked some sequences that were isolated from actual swine, then some other human cases of H1N1 that happened in different parts of North America last year.

At this point, I could download the sequences and work on my own computer, or I could use some of the analysis tools at the NCBI. I decided to let the NCBI's computers do the work, so I clicked the Multiple Alignment button to see the amino acid similarities. Then I clicked the Build a tree button, followed by a lot of Next step buttons.
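If you do download an alignment and work on your own computer, one simple thing to compute is the percent identity between each pair of aligned sequences. Here's a minimal sketch, assuming the sequences are already aligned (same length, with "-" for gaps). The three toy sequences are invented for the example and are not real HA sequences, and skipping gapped columns is just one of several reasonable conventions.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent of aligned columns where both sequences have the same residue.

    Columns where either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = same = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            same += 1
    if compared == 0:
        return 0.0
    return 100.0 * same / compared


# Toy aligned sequences, invented for illustration only.
aligned = {
    "toy_A": "MKAILVVLLY",
    "toy_B": "MKAILVVLLY",
    "toy_C": "MKAKLLVLL-",
}

for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
    print(f"{name1} vs {name2}: {percent_identity(seq1, seq2):.1f}% identical")
```

Subtracting each value from 100 gives a crude distance, which is the kind of number a tree-building method starts from.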

Here's my tree:

big_tree.png

After making the tree, I decided to look at all the sequences in my set. Here's what I get from that analysis:

tree2_small.png


What do I conclude from this? Well, first, it looks reasonable to say that the people in Texas and California were probably infected with the same strain since those sequences cluster pretty closely together.

Second, it looks like the HA protein from the California and Texas strains is most similar to the HA protein from a strain that infected some pigs in Ohio a couple of years ago, and it is not as closely related to the 200-some strains of H1N1 that infected other people in 2008.
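To see what "cluster closely together" means mechanically, here's a small sketch of one of the simplest tree-building methods, UPGMA (average-linkage clustering). I picked it because it fits in a few lines; it is not necessarily the method the NCBI tool uses. The labels and distances below are made up to show the mechanics and are not measurements from the sequences in this post.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a tree by repeatedly joining the two closest clusters (UPGMA).

    names: list of leaf labels.
    dist:  dict mapping a pair (a, b) of labels to their distance.
    Returns the tree as nested 2-tuples of labels.
    """
    size = {name: 1 for name in names}               # cluster -> number of leaves
    d = {frozenset(pair): v for pair, v in dist.items()}
    while len(size) > 1:
        # Pick the closest pair among the current clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = size.pop(a), size.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # leaf-count-weighted average of the two old distances.
        for c in size:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        size[merged] = na + nb
    (tree,) = size
    return tree


# Hypothetical labels and distances, for illustration only.
names = ["human_2009_a", "human_2009_b", "swine_x", "human_2008"]
dist = {
    ("human_2009_a", "human_2009_b"): 0.5,
    ("human_2009_a", "swine_x"): 4.0,
    ("human_2009_b", "swine_x"): 4.0,
    ("human_2009_a", "human_2008"): 20.0,
    ("human_2009_b", "human_2008"): 20.0,
    ("swine_x", "human_2008"): 19.0,
}

tree = upgma(names, dist)
print(tree)
```

In the nested tuples, labels that share the innermost parentheses were joined first, which is the text version of two tips sitting side by side on a tree.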

You guys can play amateur epidemiologist, too, and look at other strains or look at the New York strains.  I think the more eyes we have looking at these, the better off we are. 

The nucleotide sequences should be looked at, too, and other tree methods would be good to try as well. And, of course, as if things weren't complicated enough, there are 8 different segments of the flu genome.

Have fun!


Comments

1

Hi, if you Fasta search the sequences for this current North American H1N1 outbreak, it can be seen that in fact a closer match for the HA gene is a Kansas strain, A/Swine/Indiana/P12439/00 (H1N2), but given the dates on all of the isolates, all that really can be said is that the HA gene seems to have come from pig viruses already circulating in the USA.

Posted by: Ryan | April 29, 2009 10:28 PM

2

I think you meant a BLAST search. And Indiana.

But the real problem here is sampling bias. If we sample 200 swine a year from the USA, and 100 from Asia, and ZERO from central America or South America, of course the "best hit" in a BLAST search will be a USA or Asian swine.

What you are looking for, is a swine (or bird, or human) sequence that is 98% to 100% identical to the California 2009 human sequences. 92% to 95% identity indicates a rather distantly related strain.

Sampling bias is our biggest problem here. We have not been sampling hundreds of swine each year in central and south America for the past 20 years. Nor Sand Hill Cranes and other wild migratory birds.

Posted by: Brian Foley | April 30, 2009 12:41 AM

3

Hi Brian, no, I meant a FASTA search, which is more sensitive than a BLAST search. Try them both and you will see. And what you say is obvious: you need a sequence in order to get any kind of alignment, but the practical issue is we only have the sequences which have been sequenced. And of course the identity would ideally be as close to 100 as possible, but out of the 1000's of sequences in the database, the A/Swine/Indiana/P12439/00 (H1N2) HA sequence gives the closest match, so I don't understand why Ohio has been labelled as the closest match (well, I do, as the sample only consisted of HA sequences from 2005 onwards).

Posted by: Ryan | April 30, 2009 1:59 PM


© 2006-2009 Seed Media Group LLC. ScienceBlogs is a registered trademark of Seed Media Group. All rights reserved.
