Did the California H1N1 swine flu come from Ohio?

This afternoon, I was working on educational activities and suddenly realized that the H1N1 strain that caused the California outbreak might be the same strain that caused an outbreak in 2007 at an Ohio county fair.

UPDATE: I'm not so certain anymore that the strains are the same. I'm doing some work with nucleic acid sequences to look further at similarity.

Here's the data.


Once I realized that the genome sequences from the H1N1 swine flu were in the NCBI's virus genome resources database, I had to take a look.

And, like eating potato chips, making phylogenetic trees is a little bit addictive. Or maybe it was just the adrenaline rush that hit when I realized that every tree was telling me the same thing.

What did those trees say?

They all said the California swine virus is most closely related to a swine flu virus from Ohio and very different from other H1N1 viruses that have infected humans. In fact, in some cases, the California H1N1 virus appears very similar to a virus that caused an outbreak in 2007 at an Ohio county fair (1).

Okay, let's look at the data. I used H1N1 (and a couple of H1N2) protein sequences from swine and humans collected between Jan 1, 2006 and today. This gave between 324 and 600 sequences for each tree, and in every case the California sequences clustered with those from the Ohio pigs.

And now, the data.

1. PB1
2. PB2
3. NA
4. PA
5. NP
6. M1
7. M2
8. HA

For every single protein sequence tested, the 4 California isolates clustered with the sequence from the county fair pig from Ohio.

In a few cases, one other Ohio pig isolate (2) was part of the cluster. Taken together, the phylogenetic analyses are compelling and support the hypothesis that the California H1N1 swine flu virus may have come from Ohio.


  1. Vincent, A., Swenson, S., Lager, K., Gauger, P., Loiacono, C., & Zhang, Y. (2009). Characterization of an influenza A virus isolated from pigs during an outbreak of respiratory disease in swine and people during a county fair in the United States. Veterinary Microbiology. DOI: 10.1016/j.vetmic.2009.01.003
  2. Yassine, H.M., Zhang, Y.J., Lee, C.W., Byrum, B.A., & Saif, Y.M. (2008). Genetic characterization of triple reassortant H1N1 influenza viruses from pigs in Ohio. Unpublished. Influenza A virus (A/swine/Ohio/24366/2007(H1N1)).


Wow. Nice work.

How did it get to Mexico, or is there some complex thing going on here with the pig industry?

How come we are led to believe that the California cases, as well as Mexico and other countries have a bird flu component and human-to-human component in addition to the swine flu as noted above? How does that get explained as a new strain of the virus?

How could it have existed during the interim period undetected for so long?

I don't see it.

I just used BLAST on the non-redundant nucleotide GenBank set to grab the top 1000 most-similar PB2s to the California isolate (GI 227809824), ran a quick-and-dirty ClustalW alignment, and looked at the phylogeny from that. I realize that's not the most accurate approach, but it's not showing any particular clustering with the Ohio strain. A chunk of it is here (http://www.iayork.com/Temp/FluPB2.jpg). Nearest neighbours are another California isolate (gi 227831783), then the closest cluster includes A/duck/NC/91347/01(H1N2), A/mallard duck/South Dakota/Sg-00125/2007(H3N2), A/pintail duck/South Dakota/Sg-00126/2007(H3N2), and A/swine/Korea/CAS05/2004(H3N2).

How did you select your sequences? I think you have to use BLAST to make sure you start with the most similar sequences; you can't restrict yourself to a specific subset (i.e. swine and humans).

Mind you, this is just the PB2, I haven't tried the other proteins yet, I'll try them tomorrow morning. But some very quick checks also don't pull up the Ohio isolate as best matches for the California isolates.
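The point made in this comment, that the starting set should be the sequences most similar to the query rather than a subset chosen in advance by host, can be illustrated with a toy ranking. This is only a sketch and not a substitute for BLAST: the function names and the short sequences are made up for illustration, and the sequences are assumed to be already aligned and of equal length.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def nearest_neighbours(query, candidates, top_n):
    """Return the top_n (name, identity) pairs, most similar to the query first."""
    scored = [(name, percent_identity(query, seq)) for name, seq in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical, pre-aligned toy sequences (not real influenza data).
query = "ATGGAGAGAA"
candidates = {
    "hypothetical_human_1": "ATGCACAGTA",  # 3 mismatches
    "hypothetical_swine_1": "ATGGAGAGAA",  # identical to the query
    "hypothetical_avian_1": "ATGGAAAGAA",  # 1 mismatch
}

for name, score in nearest_neighbours(query, candidates, top_n=2):
    print(f"{name}: {score:.1f}% identical")
```

Note that the ranking ignores host entirely: if the nearest sequences happen to come from ducks, they still end up in the set that goes into the alignment.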

I'll post the instructions for doing this tomorrow. Come back about 11 am PST.

Hi Sandra,
Could you please post your sequence alignment (or email it to me)? I'd like to repeat your analysis, but using likelihood-based methods instead of neighbor-joining methods. My plan is to use the Akaike Information Criterion to select a best-fitting evolutionary model, and then use maximum likelihood methods to build a phylogeny. I'd also like to run a Markov chain Monte Carlo analysis to find the Bayesian posterior probability support value for each clade. I'm very interested in whether your Ohio-California hypothesis is robust to more rigorous phylogenetic methods. If you share the original sequences, I'll gladly share my results with you. Thanks!
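For readers unfamiliar with the model-selection step mentioned here, the Akaike Information Criterion is AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below is not the commenter's actual pipeline. The log-likelihood values are invented for illustration, and the parameter counts cover only the substitution model (branch lengths are assumed to be shared across the candidates).

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits: model name -> (maximized log-likelihood, free parameters).
# The log-likelihoods are made up; they are not results from any real alignment.
fits = {
    "JC69": (-5230.4, 0),
    "HKY85": (-5105.2, 4),
    "GTR": (-5101.9, 8),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("preferred model:", best)
```

In this made-up case the richest model fits slightly better but not by enough to pay for its extra parameters, which is the trade-off the criterion is designed to capture.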

How about submitting the data to the CDC's Morbidity and Mortality Weekly Report?
If you bundled this info up and sent it off to the CDC I'm sure they would appreciate it.

By John Gray (not verified) on 28 Apr 2009 #permalink

FWIW, I'd support iayork's suggestion to BLAST the sequences to find the most similar sequences - the branch lengths on those 2007/2009 terminals are worrisome. Honestly, they look like recombinants within a non-recombinant set (or simply a set of ingroup taxa that aren't that closely related). Having worked with recombination, it's a bear! One suggestion to all analyzing this data would be to make sure you know the breakpoints where recombination has occurred (I used RDP years ago - http://darwin.uvigo.es/rdp/rdp.html - I don't know what's up to date now). Run each recombinant segment with its ancestors ONLY, or you will be in for a world of trees that either don't make sense, or are subversively inaccurate. Thanks, Sandy, for working on this!

By Gwen Aimes (not verified) on 28 Apr 2009 #permalink


I agree with both Victor Hanson-Smith and Gwen Aimes. This is great work but I would feel more confident with ML trees and checking for recombination--I would use LDHat.

What the Mad Biologist said. The multiple proteins make this look good, but I've always been slightly uncomfortable relying on neighbor methods when other possibilities are available.

The Mexico/California human viruses are almost certainly not derived from the OHIO swine H1:N1. I just BLASTed the OHIO swine/human H1:N1 HA gene, and the California swine/human H1:N1 HA gene. Yes, there are hundreds of other H1 HA gene sequences in the database, from birds, humans and swine. The California HA and Ohio HA are not particularly related to one another.

Worldwide, there have been hundreds of H1:N1 and H1:N2 swine flu viruses isolated and sequenced in the past 20 years or so. I don't think the OHIO one is significant here.

I got pretty much the same results with the Polymerase gene:
PB1 FJ966080 (http://www.ncbi.nlm.nih.gov/nuccore/227809825?report=fasta&log$=seqview ) it is not significantly related to the OHIO virus, or any other virus in the database, as far as I can tell. There are millions of swine and birds in the world that carry influenza A viruses, and we sample a few hundred each year. More sampling will be needed to trace this one.

Thanks for the advice,

I'll BLAST the sequences and post some multiple alignments. I haven't run any of the Phylip programs since I got an Intel Mac, so I may need some help with doing maximum likelihood. I'll report back.

With branch lengths like those, it seems you're going to need a larger taxon sample to be able to conclude much, as well as a reconstruction method less prone to long-branch artifacts.

That said, this is certainly the type of approach we should be taking to track the origin of the virus.

What method did you use for building those trees?

I posted the methods I used for querying the NCBI viral resource database here. Once you get a collection of sequences, there's a button you can click to download a multi-sequence FASTA file for making trees, etc.
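Once a multi-sequence FASTA file has been downloaded, a few lines of code are enough to read it before passing the sequences to an alignment or tree-building program. The parser below is a minimal sketch, not the format of any particular NCBI export: the header lines in the example are hypothetical, and a real workflow might prefer an established library.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record file held in memory; headers are invented.
example = io.StringIO(
    ">seq1 hypothetical isolate A\n"
    "MKAIL\n"
    "VVLLY\n"
    ">seq2 hypothetical isolate B\n"
    "MKAIIVVLLY\n"
)

records = list(read_fasta(example))
for header, sequence in records:
    print(header, len(sequence))
```

The same function works on a file opened with `open(path)`, since it only needs something that yields lines of text.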

I think it would be great to have everyone who is interested in this looking at the sequences.

Watching this post & subsequent commentary is awesome. Hypothesis, initial testing & confirmation, subsequent testing by alternate means and other scientists, non-confirmatory results, polite reasonable discussion of methods, results, and new evidence . . .

Many thanks to all of you, for letting this unfold where "the rest of us" can watch.

Sandy, this is a great exercise and very educational for the public - I'm glad to see you doing it. It also shows the power and value of publicly releasing data without delay.

However, I think your trees are not really accurate (as some other posters have pointed out). Your CA and Ohio samples all appear as outgroups to the main tree, which is a big warning sign that something is missing. We've built trees and found that the CA sequences cluster much more closely with dozens (or hundreds) of other isolates, mostly from swine. The other posters are right that you have to make sure your initial alignment contains all the nearest neighbors (which you can get using BLAST) before launching the tree-building tool.

But I agree, everyone should dive in and see what they can figure out.

Perhaps it would be prudent to add something to the original post about how the original hypothesis has been questioned? If one doesn't read the comments, one would still suppose that the conclusion that the California and Ohio strains are similar is true, when this seems not to be the emerging consensus, at least at the moment.

good idea, Kalib. Especially since I have some other things to complete today before I can get back to looking at sequences.

Thank you, Sandra, for these articles. I followed article 4 on making trees and used your sequences for the Ohio (H1N1) isolates, but included Minn. swine (H3N2) within your date range, along with the human sequences from California and other states for this year. The software had the Minn. sequence branching to the Ohio and human sequences. I am new to this software and have just a chemical engineering and computer programming background, so I don't know if it is even valid to include an H3N2 sequence in the tree build with the H1N1 analysis you did. I just have an interest in genetic information processing and try to find analogies and approaches with my computer programming background.

By Eric Everett (not verified) on 29 Apr 2009 #permalink

Thanks Eric,

It's fine to include H3N2 in the tree. It should cluster away from other samples and provide a kind of reference point. People often do include less-related sequences when making trees. These are referred to as "outgroups" and are helpful for rooting the tree.

NJ is a terrible method in general with many known flaws (check Systematic Biology for how often it's used these days...never). ML, Bayesian or parsimony methods would be a much better approach.

By Anonymous (not verified) on 29 Apr 2009 #permalink
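Since neighbor joining (NJ) comes up repeatedly in this thread, here is a sketch of its first step: choosing which pair of taxa to join. NJ does not simply pick the pair with the smallest distance; it minimizes Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The five-taxon distance matrix and the function name below are made up for illustration and have nothing to do with the flu trees in the post.

```python
from itertools import combinations


def nj_pair_to_join(names, dist):
    """Return the pair minimizing the neighbor-joining Q criterion, and its Q value."""
    n = len(names)
    # Row totals: summed distance from each taxon to all the others.
    totals = {a: sum(dist[a][b] for b in names if b != a) for a in names}
    best_pair, best_q = None, None
    for a, b in combinations(names, 2):
        q = (n - 2) * dist[a][b] - totals[a] - totals[b]
        if best_q is None or q < best_q:
            best_pair, best_q = (a, b), q
    return best_pair, best_q


# Made-up symmetric distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {ni: {nj: matrix[i][j] for j, nj in enumerate(names)}
        for i, ni in enumerate(names)}

pair, q = nj_pair_to_join(names, dist)
print("join first:", pair, "Q =", q)
```

In this toy matrix the smallest raw distance is between d and e, yet the Q criterion joins a and b first, because the choice depends on each taxon's distances to everything else in the set. That dependence is one reason the choice of sequences in the initial set matters so much for distance-based trees.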

How has the general consensus affirmed that PB2 is an "avian" gene (in this outbreak) when it has a lot of homology with flu isolated from pigs in Korea?


I'm getting a status report on following the link, rather than a blog...

By BioinfoTools (not verified) on 30 Apr 2009 #permalink

The article is nice. I like the trees, but I can't read anything. I'll post my BLAST searches soon.

@36: Thanks.

By BioinfoTools (not verified) on 30 Apr 2009 #permalink

It's really exciting to see the process of science unfold, here, complete with all bumps along the way. Definitely some of the best information I've come across on Swine Flu has come through these pages and the community you're attracting, Sandra. And you've encouraged me to sharpen up my bioinformatics chops. The cool thing about bioinformatics is that almost anyone can do it with tools that are accessible via the web.

Thank you for posting such interesting work. I am not a scientist and yet everyone here, and especially you Ms. Porter, write in such a way that I can follow the conversation. I just wanted to say thank you. Back to lurking...

I agree with comment #42, the interview has some good information and does help explain some of the findings.

For those wondering about the Mexico samples: the GISAID database DOES have some isolates from Mexico. ALL available sequences from the current outbreak are very similar, and about half of the few nucleotide differences we do see are silent (that is, have the same protein translation). Access to GISAID is free of charge, but one must accept some legalese and register for an account. At the moment the GISAID folks are utterly swamped with requests for accounts so they may take longer than usual to create accounts. They say they are giving priority to applications from people with email addresses at research institutions.

I don't know why the Mexico isolates have yet to appear in GenBank, but I have analyzed sequences including them and they don't appear to be anything special. In a phylogenetic tree of all HA sequences from the current outbreak, some of the Mexico samples fall next to Texas samples and other Mexico samples fall between samples from New Zealand and the Netherlands.

Various CDC people have said to interviewers that other Mexican samples yet to be released in any database are also highly similar to the ones we have already seen.

By Anonymous (not verified) on 02 May 2009 #permalink
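The "silent" differences mentioned in the comment above are nucleotide changes that leave the encoded amino acid unchanged. A small sketch of how one might count them, assuming two in-frame, equal-length, uppercase DNA coding sequences and the standard genetic code; the sequences and the function name are invented for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(seq_a: str, seq_b: str):
    """Count differing codons as (silent, amino_acid_changing)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    silent, changing = 0, 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1      # same amino acid, different codon
        else:
            changing += 1    # different amino acid
    return silent, changing


# Hypothetical coding sequences (not real isolates).
isolate_1 = "ATGGCTAAAGGA"  # M A K G
isolate_2 = "ATGGCCAAAGAA"  # M A K E

silent, changing = classify_differences(isolate_1, isolate_2)
print(f"silent: {silent}, amino-acid changing: {changing}")
```

The count here is per codon rather than per nucleotide, which is a simplification; a codon with two changed bases is still counted once.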

Thanks Anon.

That's interesting. I wonder, too, why GISAID is keeping the sequences out of GenBank. Could there be IP issues?

I was very ill this winter from December to March. I waited until February to go to my GP, after flu-like symptoms (pain in joints, fever, sore throat, cough, congestion, etc.) became unbearable and did not clear on their own. I was tested for a lot of things, and was diagnosed with Human Parvo B19 Virus. It is May and I am still dragging, and am told the Parvo will go away on its own. I have been to many doctors since with no proper diagnosis. I've had to give up my business, a small hair salon, because I could no longer work and was worried about breathing/coughing on clients. My life has been turned upside down.
I did my own research on viruses before the swine flu virus was a subject of concern, since my illness seems to have no end or cure and I can't seem to get any answers. I live in South Orange County, CA, 1 hour from Mexico, and I am in the border population, as were my clients, some in Imperial County. My question is: Do you have any idea what the symptoms are for the other mystery California strain of swine flu? Any information that my state/gov. might not be telling me? Could I possibly have had an earlier version of the swine flu before it mutated and headed south? Can any viruses mistakenly show Parvo B19 results, but not really be B19 but something else?
Need to know. I think it is very strange: if the swine flu DID in fact start in Mexico, why is California not bombarded with cases, unless it truly did start here and we have already been infected and have some anti-viral antibodies against the new strain? Thank you for your research. Any response is appreciated. Concerned Californian

By Concerned Cali… (not verified) on 04 May 2009 #permalink

Dear Concerned,

It sounds like you're having a rough time. I'll answer any questions I can, but I'm not a doctor and I can't give any kind of medical advice. If you think your doctor may have missed something, it would be best to get a second opinion from another doctor.

Okay - so on to your questions:

1. What are the symptoms?

I think the symptoms are pretty much the same for both kinds of flu.

According to the CDC (http://www.cdc.gov/flu/symptoms.htm):

Influenza usually starts suddenly and may include the following symptoms:

  • Fever (usually high)
  • Headache
  • Tiredness (can be extreme)
  • Cough
  • Sore throat
  • Runny or stuffy nose
  • Body aches
  • Diarrhea and vomiting (more common among children than adults)

Having these symptoms does not always mean that you have the flu. Many different illnesses, including the common cold, can have similar symptoms.

2. Could you have been infected with an earlier version of the swine flu before it mutated and headed south?

I don't know. I think antibody tests could determine if you had been exposed to swine flu, but their presence wouldn't distinguish between having had the disease and getting exposed to it.

3. Are there viruses that show Parvo B19 results, but are not B19 and are something else?

I don't know. It depends on the test, and I don't know what test was used or anything about the specificity of the test.

Thanks 41, but I really do think calling parts "avian" is totally wrong, as most, if not all, of the 8 segments are probably of avian origin in the past. I think many people are thinking a bird strain has crossed with a human one and a pig one in the past month or so, which is not what people seem to be saying:

If and when you have time, could you (or someone else that knows) explain how after diagramming the origin of each gene, you determine the sequence of reassortment, especially if multiple reassortment events preceded the emergence of a strain? I understand how one would trace the lineage of offspring of 2 parent genomes contributing to a resulting genome, and think one would just use the same process, but with billions of siblings? in each generation, and 8 parent gene segments, can the strains be clearly traced or do you end up with much scientific uncertainty and opinions?

By Skeptigal (not verified) on 05 May 2009 #permalink


The one thing not likely to be helpful is guessing other diagnoses because they sound like what you have. Parvo can cause chronic arthritis in rare cases. Fever is not common with it, however.

Influenza of any strain is not likely to be the cause of chronic symptoms.

Record your temperature with a thermometer, don't guess. Take it several times a day over a couple of days. Keep track of your symptoms during that time with notes that are as accurate as possible. Include times and as much information as possible that might be relevant.

Then take your log of symptoms in to the health care provider you trust the most. Don't keep going to different providers. Ask for a consultation with an infectious disease doctor if the provider has no new ideas. You may also need to see a rheumatologist.

You need to allow one provider to follow through with additional tests rather than starting over with a new one again and again. You need to provide the most accurate description of your symptoms and a temperature and symptom diary is the best way to do that.

You need to consider that depression can sometimes cause physical symptoms. Sometimes it takes time to find the cause of an illness. When the same symptoms are common in multiple illnesses, it can take time to find which one it is.

By Skeptigal (not verified) on 05 May 2009 #permalink

I can do a bit more to explain how people find reassortment. It's based more on sequence similarity than on opinion.
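As a rough illustration of the similarity-based idea: for each of the genome segments, find which reference lineage the isolate's segment most resembles. If different segments point to different lineages, that pattern is consistent with reassortment. The sketch below uses plain percent identity on invented, pre-aligned toy sequences, and the lineage names are hypothetical. In practice people build a tree per segment rather than relying on a single best match.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def closest_lineage(segment_seq, references):
    """Name of the reference lineage whose sequence is most similar to the segment."""
    return max(references, key=lambda name: identity(segment_seq, references[name]))


def segment_origins(isolate, reference_panels):
    """Map each segment of the isolate to its closest reference lineage."""
    return {seg: closest_lineage(seq, reference_panels[seg])
            for seg, seq in isolate.items()}


# Toy data: two segments, two hypothetical lineages (not real sequences).
isolate = {"HA": "ACGTACGT", "NA": "TTGACCAA"}
reference_panels = {
    "HA": {"lineage_X": "ACGTACGA", "lineage_Y": "AGGTTCGA"},
    "NA": {"lineage_X": "TAGTCCTA", "lineage_Y": "TTGACCAT"},
}

origins = segment_origins(isolate, reference_panels)
print(origins)
if len(set(origins.values())) > 1:
    print("segments resemble different lineages: consistent with reassortment")
```

Working out the order of several reassortment events is harder than this sketch suggests, since it depends on dated samples and on how densely each lineage has been sequenced.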

I completely agree, I like these a lot.

And, they're using Neighbor Joining trees.

has someone else looked at
it is almost identical to the average of the other sequences
(except 2 mutations) , looks like the original reassortment
But it is human, must be from Apr.2009 (no date given),
so did the virus survive in the environment without replicating ?

Hi Ida,

I think those sequences would make a good outgroup; I've just been busy with end-of-the-quarter grading responsibilities. I'll come back to this soon.

Here is an excellent site where viral phylogenetics experts are posting and discussing their analyses:
Posted by: Anonymous | May 6, 2009 9:39 PM

Thanks. That will do for now to expand my current knowledge on the genetic analysis. So many specialties in science, so little time.

By Skeptigal (not verified) on 11 May 2009 #permalink

To Concerned Californian: Unless the Parvo antibody was an IgM antibody (not an IgG antibody), its presence just demonstrated that you had, at some time in your life, been infected with the Parvovirus. IgM means recent or current infection. Prolonged cough or respiratory symptoms are not usually symptoms of this virus, although joint symptoms can last for a number of months (for example, pain and swelling of joints of both hands). Your physician does need to search for other explanations for your symptoms.

By Mellowdoc (not verified) on 31 May 2009 #permalink

Normally, inflammation is the way the body responds to an injury or to the presence of disease agents, such as viruses or bacteria. During this reaction, many cells of the body's defense system (called the immune system) rush to the injured area to wipe out the cause of the problem, clean up damaged cells and repair tissues that have been hurt. Once the "battle" is won, the inflammation normally goes away and the area becomes healthy again.


Glaxo is just a marketing hand
so who sold that vacc to the whole world carrying seeds of the next pandemic?
What state, what monster?
When failed, getting away to try again?
Like Oklahoma

your comment must be approved by


guess where all the internet monitoring flows to and you guessed where that vaccine maker sits

According to a list compiled by Dr. Patricia Doyle at rense.com, a host of strange ingredients are used to make up Hoffman-La Roche's anti-flu drug Tamiflu, which has recently been connected with bizarre behavior,

Patients using Tamiflu -- which many nations are stocking up on as a way to combat a possible pandemic of the deadly H5N1 bird flu -- reported delirium, hallucinations, delusions, convulsions, disturbed consciousness and abnormal behavior. The FDA reports that side effects reported with Tamiflu include nausea, vomiting, diarrhea, bronchitis, stomach pain, dizziness and headache.


I agree. Scary. Thanks.

It is interesting and scary that we might be producing our own swine flu in the U.S. My concern is the large production of swine in confined areas in general is leading to these problems in the first place.

These hogs are shot up with loads of antibiotics to try and keep a semblance of health. With our animals being raised in confined areas with just enough room to stand, get ready for more virus and bacterial outbreaks to sweep across our nation.

If one animal catches it and is processed at one of the few meat plants left in America, that could lead to contamination throughout the United States. Even when they do the recalls, it is too late. One of these times, it will be way too late.

By Houston Chiropractor (not verified) on 01 Mar 2011 #permalink