I Don't Think the German Outbreak E. coli Strain Is Novel: Something Very Similar Was Isolated Ten Years Ago...

...in Europe. I'll get to that in a moment. You've probably heard of the E. coli outbreak sweeping through Germany and now other European countries that has caused over one thousand cases of hemolytic uremic syndrome ('HUS'). What's odd is that the initial reports are calling this a novel hybrid or some new strain of E. coli.

BGI has sequenced one of these isolates using Ion Torrent, and Nick Loman assembled the data. Without getting too technical, the genome is actually in about 3,000 pieces, but with those data (and thanks to Nick for assembling them and releasing them) I was able to perform multilocus sequence typing ('MLST'). Basically, we look at the partial sequences of several genes (in this case, seven) to identify the isolate's sequence type--think of it as a molecular barcode (for the scheme and details, see here).
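For the code-minded, here is a minimal sketch of the barcode idea: match each gene fragment exactly against a table of known alleles, then look up the resulting combination of allele numbers. Everything in it is an assumption made for illustration: the three-locus scheme, the six-base "fragments", the allele numbers, and the ST numbers are invented, and the function names are hypothetical. A real scheme uses seven loci and much longer fragments.

```python
# Toy MLST lookup. All sequences, allele numbers, and ST numbers below are
# invented for illustration; they are not taken from any real database.

# Hypothetical allele database: locus -> {fragment sequence: allele number}
ALLELES = {
    "adk": {"ACGTAC": 1, "ACGTAT": 6},
    "fumC": {"GGCATA": 6, "GGCTTA": 11},
    "recA": {"TTGACA": 7, "TTGACC": 2},
}

# Hypothetical profile table: allele numbers in LOCI order -> sequence type
LOCI = ("adk", "fumC", "recA")
PROFILES = {(6, 6, 7): 100, (1, 11, 2): 200}


def call_alleles(fragments):
    """Return {locus: allele number, or None if there is no exact match}."""
    return {locus: ALLELES[locus].get(fragments[locus].upper()) for locus in LOCI}


def sequence_type(calls):
    """Return the ST for a complete profile, or None if any allele is
    unmatched or the combination is not in the table."""
    profile = tuple(calls[locus] for locus in LOCI)
    if None in profile:
        return None
    return PROFILES.get(profile)


isolate = {"adk": "ACGTAT", "fumC": "GGCATA", "recA": "TTGACA"}
calls = call_alleles(isolate)
print(calls, sequence_type(calls))
```

Because the match is exact, a single changed base in any fragment yields an unmatched allele and no sequence type, which is why near-matches have to be inspected by hand.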

So what did I find?

This EHEC strain is most likely a very close relative of ST678 (details in a bit). In fact, according to the mlst.net strain database, there is a strain "Jan-91", isolated in 2001* from Europe (no further geographic information is provided). That strain belongs to phylogroup D, and is associated with HUS...just like the outbreak strain. And the older strain also has the exact same serotype as the outbreak strain, O104:H4.

Now, the outbreak strain sequence isn't identical (the data are at the end of this post). "Jan-91" has an allele profile of adk6-fumC6-gyrB5-icd136-mdh9-purA7-recA7 (to orient you, adk is the gene, 6 is the particular variant of adk or allele). There are three differences:

1) In the outbreak strain, adk is a novel allele that differs from adk6 by one point mutation at position 30 (if I counted correctly; it's late as I write this...)

2) In the outbreak strain, the icd allele matches icd136 exactly; however, the genome sequence lacks the last two bases. Given that the genome assembly is in over 3,000 pieces ('contigs'), I think this is missing data, not biology.

3) In the outbreak strain, the recA allele differs from recA7 by one insertion. "Jan-91" has a sequence of "AAAA", while the outbreak strain has a sequence of "AAAAA" (below, it's recorded as "aAAAA" to indicate the difference). With Ion Torrent (and other high throughput sequencing technologies), when you have 'runs' of the same nucleotide, such as "AAAA", it's not unusual for a base to be added or deleted, which could yield a 'false' "AAAAA." This could be sequencing error, but I can't rule out a real insertion (i.e., an extra A that's real).
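The three kinds of difference listed above (a point substitution, missing bases at the end, and an extra base inside a run) can be told apart with a few string checks. This is a rough sketch, not the procedure used for this post: the function names are hypothetical, the example sequences are short made-up stand-ins, and anything more complicated than these cases would need a proper alignment.

```python
from itertools import groupby


def collapse_runs(seq):
    """Collapse each homopolymer run to one base: 'GAAAAT' -> 'GAT'."""
    return "".join(base for base, _ in groupby(seq.upper()))


def classify(reference, observed):
    """Give a crude label for how `observed` differs from `reference`."""
    ref, obs = reference.upper(), observed.upper()
    if ref == obs:
        return "exact match"
    if len(ref) == len(obs):
        # Same length: report the 1-based positions of mismatched bases.
        diffs = [i + 1 for i, (a, b) in enumerate(zip(ref, obs)) if a != b]
        return f"substitution(s) at position(s) {diffs}"
    if ref.startswith(obs):
        return f"truncated: last {len(ref) - len(obs)} base(s) missing"
    if collapse_runs(ref) == collapse_runs(obs):
        # Identical once runs are collapsed, so only run lengths differ.
        return "homopolymer length difference (possible sequencing error)"
    return "other difference (needs a real alignment)"


# Made-up toy sequences, one per case.
print(classify("GATTACA", "GACTACA"))
print(classify("GATTACA", "GATTA"))
print(classify("ATGAAAATTG", "ATGAAAAATTG"))
```

The homopolymer check only says the two sequences agree once run lengths are ignored. It cannot tell a sequencing artifact from a genuine extra base; that takes more reads or a different sequencing technology.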

While this is obviously a very preliminary analysis of a very preliminary assembly, I don't understand why this strain is being called 'new', 'mutant', or anything else. It's not a bolt from the blue: it looks like a nearly identical strain that caused HUS a decade ago in Europe. I would add the obvious qualifier that there very well could be massive gene gain and loss (I haven't looked at that yet). I'm guessing that the reports of this strain being very different were based on comparisons to the genomes of other HUS strains, which are pretty divergent. But we have seen this MLST sequence type before, associated with this serotype and this disease syndrome.

All that being said, this is a very serious outbreak--I don't mean to downplay the seriousness of this as a public health and agricultural crisis by raising this issue. And it will be very interesting to see how different this strain is from other HUS strains. If we're lucky, the "Jan-91" E. coli strain still exists in someone's freezer, and we can see how it's evolved over the last decade. It's especially disconcerting that this strain is resistant to so many antibiotics.

An aside: Many kudos to BGI for publicly releasing the data.

Update: There's a new assembly using a different method. I haven't checked that yet.

Update II: Others are doubting that this is a novel strain:

Quoting scientists at the University of Münster, the institute rebutted earlier reports that the newest strain of E. coli had never been previously identified, calling it a "hybrid clone" that drew together the virulent properties of other strains. "Reports that this is a completely new type of pathogen are not accurate," the institute said.

Update III: Dr. Helge Karch released a statement confirming the ST678 typing.

Update IV: The second outbreak isolate genome sequence has the identical MLST sequence (ST678).

*2001 might be the sequencing date, not the isolation date. I can't tell.

MLST data (on some browsers, this is getting cut off, so you can download it here):

>german_adk6_diffatbase30
GGGGAAAGGGACTCAGGCTCAGTTCATCACGGAGAAATATGGTATTCCGCAAATCTCCACTGGCGATATGCTGCGTGCTGCGGTCAAATCTGGCTCCGAGCTGGGTAAACAAGCAAAAGACATTATGGATGCTGGCAAACTGGTCACCGACGAACTGGTGATCGCGCTGGTTAAAGAGCGCATTGCTCAGGAAGACTGCCGTAATGGTTTCCTGTTGGACGGCTTCCCGCGTACCATTCCGCAGGCAGACGCGATGAAAGAAGCGGGCATCAATGTTGATTACGTTCTGGAATTCGACGTACCGGACGAACTGATTGTTGATCGTATCGTAGGCCGCCGCGTTCATGCGCCGTCTGGTCGTGTTTATCACGTTAAATTCAATCCGCCGAAAGTAGAAGGCAAAGACGACGTTACCGGTGAAGAACTGACTACCCGTAAAGACGATCAGGAAGAAACCGTGCGTAAACGTCTGGTTGAATACCATCAGATGACTGCACCGCTGATCGGCTACTACTCCAAAGAAGCGGAAGCGGGTA
>german_fumC6
CGAGCGCCATTCGGCAGGCGGCGGATGAAGTACTGGCAGGACAGCATGACGACGAATTCCCGCTGGCTATCTGGCAGACCGGCTCCGGCACGCAAAGTAACATGAACATGAACGAAGTGCTGGCTAACCGGGCCAGTGAATTACTCGGCGGCGTGCGCGGGATGGAACGTAAAGTTCACCCTAACGACGACGTGAACAAAAGCCAAAGTTCCAACGATGTCTTTCCGACGGCGATGCACGTTGCGGCGCTGCTGGCGCTGCGCAAGCAACTCATTCCGCAGCTTAAAACCCTGACACAGACGCTGAGTGAAAAATCGCGTGCATTTGCCGATATCGTCAAAATCGGTCGAACCCACTTGCAGGACGCCACGCCGCTAACACTAGGGCAGGAGATTTCCGGCTGGGTAGCGATGCTCGAGCATAATCTCAAACATATCGAATACAGCCTGCCTCACGTAGCGGAACTGGC
>german_gyrB5
GGTCTGCACGGCGTTGGTGTTTCGGTAGTAAACGCCCTGTCGCAAAAACTGGAGCTGGTTATCCAGCGCGAGGGTAAAATTCACCGTCAGATCTACGAACACGGTGTACCGCAGGCTCCGCTGGCGGTTACCGGCGAGACTGAAAAAACCGGCACCATGGTGCGTTTCTGGCCCAGCCTCGAAACCTTCACCAATGTGACCGAGTTCGAATATGAAATTCTGGCGAAACGTCTGCGTGAGTTGTCGTTCCTCAACTCCGGCGTTTCCATTCGTCTGCGCGACAAGCGCGACGGCAAAGAAGACCACTTCCACTATGAAGGCGGCATCAAGGCGTTCGTTGAATATCTGAACAAGAACAAAACGCCGATCCACCCGAATATCTTCTACTTCTCCACTGAAAAAGACGGTATTGGCGTCGAAGTGGCGTTGCAGTGGAACGATGGCTTCCAGGAAAACATCT
>german_icd136_missingCCatend
CGACGCTGCAGTCGAGAAAGCCTATAAAGGCGAGCGTAAAATCTCCTGGATGGAAATTTACACCGGTGAAAAATCCACACAGGTTTATGGTCAGGATGTCTGGCTGCCTGCTGAAACCCTTGATCTGATTCGTGAATATCGCGTTGCCATTAAAGGTCCGCTGACCACTCCTGTTGGTGGCGGTATTCGCTCTCTGAACGTTGCCCTGCGCCAGGAACTGGATCTCTACATCTGCCTGCGTCCGGTACGTTACTATCAAGGCACTCCAAGCCCGGTTAAACACCCTGAACTGACCGATATGGTTATCTTCCGTGAAAACTCGGAAGACATTTATGCGGGTATCGAATGGAAAGCAGACTCTGCCGACGCCGAGAAAGTGATTAAATTCCTGCGTGAAGAGATGGGCGTGAAGAAAATTCGCTTCCCGGAACATTGCGGTATCGGTATCAAGCCGTGTTCTGAAGAAGGCAGCAAACGTCTGGTCCGTGCCGCGATTGAATACGCAATTGCCAACGA--
>german_mdh9
GGTGTAGCGCGTAAACCGGGTATGGATCGTTCCGACCTGTTTAACGTTAACGCCGGCATCGTGAAAAACCTGGTACAGCAAGTTGCGAAAACCTGCCCGAAAGCGTGCATTGGTATTATCACTAACCCGGTTAACACCACAGTTGCGATTGCTGCTGAAGTGCTGAAAAAAGCCGGTGTTTATGACAAAAACAAACTGTTCGGCGTTACCACGCTGGATATCATTCGTTCCAACACTTTTGTTGCGGAACTGAAAGGCAAACAGCCAGGCGAAGTTGAAGTGCCGGTTATTGGCGGTCACTCTGGTGTTACCATTCTGCCGCTGCTGTCACAGGTTCCTGGCGTTAGTTTTACCGAGCAGGAAGTGGCTGATCTGACCAAACGTATCCAGAACGCAGGTACTGAAGTGGTTGAAGCGAAAGCCGGTGGTGGGTCTGCAACCCTGTCTATGGG
>german_purA7
ATAACGCGCGTGAGAAAGCGCGTGGCGCGAAAGCGATCGGCACCACCGGTCGTGGTATCGGGCCTGCTTATGAAGATAAAGTAGCACGTCGCGGTCTGCGTGTTGGCGACCTTTTCGACAAAGAAACCTTCGCTGAAAAACTGAAAGAAGTGATGGAATATCACAACTTCCAGTTGGTTAACTACTACAAAGCTGAAGCGGTTGATTACCAGAAAGTTCTGGATGATACGATGGCTGTTGCCGACATCCTGACTTCTATGGTGGTTGACGTTTCTGACCTGCTCGACCAGGCGCGTCAGCGTGGCGATTTCGTCATGTTTGAAGGTGCGCAGGGTACGCTGCTGGATATCGACCACGGTACTTATCCGTACGTAACTTCTTCCAACACCACTGCTGGTGGCGTGGCGACCGGTTCCGGCCTGGGCCCGCGTTATGTTGATTATGTTCTGGGTATCCTCAAAGCTTACTCAACTCGTGT
>german_recA7withinsertion_aAAAA
CGCACGTAAACTGGGCGTCGATATCGATAACCTGCTGTGCTCCCAGCCGGACACCGGCGAGCAGGCACTGGAAATCTGTGACGCCCTGGCGCGTTCTGGCGCAGTAGACGTTATCGTCGTTGACTCCGTGGCGGCACTGACGCCGAAAGCGGAAATCGAAGGCGAAATCGGCGACTCTCACATGGGCCTTGCGGCACGTATGATGAGCCAGGCGATGCGTAAGCTGGCGGGTAACCTGAAGCAGTCCAACACGCTGCTGATCTTCATCAACCAGATCCGTATGaAAAATTGGTGTGATGTTCGGTAACCCGGAAACCACTACCGGTGGTAACGCGCTGAAATTCTACGCCTCTGTTCGTCTCGACATCCGTCGTATCGGCGCGGTGAAAGAGGGCGAAAACGTGGTGGGTAGCGAAACCCGCGTGAAAGTGGTGAAGAACAAAATCGCTGCACCGTTTAAACAGGCTGAATTTCAGATCCTCTACGGCGAAGGTATCAACTTCTACGGCGA
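
If you want to work with these records programmatically, the sketch below reads FASTA text and reports where a sequence carries one of the hand annotations used here: a lowercase base (the possible extra "a" in recA) or "-" characters (the missing bases at the end of icd). The function names are hypothetical, and the example records are shortened stand-ins, not the real fragments.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def flagged_positions(seq):
    """Return 1-based positions of lowercase bases or '-' gap characters."""
    return [i + 1 for i, ch in enumerate(seq) if ch.islower() or ch == "-"]


# Shortened stand-ins for illustration, not the real MLST fragments.
example = """>toy_recA_with_insertion
CGTATGaAAAATTG
>toy_icd_missing_end
GCCAACGA--
"""

for name, seq in read_fasta(example).items():
    print(name, len(seq), flagged_positions(seq))
```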



That ties in with some vague comments coming out of the public health people in Belfast, Northern Ireland

...The strain of VTEC infection suspected in this outbreak is O104 which is a rare strain of the infection which is seldom seen in the UK. In England there have been two cases in German nationals....

"Seldom seen" hardly refers to the current outbreak, so they have seen it in the past.

Us non-scientists have our uses!

Well, over here at the meeting, there has been some discussion. Some people even appeared on the BBC to comment last night (much fun at the Red Lion). Seeing as this was a conference on genomics and public health, at the Sanger, it was very apropos.

Anyway, I think Mike got it right when he said that the MLST didn't handle the potentially interesting issues - gene gain/loss. I'm agnostic on whether this is new or not; I've only been hearing things second hand. However, the backbone of the genome is not the place to look. Especially when we have low coverage genomes. Maybe try something like mummer against the references? The fun is likely to be in mobile elements and the accessory genome.

I thought the 'mutant' description was because there are additional virulence and/or toxin genes that haven't been found in this serotype before which I assume wouldn't show up on an MLST analysis as that is looking at housekeepers. Not that I have been able to find out exactly what has changed - just quotes from people saying there are differences.

(I believe the WHO is saying that it is a 'variant' that has never caused an outbreak before, rather than a new serotype. And I think it was in der Spiegel it said that this strain had been seen in a single case before.)

I may be totally wrong - microbiology is still a fairly new field to me - and am quite happy to be corrected.

Take these comments with a grain of salt also, but it seems there's a big difference between "novel hybrid" and "new strain". The title of your post makes it seem like you think it's neither.

The block quote from Münster, however, seems to still be pushing the hybrid angle. The strain is positive for all 5 enteroaggregative plasmids combined with Shiga toxin ( http://www.rki.de/cln_178/nn_217400/EN/Home/EHEC__O104__H4,templateId=r… )

Perhaps it's this combination that hasn't been seen before?

I've also heard the suggestion that increased HUS rates in this outbreak (~30 percent) compared to normal (~10 percent) point toward something new. It seems to me, though, that a likelier explanation is that we have an artificially low overall case count that misses mild, self-limiting illness which never sends people to the doctor. Or maybe German doctors inadvertently administered antibiotics, which is a no-no for STEC infections.

Awesome! I have almost no knowledge of the biological sciences but even the little I can glean from this is fascinating. Thanks.

Rob Tauxe from CDC has also gone on record as saying that he doesn't think this is a new strain either. Thanks for the nice bit of work!

By Don Schaffner (not verified) on 03 Jun 2011 #permalink

New findings on the E. coli O104 that is causing deaths in Europe
New York. USA. A private biotechnology company used their DNA scanning algorithms to determine that E. coli O104 has genomic signatures specific to Stx2 converting phage I and Stx2 converting phage II previously found in strains of the outbreak in Sakai city, Japan, in 1996. These genomic signatures are absent in the Central African E coli EAEC 55989.

I don't know how cucumbers could be the original culprit. The cukes I see in the pictures from Europe show that part of the stem is still on (unlike cukes in U.S. supermarkets). Also, one of my brothers harvested cucumbers for a summer and said that they are so prickly that if you don't wear gloves, your hands will get torn up.

In the tomato crisis in the U.S., it was determined that tomatoes on the vine were OK. I ate them without a problem.

Why isn't this also true of cucumbers?

Yes, the gloves can be contaminated, but with stems and gloves, how can this crisis be so serious?

By Martha Turner (not verified) on 08 Jun 2011 #permalink

The real question here should be the identification of the natural host of this E. coli strain, which is of course a mammalian gut bacterium.

If it got onto vegetables or sprouts, it must have been through the use of manure contaminated with this strain. That is, there must be populations of livestock in Europe that are hosting this strain, perhaps without any problems (a virulent strain in one species may be benign in another, depending).

A similar case (E coli O157:H7) occurred in the U.S. in 1993, linked to Jack in the Box suppliers and poor food handling practices:

"The ground beef had been distributed to Jack in the Box stores in Washington, Idaho, California, and Nevada, and by the end of February 1993 the states had reported the following:

* Washington - 602 patients with bloody diarrhea or HUS, 477 culture-confirmed E. coli infections, 144 hospitalizations, 30 HUS cases, and 3 deaths.
* Idaho - 14 culture-confirmed E. coli O157:H7 cases, 4 hospitalizations, 1 HUS case.
* California - 34 patients with bloody diarrhea, 6 culture-confirmed cases, 14 hospitalizations, 7 HUS cases, and 1 death.
* Nevada - 58 patients with bloody diarrhea, 1 culture-confirmed E. coli case, 9 hospitalizations, and 3 HUS cases.

"Seventy-three Jack in the Box restaurants were ultimately identified as having received recalled meat, and were part of the E. coli outbreak."

By ike solem (not verified) on 10 Jun 2011 #permalink

P.S. What about the antibiotic resistance package that this strain contains? There's been very little about that - but typically such antibiotic resistance kits are carried on plasmids taken up by E. coli from other bacterial strains. They are closely associated with the use of antibiotics to enhance growth and prevent disease in crowded, unsanitary factory farms.

Germany's meat industry has been taking a beating lately (dioxin feed contamination, etc.) and it seems that the German government is bent on protecting that industry - probably the main reason they pointed to Spanish cucumbers in the first place.

By ike solem (not verified) on 10 Jun 2011 #permalink

Man!!! It's all a scam designed to take more money from the sheeple, they've pulled this crap so many times now it's no longer even scary...they've overplayed their hand, their bluff is exposed.

Of course the strain is not completely new. The core genome (on which MLST analysis usually is applied) is quite stable and is not involved in host specificity, aggressiveness or pathogenicity. It is very likely that a few virulence factors (on plasmids or genomic islands) were acquired by this strain through horizontal gene transfer from other E. coli or related (pathogenic) enterobacteria. I am not surprised that these strains show MLST types which have already been known for a long time. Anyway, it is very clear that "new" highly pathogenic variants of a single species may arise easily as a result of genome flexibility, recombination and/or DNA uptake by (pathogenic) species.

By Patricia Stevens (not verified) on 07 Jul 2011 #permalink