Update/clarification: I want to clarify something critical. This is not about picking on a researcher or a country. It very well could have happened in the U.S. or anywhere else. Neither I nor you, the reader, have any idea about the internal constraints these groups experience, or what was communicated to government officials. To the extent that data sharing didn’t occur due to concerns over publication, this represents an instance where the publication process–and the import attributed to it–interfered with the rapid release that was needed. That’s the key point, not assigning blame to individuals or countries. Take the personal criticism and jingoism somewhere else.
Update II: See this comment. A draft assembly of the 2011 strain was released by the group mentioned in the Nature article. This is my error.
Yesterday, I discussed what I thought the implications of the rapid data release are for genomic epidemiology. I also promised a rant about the race to publication surrounding the O104:H4 E. coli outbreak in Germany–and who am I to disappoint? Before donning my ranty pants, however, it’s worth recognizing the importance of the rapid release of both raw data and genomic assemblies, first by BGI, and then by HPA. That public release, more than anything, provided useful and timely information to scientists about this outbreak.
OK, with that out of the way (for now), I got my ranty pants on. Let’s revisit Marian Turner’s Nature news article (italics mine):
The collaborative atmosphere that surrounded the public release of genome sequences in the early weeks of this year’s European Escherichia coli outbreak has turned into a race for peer-reviewed publication….
The LB226692 and 01-09591 genomes were sequenced using an Ion Torrent PGM sequencer from Life Technologies of Carlsbad, California…. The authors say that their publication is the first example of next-generation, whole-genome sequencing being used for real-time outbreak analysis. “This represents the birth of a new discipline — prospective genomics epidemiology,” says Harmsen. He predicts that this method will rapidly become routine public-health practice for outbreak surveillance.
But Harmsen’s group was pipped to the publishing post by Rolf Daniel and his colleagues at the University of Göttingen in Germany, who published a comparison of the sequence of two isolates from the outbreak with the 55989 strain in Archives of Microbiology on 28 June. Harmsen says that this competition is why his group did not release the 2001 strain sequence before today’s PLoS One publication.
Both groups say that their genomic sequencing and analysis were conducted independently. But their findings don’t really differ from sequence analyses that other scientists were simultaneously documenting in the public domain, following the release, on 2 June, by China’s BGI (formerly known as the Beijing Genomics Institute) of a full genome sequence of the outbreak strain — also generated using Ion Torrent sequencing. These scientists say that there is very little information in either publication that was not previously available on their website. “The crowd-sourcing efforts arrived at almost all of the scientific conclusions about the strain comparisons first,” says Mark Pallen from the University of Birmingham, UK, “so we’re surprised and disappointed that these findings are not referred to in these papers.”
Leaving aside the issues of priority and recognition, the critical thing is that these papers provided no understanding of the outbreak while it was happening. The early release of data (even before it reached the NIH/NCBI repository) by BGI and then HPA along with the assemblies did. If there are heroes in all of this, BGI and HPA are.
All of the analysis which helped us understand what this strain is happened weeks before publication. At this point, publication is just about keeping score.
(An aside: if groups are delaying publication because they’re trying to provide not a rapid public health response but very high-quality data, such as improved genome assemblies or SNP verification, to inform basic research related to the outbreak, that is different. But the quote above makes clear that the delay wasn’t about data quality or even other issues, such as funders’ stipulations.)
Being first has as much to do with how rapidly journal editors and their staffs respond, and with reviewers’ requests for changes, as it does with any scientific ability: most genome centers can bang out a couple of pretty good bacterial genomes very quickly, along with assemblies and annotations, if they’re so inclined. Comparing lists of genes and making some good figures isn’t that hard either–hell, bloggers did that. In their own spare time. In this particular instance, claiming your group is ‘first’ is as ridiculous as those “baby on board” car signs, which imply that the vehicle’s owners were the first people to invent screwing without using birth control. As the kids used to say, big whoop.
While publication is enshrined as the pinnacle of scientific communication (although that might be changing), in this case, it was pretty much irrelevant–and it appears to have slowed data release. Worse, the race to publication means that, rather than collaborating and standardizing the data analysis (e.g., having the same data processing), the larger scientific community will be analyzing slightly different genomes because of differences in processing, unless someone wants to rework everything from the beginning (if that’s even possible). This is very helpful [/snark].
Finally, let’s look at this again:
But Harmsen’s group was pipped to the publishing post by Rolf Daniel and his colleagues at the University of Göttingen in Germany, who published a comparison of the sequence of two isolates from the outbreak with the 55989 strain in Archives of Microbiology on 28 June. Harmsen says that this competition is why his group did not release the 2001 strain sequence before today’s PLoS One publication.
Wow. I could get really nasty, but I’ll just speculate that if I were a German citizen and had read that, I would be unhappy. With forty people dead, and the possibility of a massive lawsuit from Spain, to worry about publication? To worry about coming in second? Jeepers. But the outbreak didn’t kill any of my countrymen, so I’ll leave any fury to the Germans. (Update: Unfortunately, I am wrong–it did kill one U.S. citizen)
(Of course, maybe one simply believes that one’s research doesn’t really matter. Or something.)
The point is to call out not any one person or group, but a system that didn’t work so well. Any system that encourages and fosters this behavior in the face of a public health crisis needs a serious rethink. We should be reconsidering what publication means in the context of a rapidly moving health crisis–and what that tells us about our current system of scientific communication.