Data paparazzi.

In a comment on another post, Blatnoi asks for my take on a recent news item in Nature:

An Italian-led research group's closely held data have been outed by paparazzi physicists, who photographed conference slides and then used the data in their own publications.

For weeks, the physics community has been buzzing with the latest results on 'dark matter' from a European satellite mission known as PAMELA (Payload for Antimatter Matter Exploration and Light-nuclei Astrophysics). Team members have talked about their latest results at several recent conferences ... but beyond a quick flash of a slide, the collaboration has not shared the data. Many high-profile journals, including Nature, have strict rules about authors publicizing data before publication.

It now seems that some physicists have taken matters into their own hands. At least two papers recently appeared on the preprint server arXiv.org showing representations of PAMELA's latest findings (M. Cirelli et al. http://arxiv.org/abs/0808.3867; 2008, and L. Bergstrom et al. http://arxiv.org/abs/0808.3725; 2008). Both have recreated data from photos taken of a PAMELA presentation on 20 August at the Identification of Dark Matter conference in Stockholm, Sweden.

I'd say this is a situation that bears closer examination.

Here, we have a bunch of players with connected, and sometimes competing, interests.

We have the PAMELA research team, whose members have been hard at work collecting data and trying to figure out what these data mean. They're hoping this work will make a significant contribution to their scientific field. And, they're hoping to establish their priority in reporting these important results, because that's how scientists keep score (and because that score-keeping has an impact on how things like grant money, academic positions, and tenure are distributed).

We also have the larger community of physicists working on the question of what dark matter is like, how to get good empirical evidence on this matter, etc. This community has an interest in keeping up with the most recent findings by scientists working in this area; not only does this help them avoid duplicating someone else's work, but it also helps them make more sense of their own findings.

From the point of view of a scientific community jointly engaged in trying to answer a certain constellation of questions, scientific communication is a good thing. And the scientists from the PAMELA team did present their results to members of their scientific community at various conferences.

But it sounds like, rather than lingering over the details of their data, they flashed a slide to show that there were some data forming the basis for their more general claims.

Who knew that there would be digital cameras flashing as they were flashing their slide with the data?

Now, the Nature story notes that the two papers cited above that reconstruct the PAMELA data do clearly cite the source of these data. These authors are not pretending that they collected the data themselves (except to the extent that catching a digital photo of a conference slide constitutes data "collection"). And arguably, in discussing these data, they are moving the conversation about the PAMELA findings (and what these findings show about the nature of dark matter) forward.

This may be good for the community of physics, but it's not so good for the PAMELA team members who might have been hoping to publish their findings in Nature or in a journal with similar rules about keeping data confidential prior to publication. The Nature article suggests the PAMELA scientists may have had just such plans, which would now appear to be thwarted:

Piergiorgio Picozza, PAMELA's principal investigator and a physicist at the University of Rome Tor Vergata, says he is "very, very upset" by the data being incorporated into a publication. But [Marco] Cirelli [one of those who photographed the slide with the data] maintains that he and others have done nothing wrong. "We asked the PAMELA people [there], and they said it was not a problem," he says.

This is one of those moments where the interests of the individual scientist (in establishing priority for a finding) and of the scientific community (in full communication of important results in a timely fashion) seem to be in direct conflict.

Indeed, the conference presentations might have been motivated by perceived duties to one's scientific community, to keep the members of that community apprised of important results. From the point of view of the journal rules, is presenting data at a conference the same as "publicizing" data? Does "publicizing" data require that you deal with it in detail (rather than flashing it on a slide to prove you have it)? In case of priority disputes, would presenting data at a conference (either in detail or in passing) be enough to establish a scientist's claim to priority?

There are all sorts of related questions we might ask here connected to the nature of conference presentations as a species of scientific communication. For one, I'd be interested in the PAMELA scientists' take on what role the slide with the data played in the narrative structure of their presentation. Would the presentation have worked without such a slide? If not, why not? Will their future conference presentations omit such slides, and if so, what changes do they anticipate in how their findings are received?

But back to the tension between individual and communal interests. One of the recognized norms of science is that, once you've made your results public, they are community property, a shared resource upon which other members of your scientific community can draw. In theory, you want your results to be important enough that others will build on them. Of course, you also want to get the maximum recognition and reward for your contribution to the community's body of knowledge.

The nub of the matter here may be which modes of communication really "count" within the community of physics. If conference presentations count, then once the PAMELA results were communicated, they were fair game for other scientists to use. In contrast, if what counts is a peer-reviewed article printed in a dead-tree journal like Nature, you could make a case that the PAMELA data weren't really properly "out" to be drawn upon as a community resource.

And here, there may be a difference in opinion between physicists as a group (including those in the audience for the conference presentations in question) and the particular physicists trying to get their results published in a high impact journal like Nature. I've heard from a number of sources that much of the communication of important results within the community of physics happens by way of preprints uploaded to arXiv.org. One of the reasons I've heard for this is that the turn-around time between submission and publication in a high impact dead-tree journal is just too long, wasting precious time that the community could be using better to answer the questions with which it is wrestling. If arXiv.org is really the main conduit for communicating important results in physics, then there might be something almost anti-social about deciding to submit your findings to Nature instead. (I'm assuming here that Nature would regard a preprint on arXiv.org as a case of publicizing your data prior to publication.)

Of course, the reality is that a number of physicists may be working in contexts where they are not just answering to the expectations of their fellow physicists, but also to those of their employers, their granting agencies, the people evaluating their performance for tenure and promotion, etc. In some of these situations, an article in Nature may carry much more weight than the esteem of the tribe of physicists. Given the importance of such factors in keeping one's job and one's funding to do physics, it is understandable that members of the PAMELA team would feel wronged by the paparazzi who publicized their data.

So, the exigencies involved in maintaining the position from which to take part in physics research may make it a good idea to depart from community norms about how and where to share data and conclusions (and how completely, and how promptly, etc.). In the short term, they might also provide a good reason to forbid photography and videography during conference presentations.

However, it seems to me that this particular story also raises some bigger questions about the rules imposed on scientists by journals like Nature. Who is served by keeping the data under wraps prior to publication? Not the broader scientific community, whose members might find these data relevant to their own work. Perhaps not the scientists whose data are being kept under wraps, since they run the risk of being scooped and they miss out on potentially productive discussions with their fellow scientists. Is this kind of secrecy about data required for good peer review, or is this simply a matter of the journal editors wanting to make a splash, helping them to encourage subscriptions and attract advertisers?

Nature's rules, in other words, might be too restrictive to benefit anyone but Nature.

In the longer term, this story might lead the community of physics, and the administrative types in places where physics is funded and conducted, to think about how to tweak the system of individual rewards to encourage more timely and thorough communication of the sort that advances the scientific interests of the tribe of science as a whole. This might require Nature to revisit its rules -- or physicists and their home institutions to opt out of publishing in journals like Nature.

"I'm assuming here that Nature would regard a preprint on arXiv.org as a case of publicizing your data prior to publication."

I don't think that's right -- Nature doesn't count conference presentations or papers posted on preprint servers. From the Nature Precedings web site:

"Nature and all Nature journals have a policy that permits such posts on recognized pre- or e-print servers such as Nature Precedings and arXiv without affecting their eligibility for publication, whether or not such postings result in discussion on other sites and in the media."

Thanks for the clarification, Morgan. Presumably, that would mean that the discussion of the data in the other papers at arXiv would not be an impediment to the PAMELA scientists discussing those same data in a paper in Nature.

At which point, is the harm here that the scientists to whom these data belong are having their thunder stolen (in the ongoing dark matter conversation) by the other scientists who reconstructed those data from the photographs?

The problem is that the data PAMELA presented may have been somewhat preliminary. I don't have direct experience of large collaborations, but I've heard enough about it to know that for some collaborations, there are multiple levels of clearance to publicize results. Getting permission to show a data slide at a conference is easier than getting permission to publish it in a journal-- the collaboration may well want to do more analysis before the data are "really" released.

Conferences are kind of a fuzzy area in terms of publication norms. There are meetings at which you are explicitly enjoined not to cite anything that is reported-- the Gordon conferences are the best example, but there are others. Other conferences are more official (anything publishing detailed proceedings), and there are fields of study (Computer Science is the best example) in which conference presentations are the primary means of disseminating results.

If you're interested in these sorts of questions, Tommaso Dorigo ("A Quantum Diaries Survivor") got into a bit of trouble for reporting data on his blog before it had really been cleared, and as a result has written extensively about data release. There was also a bit of a controversy around the "Bump Hunting" posts at Cosmic Variance a while back, though I don't remember quite what that was about.

I am actually a co-author with Super Sally on two COBE/DIRBE papers in the NASA Astrophysics Data System abstract server.

Nature is OK with posting preprints to arXiv.org. The X-ray flash as a precursor to a supernova was posted in Feb 2008 and published in Nature in May 2008 with the press release and publicity delayed until May. See http://www.astro.ucla.edu/~wright/cosmolog.htm#22May08
So groups that have papers submitted to Nature but no preprints on the arXiv are just causing trouble and asking for trouble. If you are confident enough of the results to submit them to Nature, then you are confident enough to post them to the arXiv.

COBE and WMAP always tried to make the opening of the data archive, the posting/mailing of preprints, the submission to journals, and the press release/conference all simultaneous. This worked well except in April 1992 when Berkeley sent out a press release a day early. There should be no conference talks prior to this data release, since a conference talk is a data release.

It is easy to extract data from a Postscript figure, and pretty easy to measure the data from any figure, even a digitized photograph of the screen during a conference talk. I have done all of these things, and also the low-tech approach of enlarging a figure with a Xerox machine and then measuring the dots with a millimeter rule. See http://www.astro.ucla.edu/~wright/BOOMdat.html
So experiments posting preprints or giving conference talks might as well make the data available on a web site.
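
As a rough illustration of the figure-measuring described above, the core step is a calibration from pixel positions to axis values, using two labeled ticks per axis. The following is a minimal sketch and not the procedure used in any of the papers mentioned here; the function name `make_axis_map` and all the pixel and tick numbers are hypothetical, and it assumes the image is a straight-on view with no perspective skew (a photograph taken from the side of a room would need correcting first).

```python
import math


def make_axis_map(pix_a, val_a, pix_b, val_b, log=False):
    """Return a function that converts a pixel coordinate to a data value.

    pix_a, pix_b: pixel positions of two axis ticks read off the image.
    val_a, val_b: the data values printed next to those ticks.
    log: True if the axis is logarithmic (values must then be positive).
    """
    if pix_a == pix_b:
        raise ValueError("calibration ticks must sit at different pixels")
    if log:
        val_a, val_b = math.log10(val_a), math.log10(val_b)
    scale = (val_b - val_a) / (pix_b - pix_a)

    def to_value(pix):
        # Linear interpolation (in log space for a log axis).
        value = val_a + (pix - pix_a) * scale
        return 10.0 ** value if log else value

    return to_value


# Hypothetical calibration: a log x-axis with ticks "1" at pixel column 100
# and "100" at column 500; a linear y-axis with ticks "0" at pixel row 400
# and "0.35" at row 50 (image rows count downward, hence the reversed order).
x_map = make_axis_map(100, 1.0, 500, 100.0, log=True)
y_map = make_axis_map(400, 0.0, 50, 0.35)

# Hypothetical pixel positions of three plotted points.
points_px = [(100, 400), (300, 225), (500, 50)]
points = [(x_map(px), y_map(py)) for px, py in points_px]
```

The recovered values are only as good as the tick calibration and the care taken in locating each dot, so error bars read this way inherit at least a pixel or two of uncertainty.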

Janet,

It is Science, not Nature, that has the absurd policy on preprints and the arXiv.

http://www.sciencemag.org/about/authors/prep/gen_info.dtl#prior

"Distribution on the Internet may be considered prior publication and may compromise the originality of the paper as a submission to Science."

I clarified this recently with the editors at Science and it means what it seems to: posting to the arXiv may harm or even eliminate one's chances of publication in Science.

This is outrageous coming from the AAAS. The first goal of the mission statement of the AAAS is to "Enhance communication among scientists, engineers, and the public."

http://www.aaas.org/aboutaaas/

Banning the use of the primary communication channel for pre-publication results hardly seems to serve that end.

Carl Bergstrom

By Carl Bergstrom (not verified) on 07 Sep 2008

Who knew that there would be digital cameras flashing as they were flashing their slide with the data?

Actually this should not be so unexpected. I have seen people make their own copies of conference slides using digital cameras since my first conference ever back in 2000. This is not surprising at all. What is surprising for me is that people are so quick to jump on new results -- but perhaps this should not be very surprising either.

There's another big difference between conference proceedings and Nature papers. The latter are peer-reviewed. Knowing this, scientists know to trust a published paper MORE than a conference abstract (or the talk). Anyone citing such assertions by the authors does so at their own risk. Still, personally I would contact the original authors beforehand if I were going to mention their unpublished results in my own paper.

By GinReaper (not verified) on 08 Sep 2008

Here's a more general way to think about the specific problem.

Generally speaking, whatever "content" you're talking about, you're going to have four categories of people involved in the transaction: the content creator, the content editor, the content distributor, and the content consumer.

In the beginning, there were only two actual players assuming these roles. The bard wrote the song, edited the song, and performed the song, and the people in the tavern listened to the song. The philosopher mused about the theory, talked about the theory, and the students listened on the steps of the temple. Occasionally one of the students would argue, jumping into the editor role.

In the next phase we add the ability of people to record, first in print media (and then music and video recordings, but the situation abstracts the same). Now we add a third player -> the person who owns the printing press (or the record press, or what have you).

This person is taking over the content distributor role. We now have a new agenda involved. Over time, the content distributor also starts horning in on the content editor role, because the content distributor has an economic incentive to do so. In the case of scientific publications, the content distributor has an additional incentive -> they're part of the content consumer role as well, and they don't want junk in the distribution cycle.

Not only because the distribution has a high cost and a limited page count, but because they're content consumers and they don't want to read junk.

But now in the digital age we have the ability to be content distributors ourselves, or (Intellectual Property tradition having arisen from the second age) our content *consumers* can redistribute it, sometimes in ways the original creator doesn't want it redistributed.

So now we've got the situation where *no* agent has a pure agenda -> each role (the creator, the editor, the distributor, and the consumer) can be played by any agent, and each agent actually wants to play multiple roles.

The best way to solve these problems is to decouple the incentives from the roles as much as possible -> the "first publish" push is overvalued. The body of publications is overvalued. The "space" in publications is overvalued. Body of contribution is currently only really measured through peer-reviewed publications. That's actually bad, because it gives too much power to the publisher relative to its ability to deliver on the responsibility (turnaround time being one issue, "page count" being another). Not that we don't need peer review; we just need it delivered through a different mechanism.

This has been a topic of a couple of symposiums at Caltech; nobody's got a good answer yet.

I think the problem here is one of etiquette. If you invited someone into your house, and they snapped a couple of photos, and then posted those photos on the Internet, you would likely be upset, even though legally they had done nothing wrong. I see nothing wrong with taking pictures of someone else's slides (perhaps in lieu of taking notes), but publicly distributing a paper based on those photographs without the original researcher's explicit permission is a different question.

[Marco] Cirelli [one of those who photographed the slide with the data] maintains that he and others have done nothing wrong. "We asked the PAMELA people [there], and they said it was not a problem," he says

Did Cirelli explicitly ask the PAMELA team if he could publicly distribute a paper with the photographs he had taken? Or did he just ask if he could take photographs? Cirelli could have written a private analysis of the data and then sent his manuscript directly to the PAMELA researchers as private correspondence, perhaps cc-ing a few trusted peers. Instead he decided to make his analysis public, perhaps compromising his colleagues' research.

By photographing the slides and writing a paper on it without the explicit consent of the original researchers, Cirelli has likely done the entire community a disservice by making scientists with cutting edge projects less likely to share their results within their community of peers prior to official publication. Rather than encouraging the open sharing of results (perhaps at a conference), this will encourage more researchers to make presentations without any substantive data and without clearly explaining their methodology. As you said "scientific communication is a good thing", but open scientific communication requires trust, and trust is built upon adherence to societal norms and etiquette.

A lively discussion of the slide-photographing issue at ChemBlog...