When you're investigating charges that a scientist has seriously deviated from accepted practices for proposing, conducting, or reporting research, how do you establish what the accepted practices are? In the wake of ClimateGate, this was the task facing the Investigatory Committee at Penn State University investigating the allegation (which the earlier Inquiry Committee deemed worthy of an investigation) that Dr. Michael E. Mann "engage[d] in, or participate[d] in, directly or indirectly, ... actions that seriously deviated from accepted practices within the academic community for proposing, conducting, or reporting research or other scholarly activities".
One strategy you might pursue is asking the members of a relevant scientific or academic community what practices they accept. In the last post, we looked at what the Investigatory Committee learned from its interviews about this question with Dr. Mann himself and with Dr. William Easterling, Dean, College of Earth and Mineral Sciences, The Pennsylvania State University. In this post, we turn to the committee's interviews with three climate scientists from other institutions, none of whom had collaborated with Dr. Mann, and at least one of whom has been very vocal about his disagreements with Dr. Mann's scientific conclusions.
The scientists interviewed by the Investigatory Committee were Dr. William Curry, Senior Scientist, Geology and Geophysics Department, Woods Hole Oceanographic Institution; Dr. Jerry McManus, Professor, Department of Earth and Environmental Sciences, Columbia University; and Dr. Richard Lindzen, Alfred P. Sloan Professor, Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology. As the Investigatory Committee's final report (PDF) notes:
The interviews were audio-taped and verbatim transcripts were prepared. All interviewed individuals were provided an opportunity to review the transcripts of their interviews for accuracy. The transcripts will be maintained in the Office for Research Protections as part of the official record.
The report presents each of the interviews sequentially. However, the three climate scientists they interviewed from outside Penn State (Dr. Curry, Dr. McManus, and Dr. Lindzen) were asked essentially the same set of questions, with the aim of establishing their views of the accepted practices within their field, so I'll be looking at those three interviews together, question by question, rather than individually.
Here's the first question:
"Would you please tell us what you consider in your field to be accepted standard practice with regard to sharing data?"
Dr. Curry's response:
With regard to sharing data, Dr. Curry indicated that standard practice is that once a publication occurs, the pertinent data are shared via some electronic repository. He stated that not all researchers actually comply with this practice, and that there may be special arrangements with the funding agency, or the journal that publishes the research, that specify when data need to be made available to other researchers. In Dr. Curry's case, for example, the National Science Foundation allows a two-year window during which he has exclusive rights to his data. After that period he must make it available to others.
Dr. McManus's response:
Dr. McManus responded by first drawing a distinction between published and unpublished data, noting, however, that there is a range of standard practices with regard to both. Nevertheless, the mode of behavior regarding unpublished data is to share "in a fairly limited fashion with individuals or groups who make specific requests and typically who are known to the researcher." Regarding published data, Dr. McManus indicated that standard practice is to make such data available through any of a broad range of means, including providing access to electronic repositories and institutional archives.
Dr. Lindzen's response:
Dr. Lindzen responded by stating that "with respect to sharing data, the general practice is to have it available."
The distinction between whether the data are published or unpublished turns up in both Dr. Curry's and Dr. McManus's answers. If data are unpublished, the presumption is that the researcher who gathered them is still working with them to draw some (publishable) conclusions -- and since the publish-or-perish competition is a real factor in a scientist's employment and funding, not sharing the data on which you're still working sounds like it falls within accepted practices.
After publication of conclusions based on that data, Dr. McManus and Dr. Curry state that the standard practice is to share data with other researchers.
Dr. Curry's responses about the NSF policy suggest that in some circumstances there may be a ticking clock on a researcher's privileged access to unpublished data, where after a certain length of time the data should be shared with others even if you haven't drawn publishable conclusions from it -- sharing it may allow someone in the scientific community to draw useful conclusions from it. But an explicit policy specifying a two-year window for exclusive rights to data certainly suggests that it is an accepted practice not to share data in certain circumstances -- and that demanding that data be shared in these circumstances might be the serious deviation from accepted practices.
The second question touches on a related issue:
"Would you please tell us what you consider in your field to be accepted standard practice with regard to sharing unpublished manuscripts?"
Dr. Curry's response:
On the issue of sharing unpublished manuscripts, Dr. Curry stated that if the manuscript was accompanied by a request to keep it confidential, he would not share it with anyone; if it was not accompanied by an explicit request for confidentiality, he might talk about it with colleagues but would not usually forward it.
Dr. McManus's response:
Regarding the sharing of unpublished manuscripts, Dr. McManus indicated that there is a broad range of typical and accepted behaviors, with such manuscripts commonly shared with a limited number of colleagues. In a follow-up question, it was inquired whether it may be considered standard practice to share an unpublished manuscript with others without getting express permission to do so from the author. Dr. McManus responded by saying "no" to such sharing as standard practice, but allowing that there is not necessarily only one acceptable practice, as permission may be given implicitly or explicitly. Without specific encouragement for wider distribution, however, it is generally understood, according to Dr. McManus, that unpublished papers are not intended for third-party distribution.
Dr. Lindzen's response:
With respect to unpublished manuscripts, he indicated that "those are generally not made available unless the author wishes to." In response to a number of follow-up questions, Dr. Lindzen indicated that if an unpublished manuscript is sent to a scientist by the author, it would be common practice to ask for permission before sharing it with others; if it was sent by someone else it would be common practice to ask if they had permission to share the paper. According to Dr. Lindzen, a scientist might conclude that there is implicit permission to disseminate an unpublished paper only when the author made it clear that the results may be disseminated.
As an aside, back in graduate school I became familiar with the practice of including a header on manuscripts that said, "DRAFT: DO NOT CIRCULATE OR CITE WITHOUT PERMISSION." That handy little line of text at the top of each page could make the author's wishes absolutely explicit, whether they were grounded in a desire not to have her best ideas swiped by someone else or a sense that she was likely to modify her arguments and conclusions substantially before the draft was submitted to a broader audience.
However, as Dr. McManus's answer to the question indicates, there are instances where permission may be implicit rather than explicit. Making it explicit seems less likely to result in misunderstanding, but this doesn't mean that implicit permission is never given or properly interpreted as such. Some scientists in the community may be perfectly comfortable trusting the colleagues with whom they have shared their unpublished manuscripts to make reasonable decisions about whether to pass those manuscripts on to other colleagues. Other scientists in the community want to make those decisions about who sees their unpublished manuscript themselves.
I'm a little puzzled by one piece of Dr. Lindzen's response. If "a scientist might conclude that there is implicit permission to disseminate an unpublished paper only when the author made it clear that the results may be disseminated," isn't this essentially saying that you can conclude that you have implicit permission to share only when the author has made that permission to share explicit? In this case, why would implicit permission be relevant? Maybe something went wrong here between the transcript of the interview and the summary provided in the final report of the Investigatory Committee.
Here's the third question:
"Would you please briefly explain how codes are developed in the process of evaluating data in your field, e.g., are these codes significantly different from published software packages? Then please tell us what you consider in your field to be accepted standard practice with regard to sharing codes."
Dr. Curry's response:
Dr. Curry reported that in his area, most codes are fairly basic and researchers use software packages to construct them. He also reported that he was not aware of any public archive for such codes, but that he was fairly certain that if he asked another researcher to share such codes, he would most likely get them. He added that overall compliance with requests to share codes would probably be equal to the rate of compliance with requests for sharing data.
Dr. McManus's response:
Dr. McManus indicated that most, but not all, details of such methods are usually reported when research is published, and that some of these details may be shared in a "somewhat ad hoc basis." Generally, however, the tendency is to "try to provide the conditions by which any research can be replicated .... " Dr. McManus agreed that generally, codes are treated the same way as any other method.
Dr. Lindzen's response:
Dr. Lindzen responded by stating that "it depends." He elaborated, saying that if the codes are very standard, it is unnecessary to share them, but if it's an unusual analysis it would be his practice to make the codes available to anyone who wishes to check them. In a follow-up question, Dr. Lindzen was asked whether he would have issues with people running into compatibility issues or compilation issues. He responded by saying that even if people "screw it up" or if you have reservations about sharing codes, "if somebody asks you how did you get this, you really should let them know."
This question gets us into the territory of which methods researchers describe in loving detail in their scientific papers and which they merely mention with the assumption that the method is so commonplace to researchers in their field that they all either know that method already or can easily find the necessary information on how to use it. Of course, sometimes these assumptions go wrong.
Each of the three scientists interviewed by the Investigatory Committee suggested that the computer code used by climate scientists can be fairly standard, drawing on basic building blocks from widely available software packages, and that it falls within accepted practices not to publish all the details of such methods (presumably because the details that are not published are so standard). Each also recognized that the accepted practice, if another researcher asks you to share your code (especially one trying to replicate your results), is to share it.
Not discussed here is the issue Dr. Mann raised in his testimony, that the software code that you develop to analyze raw data may legitimately count as intellectual property. I'd be interested to hear what scientists who develop a lot of their own code (either from scratch or in non-obvious ways from standard software packages) have to say about accepted practices around sharing code, both before and after the publication of results obtained with the use of such code.
The fourth question kind of circles back to the first question:
"How do the processes of data acquisition, analysis and interpretation in paleoclimatology affect practices of data sharing in the field? Are any of these processes unique to paleoclimatology?"
Dr. Curry's response:
Dr. Curry asked for clarification and was told that the question referred to whether the laborious and expensive way in which most data are collected in paleoclimatology had an effect on data sharing. He then responded that requests for raw data would be the exception rather than the rule, because transforming the raw data into usable information is labor intensive and difficult. Nevertheless, because of NSF requirements, he would release all data after two years. He added that some scientists, however, do seek to maintain proprietary access to their data even after two years.
Dr. Lindzen's response:
Dr. Lindzen indicated that he did not think that these processes are unique to paleoclimatology, and that since most of the data are acquired using public funds, there is no basis for investigators being proprietary with their data. In response to a follow-up question, Dr. Lindzen acknowledged that prior to publication, scientists may have a variety of reasons to keep things confidential, but after publication "there's an obligation to explain exactly how you got them, especially if they're controversial."
I'm not sure whether Dr. McManus was asked this question or not. The final report of the Investigatory Committee does not record his answer.
Here, Dr. Lindzen is speaking up for the scientific norm of communism as well as for the idea that the public, having paid for the creation of the scientific knowledge, has an interest in broad data sharing that leads to the production of more knowledge (or of better knowledge, on account of the larger number of scientists scrutinizing the data and the conclusions drawn from them). His testimony also speaks to the idea that scientists are supposed to lay out the reasoning and the observations that underpin their conclusions -- that part of the power of scientific knowledge is the intelligibility of the connection between data and conclusions.
But, as Dr. Curry points out and Dr. Lindzen acknowledges, it is an accepted practice not to share one's data prior to publication. And, given the utility of a proprietary data set that might support even more publishable conclusions, one edge of accepted practices might encompass efforts to have sole use of that data just a little bit longer.
The reality of current scientific practice includes the cooperative efforts of scientists to understand particular phenomena and the competition between individual scientists for priority for discoveries, funding, jobs, and the like. The accepted practices of the community thus reflect both the cooperative and the competitive aspects.
I'm intrigued by Dr. Curry's observation that "transforming the raw data into usable information is labor intensive and difficult" and that, because of this, researchers generally do not request sharing of raw data. On the one hand, it suggests an area where scientists may pragmatically rely on trust rather than proof -- because redoing another researcher's analysis would take a long time and be easy to mess up. On the other hand, the difficulty of transforming raw data into usable information (and the presumptive opportunities for making significant mistakes in the course of that transformation) would seem to make this a place where more researchers following the transformation of raw data to usable results would be a good guard against error, some quality control to ensure that all that time and labor produced reliable results.
Maybe the Open Notebook Science crowd will have some comments on this.
At this point, it's worth comparing the responses of these three scientists to those of Dr. Mann to the same questions.
On the question of sharing data, Dr. Mann indicated that when the data with which he was working were not already in the public domain, he shared them as soon as he was allowed to by the people who initially obtained them. This seems to fit within the accepted practices described by Dr. Curry, Dr. McManus, and Dr. Lindzen.
On the question of sharing unpublished manuscripts, Dr. Mann indicated that "he may have forwarded such a manuscript to a specific, close colleague, in the belief that permission to do so had been implicit, based on his close collegial relationships with the paper's authors. ... In response to a follow-up question, Dr. Mann asserted that such judgments about implied consent are quite typical in his field, but they are made only as long as it is understood that such sharing would take place only among trusted colleagues who would maintain the confidentiality of the manuscript." Dr. Curry, Dr. McManus, and Dr. Lindzen indicated that it is better to ask for explicit permission before sharing someone else's unpublished manuscript, but Dr. McManus also acknowledged that implicit permission is sometimes given.
On the question of sharing computer code, Dr. Mann noted that agencies like NSF have recognized that computer code may count as intellectual property. However, he also noted that since 2000 he has made all his source code available to the research community. This fits squarely within the account of Dr. Curry, Dr. McManus, and Dr. Lindzen that the accepted practice is to share such source code when a request is made for it.
Given these interviews, Dr. Mann's practices are looking like they do not depart from the accepted practices of the community of climate scientists. In the next post, we'll consider the other sources of information besides the interviews that the Investigatory Committee relied upon to establish what counts as accepted practices within the academic community (and specifically within the community of climate scientists) for proposing, conducting, or reporting research, as well as the conclusions the committee drew in its final report.
Regarding the question of how much of a hassle it is to reprocess raw data, a lot of that depends on how the analysis is displayed. Human nature being what it is, if you don't think anyone will be interested or likely to want to follow exactly how your raw data was converted, you'll do the minimum you have to so that only you can figure it out. Whenever I train new students I see this phenomenon in action. Almost invariably the first few experiments students record don't have enough information. But after getting comments from me and other mentors they start to understand how to present information and analysis so that anyone can follow the logic. For example, last week I had a student report on the weight of some samples. Some measurements were reported with more precision than others, which makes me apprehensive because she was using the same scale. In future experiments she will record the weight of the empty container and the weight of the container with sample in a Google Spreadsheet and do the subtraction within the sheet. That way it will be much easier to spot mistakes and faulty assumptions. It might seem like a trivial detail but these are the fundamental building blocks of even the most complex scientific projects. The rawer the data and the more transparent the analysis, the stronger the foundation.
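For readers who would rather see the container-weight idea as a script than as a spreadsheet, here is a minimal sketch in Python. It is not the workflow described above, which uses a Google Spreadsheet. The sample names, the weights, the field names (`tare_g`, `gross_g`), and the assumption that the scale reads to three decimal places are all made up for illustration. The point is only that keeping the raw readings and doing the subtraction in the open makes odd precision and impossible values easy to spot.

```python
from decimal import Decimal

# Hypothetical readings, stored as the strings written down at the scale
# so that the recorded precision is preserved. All values are invented.
readings = [
    {"sample": "A", "tare_g": "12.503", "gross_g": "13.761"},
    {"sample": "B", "tare_g": "12.498", "gross_g": "13.9"},
    {"sample": "C", "tare_g": "12.510", "gross_g": "12.402"},
]


def decimal_places(text):
    """Return the number of digits recorded after the decimal point."""
    return -Decimal(text).as_tuple().exponent


def check(row, expected_places=3):
    """Compute the net weight and list anything that looks off.

    expected_places is an assumption about the scale's readout.
    """
    tare = Decimal(row["tare_g"])
    gross = Decimal(row["gross_g"])
    problems = []
    for key in ("tare_g", "gross_g"):
        places = decimal_places(row[key])
        if places != expected_places:
            problems.append(f"{key} recorded to {places} places")
    net = gross - tare  # container with sample minus empty container
    if net <= 0:
        problems.append("net weight is not positive")
    return net, problems


results = {row["sample"]: check(row) for row in readings}
for sample, (net, problems) in results.items():
    print(sample, net, problems)
```

With these invented numbers, sample B is flagged because one reading was written down to a single decimal place, and sample C is flagged because the container with sample weighs less than the empty container. Neither problem would be visible if only the final net weights had been recorded.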