The APA has an important rule that all authors of APA-sponsored journal articles must agree to before publication:
After research results are published, psychologists do not withhold the data on which
their conclusions are based from other competent professionals who seek to verify the
substantive claims through reanalysis and who intend to use such data only for that
purpose, provided that the confidentiality of the participants can be protected and
unless legal rights concerning proprietary data preclude their release.
The rule seems quite straightforward. But when data is requested, how many researchers actually comply? A group led by Jelte M. Wicherts has put that question to the test (PDF via BPS Research Digest). They asked the authors of articles in four prominent journals for copies of their datasets, a total of 141 studies. When an author demanded it, the group provided their academic credentials and institutional approval for their work. How many researchers complied? A chart of their results is below.
In the end, only 27 percent of researchers complied with the request for data. Through non-reaction, or outright refusal, the vast majority of researchers did not share their data.
There are a few points in the researchers' defense. In some fields, sharing data may not be something that is typically required or done -- researchers may have felt that everything a fellow scientist would need to replicate their results was included in the journal article itself. Also, when you work in a specific field, you know who your colleagues are around the world. There is only a very small set of researchers engaged in your particular line of work -- perhaps fewer than a dozen. It may be that psychologists would be more willing to share data with someone when they were certain that person was working on the same problems they were. Getting an email out of the blue from a researcher you've never heard of is different from a request by a respected colleague.
On the other hand, the opposite might be true; researchers might view these colleagues as "competition," and be even less willing to share with them. It would be fascinating to see a study where the requests for data came from close colleagues. This would be more difficult to do, but the results would be more telling as well.
In other news:
- Mind Hacks has links to cognitive scientists' predictions for the next century
- Deric Bownds has an excellent explanation of how studies of "invisible" images work
- Jonah Lehrer tells us about new research on phantom limbs
- Nominate your favorite science blog for the Weblog Awards (I nominated Mixing Memory)
I'm not a scientist, but I was recently talking to a scientist friend about his work, and this subject came up. He recounted an experience from a few years back where he responded to a request by providing his complete dataset, and got burned when some aspects were misinterpreted and some of his results were called into question. It turned out OK, but consumed a lot of his time to straighten things out. No benefit to him, but considerable cost.
There may be another reason here for reluctance to share.
It's potentially very embarrassing to share your raw data! After all, it's not so clean as it seems in the article, and you may have interpreted or analyzed it incorrectly, or it may become evident that other analyses were possible that don't really support your conclusion....
I recently shared my data with another researcher and he found that I had accidentally merged two groups together in my statistical analysis, significantly distorting my results. (Fortunately, once the groups were properly separated, the data supported my published conclusion even more strongly.)
Who wants to expose themselves to such potential embarrassment, especially if the other researcher might go to print with it?
Yet I think integrity demands it.
I wonder if this is, at least in part, a "tragedy of the commons" thing. In other words, if everyone shared their data -- if, say, every journal hosted a webpage where everyone who published in that journal was required to put their datasets -- then it would no longer be so costly for any one researcher to share with any one other researcher. (Indeed, if it were the industry standard it would be costly not to share; not to mention that if the dataset were generally available to everyone, a single person misanalysing your data would be less of a problem.)
But as it is, sharing data generally means risk of potential embarrassment or hassle, with no benefit to you. Furthermore, it's often a lot of work to "clean up" a dataset to the point that it is intelligible to someone else, particularly if it's a complicated dataset or study. By "cleaning up" I mean doing things like labeling all the variables, explaining whether they are independent of each other, explaining why you might be missing some data points, etc. That's a lot of work. I absolutely think it's essential for academic integrity to share data when asked, but as it is, it's definitely a high-cost, no-reward situation.
I'm neither a psychologist nor from academia, but my experience as a programmer and a user of free software suggests another possible reason.
Many people I've known in IT-related fields are infected with the virulent never-share-information meme. They're irrationally opposed to making data public, even when it would obviously work in their favor. Like hardware vendors who won't release specs, preventing their products from having working Linux or OpenBSD drivers. Or software makers who don't release format and protocol specifications, preventing other software from interoperating with theirs. (For every Microsoft that gets away with this, a thousand small firms fail and are forgotten, because in their paranoia they didn't let other programs integrate with them.)
Perhaps some of these researchers have proprietary feelings for their data, and that contributes to this behavior. Just a thought.
How does this compare with other fields? Are researchers in other disciplines more or less likely to share the raw data, and what might this mean for progress in different fields?
In some disciplines, datasets are huge and cost a lot of money to collect (in neurophysiology, for example). Nevertheless, I think data should be made available whenever possible. Reproducibility is at the heart of science. Reviewers have little chance of catching an analysis error or fraud. Other scientists should be able to re-analyse the data from different angles.
Every scientist should share all their data. It's only common sense; otherwise, how can their peers validate their work? And Congress should mandate that any person or group who receives government funding release all their data.