Why won't psychologists share their data?

By dmunger on November 16, 2006.

The APA has an important rule that all authors of APA-sponsored journal articles must agree to before publication:

After research results are published, psychologists do not withhold the data on which
their conclusions are based from other competent professionals who seek to verify the
substantive claims through reanalysis and who intend to use such data only for that
purpose, provided that the confidentiality of the participants can be protected and
unless legal rights concerning proprietary data preclude their release.

The rule seems quite straightforward. But when data is requested, how many researchers actually comply? A group led by Jelte M. Wicherts has put that question to the test (PDF via BPS Research Digest). They asked the authors of articles in four prominent journals for copies of their datasets, covering a total of 141 studies. When authors demanded it, the group provided their academic credentials and proof of institutional approval for their work. How many researchers complied? A chart of their results is below.

In the end, only 27 percent of researchers complied with the request for data. Through non-reaction, or outright refusal, the vast majority of researchers did not share their data.

There are a few points in the researchers' defense. In some fields, sharing data may not be something that is typically required or done -- researchers may have felt that everything a fellow scientist would need to replicate their results was included in the journal article itself. Also, when you work in a specific field, you know who your colleagues are around the world. There is only a very small set of researchers engaged in your particular line of work -- perhaps fewer than a dozen. It may be that psychologists would be more willing to share data with someone when they were certain that person was working on the same problems they were. Getting an email out of the blue from a researcher you've never heard of is different from a request by a respected colleague.

On the other hand, the opposite might be true; researchers might view these colleagues as "competition," and be even less willing to share with them. It would be fascinating to see a study where the requests for data came from close colleagues. This would be more difficult to do, but the results would be more telling as well.

I'm not a scientist, but I was recently talking to a scientist friend about his work, and this subject came up. He recounted an experience from a few years back where he responded to a request by providing his complete dataset, and got burned when some aspects were misinterpreted and some of his results were called into question. It turned out OK, but consumed a lot of his time to straighten things out. No benefit to him, but considerable cost.

There may be another reason here for reluctance to share.

It's potentially very embarrassing to share your raw data! After all, it's not so clean as it seems in the article, and you may have interpreted or analyzed it incorrectly, or it may become evident that other analyses were possible that don't really support your conclusion....

I recently shared my data with another researcher and he found that I had accidentally merged two groups together in my statistical analysis, significantly distorting my results. (Fortunately, once the groups were properly separated, the data supported my published conclusion even more strongly.)

Who wants to expose themselves to such potential embarrassment, especially if the other researcher might go to print with it?

Yet I think integrity demands it.

I wonder if this is, at least in part, a "tragedy of the commons" thing. In other words, if everyone shared their data -- if, say, every journal included a webpage where everyone who published in that journal was required to put their datasets -- then it would no longer be so costly for any one researcher to share with any one other researcher. (Indeed, if it were the industry standard, it would be costly not to share; not to mention that if the dataset were generally available to everyone, a single person misanalysing your data would be less of a problem.)

But as it is, sharing data generally means risking embarrassment or hassle, with no benefit to you. Furthermore, it's often a lot of work to "clean up" a dataset to the point that it is intelligible to someone else, particularly if it's a complicated dataset or study. By "cleaning up" I mean doing things like labeling all the variables and explaining, for each one, whether it is independent of the others, why you might be missing some data points, etc. That's a lot of work. I absolutely think it's essential for academic integrity to share data when asked, but as it is, it's definitely a high-cost, no-reward situation.

I'm neither a psychologist nor from academia, but my experience as a programmer and a user of free software suggests another possible reason.

Many people I've known in IT-related fields are infected with the virulent never-share-information meme. They're irrationally opposed to making data public, even when it would obviously work in their favor. Like hardware vendors who won't release specs, preventing their products from having working Linux or OpenBSD drivers. Or software makers who don't release format and protocol specifications, preventing other software from interoperating with theirs. (For every Microsoft that gets away with this, a thousand small firms fail and are forgotten, because in their paranoia they didn't let other programs integrate with them.)

Perhaps some of these researchers have proprietary feelings for their data, and that contributes to this behavior. Just a thought.

How does this compare with other fields? Are researchers in other disciplines more or less likely to share their raw data, and what might this mean for progress in different fields?

In some disciplines (neurophysiology, for example), datasets are huge and cost a lot of money. Nevertheless, I think data should be made available whenever possible. Reproducibility is at the heart of science. Reviewers have little chance of catching an analysis error or fraud. Other scientists should be able to re-analyse the data from different angles.

Every scientist should share all their data. It's only common sense; otherwise, how can their peers validate their work? And Congress should mandate that any person or group who receives government funding must release all their data.
