Publishing and Statistical Significance

There's been some hubbub recently over a study by Gerber and Malhotra (you can get a copy in pdf here), which shows a couple things. First, political science journals don't publish many articles that report negative (null) results, but instead tend to publish those that report statistically significant results. Second, a large portion of those statistically significant results involve probabilities that are pretty damn close to .05 (the generally accepted cutoff for statistical significance). My first reactions were duh, and who cares?

Of course, I'm not a political scientist, so I can't speak for them, but in psychology, everyone has always known that it's damn hard to get null results published. And there are good reasons why that's so. For one, null results are less informative. It's difficult to tell whether they reflect a genuine lack of the hypothesized relationship between your variables or, instead, chance or methodological problems (especially a lack of statistical power). So, if you want to publish a null result, you've got a bunch of extra work to do. In addition to calculating power (which people in some disciplines do automatically, anyway), you're almost certainly going to have to run extra variations of the study (even more than you would with a statistically significant result) to show that your null result wasn't the result of methodological problems.
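
To make "calculating power" concrete, here's a minimal sketch of the kind of power analysis reviewers would expect alongside a null result. It uses statsmodels' TTestIndPower; the effect size, sample size, and power target are illustrative assumptions, not numbers from any particular study.

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical planning numbers: a "medium" effect (Cohen's d = 0.5),
# the usual alpha = .05, and a target of 80% power.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 64

# Or, going the other way: how much power did a completed null study have?
achieved = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"Power with 30 per group: {achieved:.2f}")  # roughly 0.47
```

The exact numbers don't matter; the point is that a null result is only persuasive when power was high enough that a real effect would very likely have shown up.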

There's another good reason why they aren't published: they're not expected! I know, on the surface, it looks like they should be expected, because if there were really no relationship, there'd be at least a 95% chance that you wouldn't get statistically significant results. But in reality, getting statistically significant results is actually pretty likely, because researchers generally don't conduct studies unless they're pretty confident, for theoretical reasons or whatever, that the hypothesized relationship between their variables exists. So if you get null results, it's actually pretty surprising.
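
The arithmetic behind that intuition is worth spelling out. If you assume, purely for illustration, that most tested hypotheses are true and that studies are reasonably well powered, a significant result is the expected outcome:

```python
# Back-of-the-envelope: how often should a study come up "significant"?
# All three numbers below are illustrative assumptions, not data.
p_real = 0.7   # share of tested hypotheses that are actually true
power = 0.8    # chance of detecting a real effect when it's there
alpha = 0.05   # chance of a false positive when it isn't

p_significant = p_real * power + (1 - p_real) * alpha
print(f"P(significant result) = {p_significant:.2f}")      # well over one half
print(f"P(null result)        = {1 - p_significant:.2f}")  # the surprising case
```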

When null results do get published, it's usually because the researchers had good reason to expect them, so they undertook the extra steps required to make null results publishable. In general, that's only interesting when there's a heated debate about the relationship between two variables, and one theory predicts that there is none. Even then, there are often better ways of demonstrating that than producing null results. You run other studies that test (non-null) hypotheses that distinguish between competing theories, for example.

As for why so many of the results cluster around the .05 level, well, that's probably a cultural thing. Researchers tend to be overly obsessed with statistical significance (as this study ironically shows), and that means that when you've got results that are approaching significance, you're going to employ a few tricks to get under the cutoff. In psychology, for example, you might run a few more participants than you'd planned, thereby increasing your statistical power, or you might tweak your methodology and rerun the experiment. In most cases, I think these solutions are harmless, particularly since there's no real a priori reason to be obsessed with the .05 level in the first place. If you're close, but not below it, chances are you're onto something, but you need the extra little push to get people to pay attention. If you're close, but actually committing a Type I error, chances are subsequent research will discover that. Sure, it might cause people to spend time and resources chasing theoretical dead ends, but that's just the way science works, and making a big deal out of it is kind of silly. So again, I say, duh, and who cares?
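
As a toy illustration of the "run a few more participants" move (all numbers here are invented): when there really is a modest effect, doubling the sample substantially raises the chance of landing under .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def estimated_power(n_per_group, effect_size=0.4, alpha=0.05, n_sims=5_000):
    """Monte Carlo estimate of two-sample t-test power (toy numbers)."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(effect_size, 1.0, n_per_group)
        hits += stats.ttest_ind(treatment, control).pvalue < alpha
    return hits / n_sims

print(estimated_power(30))  # roughly one in three studies crosses .05
print(estimated_power(60))  # roughly three in five do
```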

Good points, all. Another reason to worry about "null" results is that they are frequently misinterpreted to mean the null hypothesis is true, rather than merely that it could not be rejected. Failure to reject still leaves open the possibility that any differences are real but there was insufficient statistical power, or that some bias was involved.

Conversely, some results that are statistically significant are of no interest at all; they are just a reflection of a large number of measurements. If I were to measure the heights of all six-year-olds on the east coast and compare them to the heights of all six-year-olds on the west coast, I doubt the averages would be identical, and I also guarantee you that a difference of, say, 1/64 of an inch would be statistically significant. But who cares?
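
The commenter's point is easy to verify with a toy simulation; the heights, spread, and sample sizes below are all invented, but the pattern holds: with enormous samples, a practically meaningless difference comes out "statistically significant."

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 2_000_000  # six-year-olds per coast (made-up number)

# Heights in inches; the coasts differ by a trivial 1/64 of an inch.
east = rng.normal(45.0, 2.0, n)
west = rng.normal(45.0 + 1 / 64, 2.0, n)

t_stat, p_value = stats.ttest_ind(east, west)
print(f"mean difference = {abs(west.mean() - east.mean()):.4f} in, p = {p_value:.3g}")
# The p-value is tiny even though nobody would care about a 0.016-inch gap.
```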

Consider also this PLoS Medicine paper, "Why Most Published Research Findings Are False": http://medicine.plosjournals.org/perlserv?request=get-document&doi=10.1…

The journal Nature ran a piece on this subject a few years ago in which they proposed a special repository for null results. Don't know what became of it.

But clearly there is a selection bias in favor of positive results, which partly explains why so many of them are false.
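
That last point is essentially the argument of the PLoS paper linked above, and the arithmetic is short. With assumed (illustrative) values for the prior and for power, filtering on significance alone leaves a substantial share of false findings in print:

```python
# If journals publish mostly significant results, what share of those
# published findings are false positives? All inputs are assumptions.
p_real = 0.2   # suppose only 1 in 5 tested hypotheses is actually true
power = 0.5    # assumed typical power
alpha = 0.05

true_pos = p_real * power            # significant and real
false_pos = (1 - p_real) * alpha     # significant but spurious
share_false = false_pos / (true_pos + false_pos)
print(f"Share of published significant findings that are false: {share_false:.0%}")
# Roughly 29% under these assumptions; higher if power or the prior is lower.
```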