Publishing and Statistical Significance

By mixingmemory on September 20, 2006.

There's been some hubbub recently over a study by Gerber and Malhotra (you can get a copy in pdf here), which shows a couple things. First, political science journals don't publish many articles that report negative (null) results, but instead tend to publish those that report statistically significant results. Second, a large portion of those statistically significant results involve probabilities that are pretty damn close to .05 (the generally accepted cutoff for statistical significance). My first reactions were duh, and who cares?

Of course, I'm not a political scientist, so I can't speak for them, but in psychology, everyone has always known that it's damn hard to get null results published. And there are good reasons why that's so. For one, null results are less informative. It's difficult to tell whether they reflect a real lack of the hypothesized relationship between your variables or are instead the product of chance or methodological problems (especially a lack of statistical power). So, if you want to publish a null result, you've got a bunch of extra work to do. In addition to calculating power (which people in some disciplines do automatically, anyway), you're almost certainly going to have to run extra variations of the study (even more than you would with a statistically significant result) to show that your null result wasn't caused by methodological problems.
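
For readers who haven't done a power calculation, here is a minimal sketch of what one can look like. It assumes a two-sided, two-group comparison and a normal approximation (a real analysis would usually use the t distribution), and the effect size and sample sizes are made-up illustrative values, not numbers from any study discussed here.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    The normal approximation ignores the small-sample t correction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Hypothetical medium effect (d = 0.5) at a few sample sizes.
for n in (20, 64, 200):
    print(n, round(approx_power(0.5, n), 2))
```

With a small sample, a real effect of this size would be missed more often than not, which is one reason a null result on its own doesn't say much.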

There's another good reason why they aren't published: they're not expected! I know, on the surface, it looks like they should be expected, because if the null hypothesis is true, there's typically at least a 95% chance that you won't get statistically significant results. But in reality, getting statistically significant results is actually pretty likely, because researchers generally don't conduct studies unless they're pretty confident, for theoretical reasons or whatever, that the hypothesized relationship between their variables exists. So if you get null results, it's actually pretty surprising.
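
The arithmetic behind that point can be sketched in a few lines. The function name and the numbers below are hypothetical: `prior_true` stands for the fraction of tested hypotheses that are actually true, and `power` for the chance of detecting a true effect. Neither value comes from the Gerber and Malhotra study.

```python
def share_significant(prior_true, power, alpha=0.05):
    """Expected fraction of studies that reach significance.

    prior_true: fraction of tested hypotheses that are actually true (assumed)
    power: chance of detecting an effect that is really there (assumed)
    alpha: chance of a false positive when there is no effect
    """
    return prior_true * power + (1 - prior_true) * alpha

# From "researchers test things at random" to "researchers are usually right".
for prior in (0.0, 0.5, 0.8):
    print(prior, round(share_significant(prior, power=0.8), 3))
```

Only when no tested hypothesis is ever true does the share of significant results drop to the 5% you'd naively expect. The more often researchers pick hypotheses that are true, the more ordinary a significant result becomes.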

When null results do get published, it's usually because the researchers had good reason to expect them, so they undertook the extra steps required to make null results publishable. In general, that's only interesting when there's a heated debate about the relationship between two variables, and one theory predicts that there is none. Even then, there are often better ways of demonstrating that than producing null results. You run other studies that test (non-null) hypotheses that distinguish between competing theories, for example.

As for why so many of the results cluster around the .05 level, well, that's probably a cultural thing. Researchers tend to be overly obsessed with statistical significance (as this study ironically shows), and that means that when you've got results that are approaching significance, you're going to employ a few tricks to get closer to it. In psychology, for example, you might run a few more participants than you'd planned, thereby increasing your statistical power, or you might tweak your methodology and rerun the experiment. In most cases, I think these solutions are harmless, particularly since there's no real a priori reason to be obsessed with the .05 level in the first place. If you're close, but not below it, chances are you're onto something, but you need the extra little push to get people to pay attention. If you're close, but actually committing a Type I error, chances are subsequent research will discover that. Sure, it might cause people to use time and resources driving down theoretical dead ends, but that's just the way science works, and making a big deal out of it is kind of silly. So again, I say, duh, and who cares?

Good points, all. Another reason to worry about "null" results is that they are frequently misinterpreted to mean the null hypothesis is true, not that it cannot be rejected as unlikely. Failure to reject still leaves open the possibility that any differences are real but there was insufficient statistical power or that some bias was involved.

Conversely, some results that are statistically significant are of no interest at all. They are just a reflection of a large number of measurements. If I were to measure the heights of all six-year-olds on the east coast and compare them to the heights of all six-year-olds on the west coast, I doubt the averages would be identical, and I also guarantee you that a difference of, say, 1/64" would be statistically significant. But who cares?
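
To put rough numbers on the height example, here is a small sketch using a two-sample z-test. It treats the two groups as large samples, and the standard deviation (2 inches) and group sizes are assumptions chosen for illustration, not measured values.

```python
from statistics import NormalDist

def two_sample_z(diff, sd, n_per_group):
    """z statistic and two-sided p-value for a difference in means,
    assuming equal group sizes and a common, known standard deviation."""
    se = sd * (2 / n_per_group) ** 0.5
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

diff_inches = 1 / 64   # the difference from the example above
sd_inches = 2.0        # assumed spread of six-year-olds' heights

for n in (1_000, 1_000_000):
    z, p = two_sample_z(diff_inches, sd_inches, n)
    print(f"n per coast = {n:>9,}: z = {z:.2f}, p = {p:.2g}")

# Standardized effect size, which does not change with n.
print("Cohen's d =", round(diff_inches / sd_inches, 4))
```

Under these assumptions the same 1/64" gap is nowhere near significant with a thousand children per coast and comfortably significant with a million, while the effect size stays tiny either way.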

Consider also this report in PLoS Medicine titled Why Most Published Research Findings Are False: http://medicine.plosjournals.org/perlserv?request=get-document&doi=10.1…

The magazine Nature a few years ago ran a piece on this subject in which they proposed a special depository for null results. Don't know what became of it.

But clearly there is a selection bias in favor of positive results, which explains partly why so many of them are false.
