Mixing Memory

Insignificant vs. Non-significant

Like 99.8% of the people in psychology departments, I hate teaching statistics, in large part because it’s boring as hell, for both the instructors and the students, but also because students have a hell of a time grasping it, and that makes for some really painful interactions. Part of the problem, I think, is that the way we talk about statistics wasn’t designed to facilitate undergraduate instruction. And to see this, you need look no further than the concept of statistical significance.

First of all, whose idea was it to refer to it as significance? I mean, the first thing you tell students is that a statistically significant result doesn’t mean that the result is significant in any meaningful sense (say, practically), but of course, they never get that, because it’s confusing. And as a result, they constantly refer to null results as “insignificant.” But they’re not “insignificant,” or at least, they aren’t necessarily so. They might very well be significant — a null result in a study seeking to find a connection between autism and vaccines, say, could be very significant, especially for those being sued by the families of autistic children. So I tell my students, over and over and over and over and over again, to refer to the results of statistical tests that don’t achieve statistical significance as “non-significant.” But “non-significant” is not a word anybody uses in any context, ever, except in statistics. So they say, “OK, non-significant not insignificant, got it,” and then in every paper and every presentation, they write or say, “Our results were insignificant.” Aaaaaaaaaaaaahhhhhhhhhh! Sometimes, you can almost see their brains trying to convince their mouths to say, “non-significant,” but their mouths refuse, and “insignificant” comes out. It’s just plain frustrating.

Now, I’d be happy to do away with the concept of statistical significance altogether, but I’m not the one who makes these sorts of decisions, so if we have to keep it, and teach it, can we please call it something else? How ’bout, “statistically good enough for me to publish,” “statistically better for us than if our p-value had been greater than .05/.01/.001,” or “statistically gnarly?” ‘Cause this “significance” shit ain’t working.
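
To put rough numbers on the gap between statistical and practical significance, here is a minimal Python sketch. It is not from any real study: the effect sizes, the sample sizes, and the helper name `two_sample_p` are all made up for illustration, and it assumes a two-sided z-test on two equal-sized groups with a known standard deviation of 1.

```python
import math


def two_sample_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (z-test, known sd).

    Hypothetical helper for illustration only; assumes equal group
    sizes and a known, shared standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2.0))


# A trivially small effect (1% of a standard deviation), huge sample.
tiny_effect_big_n = two_sample_p(mean_diff=0.01, sd=1.0, n_per_group=1_000_000)

# A large effect (half a standard deviation), small sample.
big_effect_small_n = two_sample_p(mean_diff=0.5, sd=1.0, n_per_group=10)

print(f"tiny effect, n = 1,000,000 per group: p = {tiny_effect_big_n:.2g}")
print(f"big effect,  n = 10 per group:        p = {big_effect_small_n:.2g}")
```

Under these made-up assumptions, the trivial effect clears the .05 bar easily while the large one does not, which is the whole reason “non-significant” and “insignificant” shouldn’t be treated as the same word.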

Comments

  1. #1 qetzal
    May 6, 2008

    Perhaps you should insist they use “statistically insignificant” or “not statistically significant.” I think that’s clearer (and hopefully easier for them to understand and remember) than insignificant versus non-significant.

  2. #2 JRQ
    May 6, 2008

    I prefer “Statistically Awesome” vs. “Statistically Lame”

  3. #3 Richard Simons
    May 6, 2008

    I’m surprised that you and your students find statistics boring. It has been my experience that students find it interesting. However, I agree that parts of it are far from obvious, in particular the concept of hypothesis testing, which always seems to be counter-intuitive and, at first sight, unnecessarily convoluted. I’ve not had problems with people confusing not significant with insignificant, but that might be because many did not have English as their first language, so were probably unfamiliar with the word ‘insignificant’.

  4. #4 NJ
    May 6, 2008

    Here’s an idea I’ve used. Imagine someone dumping a sizeable rock into a lake from a 10-foot high dock while you have your back turned to them. On a calm, quiet day, there is little background noise, so you could clearly hear the splash of the rock into the water if it is dropped. But during a hurricane, you may not be able to distinguish the splash of the rock from the wind and wave noises. The test isn’t a measure of whether the rock was dropped or not, just if the splash noise can be heard over everything else.

    So a test of significance is not a measure of the objective importance (or lack thereof) of some measurement, just a measure of how well the measurement stands out against the background.

  5. #5 Becca
    May 6, 2008

    +1 to JRQ
    it’s always safer to just state your p value than to call it significant or not significant.
    if I see another paper that performs 50 comparisons, and, using a p=0.05 cutoff, calls them all significant, I shall scream.

  6. #6 PhysioProf
    May 6, 2008

    I think “statistically discernable difference” is a better term for it, as it also lends itself to the notion that the difference is discernable in the context of a given tolerance for possible error.

  7. #7 Scott Simmons
    May 6, 2008

    “Like 99.8% of the people in psychology departments …”

    Can you supply a sample size and p-value for this claim?

    (Runs away.)

  8. #8 DrugMonkey
    May 6, 2008

    The term you are looking for is “reliable”. Statistically reliable is a much better way to think about it and gets one away from the confusion with the real world type of “significance”.

    on a related note, the statistical tests do not “reveal” or “demonstrate” findings either. The data themselves “reveal” or “demonstrate” findings my friends… The stats confirm the likely reliability of findings. A minor semantic distinction when writing a Results section you say? No, it is important because it makes people think in really stupid ways about what the data are indeed saying.

  9. #9 Chris
    May 6, 2008

    dm, The problem with reliable is that it means something slightly different, which means you have to explain one more thing. And without the explanation, it’s even more opaque than “significant.”

  10. #10 Jim Thomerson
    May 6, 2008

    I have always referred to “significantly different” or “not significantly different”. I have done a couple of lab exercises in introductory biology. One of them fails to refute the null hypothesis. It is a comparison of DBH distribution in two groves of pine trees by the library. The students are surprised and upset when they find no significant difference. Then I show them the aerial photo of the library with both groves being planted at the same time.

    My statistics course (only one) was statistics for geology majors. It used geological examples and was quite interesting.

  11. #11 MMcG
    May 7, 2008

    I feel your pain. You’d think the profession given responsibility for understanding memory and learning would have figured this stuff out.

    DrugMonkey, not wanting to start a whole new discussion, but the data don’t reveal anything either. The data are interpreted, while the stats aid in that interpretation when used correctly.

    I would think that “statistically reliable” is likely to cause all kinds of headaches when you talk to them about methodological reliability (which can be an adventure in itself).

  12. #12 Al
    May 7, 2008

    How about “Better than nothing”?

  13. #13 Jill
    May 7, 2008

    It’s true: students and teachers in stats classes would all often rather be somewhere else, and the language doesn’t help to make it interesting. I like NJ’s example of the falling rock to illustrate picking out what happened from the background: and it also seems to be the sort of illustration that you can refer to for a number of concepts.

    I have been involved for some years with the development of a tutorial now widely in use in UK universities and businesses, designed to run alongside more formal teaching to help get some of the basic concepts across. This then means that less teaching time is spent re-explaining the fundamentals, and it can be used more productively. If you are interested in taking a look, you can get a demo from http://www.conceptstew.co.uk/PAGES/demo.html

  14. #14 Chris
    May 7, 2008

    Oh, and Scott, the proportion of statistics that are made up on the fly is .9637173952 (95% confidence interval +/- 1.0).

  15. #15 Andy
    May 7, 2008

    Often wondered about this. What does a p < 0.05 mean? Well that the p(d|H_0) < 0.05. Doesn’t roll off the tongue any more easily than “statistically significant”. Maybe thinking in terms of confidence intervals makes it easier? There’s gotta be somewhere we can go to find a better phrase!

    “Population generalisable”?

    “The mean of population 1 is likely to be higher than the mean of population 2”?

    “We found no evidence that the direction of difference in sample means will generalise”.

    “The 95% CI includes zero” or even “the 95 CI includes zero”.

    Hmmmmmmmmm.

    Is there a wiki somewhere where we can brainstorm this?

  16. #16 Andy
    May 7, 2008

    Hmmm nuked part of my comment – never mind :-(

  17. #17 Mark P
    May 7, 2008

    Statistics definitely is not my cup of tea, but I have found it useful to explain this sort of finding by saying, for example, that there is a 95% probability that the results are not due to chance. That seems to make it fairly understandable even to me.

  18. #18 Tim
    May 7, 2008

    I make it a habit to not use the word significant at all, in my teaching or my papers. I usually say “these data are unlikely under the null hypothesis” in order to make it clear that “significant” is always with respect to some null hypothesis. It also helps people avoid confusing p-values — which many people mistakenly believe mean “the null hypothesis is unlikely”.

    Tim

  19. #19 DrugMonkey
    May 7, 2008

    The problem with reliable is that it means something slightly different, which means you have to explain one more thing. And without the explanation, it’s even more opaque than “significant.”

    Huh?

    do you not think that “reliable” is more accurate than “significant” in describing a statistical result of p < 0.05?

    The problem with “significant” is that it comes with such a burden of everyday usage that you need to overcome. To say “well, yes I know you already use “significant” but we use it in this technical way to mean the probability is less than…”.

    It is in fact a good thing if you can come up with a term that is “opaque” if by this you mean the student does not already have an entrenched concept to which the term is attached.

  20. #20 DrugMonkey
    May 7, 2008

    MMcG: DrugMonkey, not wanting to start a whole new discussion, but the data don’t reveal anything either.

    Really? Then what does? I’m not a big fan of the term in any case but if one wishes to think that science is a process of revealing that which has remained hidden prior to a given investigation…

    unless you are splitting hairs over “the data” versus “interpretation of the data”. that seems a little precious given that the process of “looking at” or “understanding” or “appreciating” the data, in short getting it into your brain, is an inherently interpretive act.

  21. #21 Clark
    May 8, 2008

    I hate teaching statistics, in large part because it’s boring as hell, for both the instructors and the students, but also because students have a hell of a time grasping it, and that makes for some really painful interactions.

    One of my most memorable classes was a stats class. I think that’s because once a week he’d illustrate a point by bringing up a good paper and bad paper. (The bad papers were almost always psychology or sociology, unfortunately, which is why I have to watch my gut instincts when speaking of those) Anyways the paper would usually illustrate some point we were discussing. Tearing apart bad papers made the subject that much more lively and interesting. It also made a lot of the principles pretty memorable.

    But then one of my majors was mathematics and my other one was physics so I’m probably a bit off the typical view.

  22. #22 MMcG
    May 8, 2008

    DrugMonkey: unless you are splitting hairs over “the data” versus “interpretation of the data”. that seems a little precious given that the process of “looking at” or “understanding” or “appreciating” the data, in short getting it into your brain, is an inherently interpretive act.

    Yeah, thought this might bring us off point a bit, I should know better than to unstable my hobby horses. I am making the distinction you call “splitting hairs”, but precisely because I don’t consider it hairsplitting. It’s much too common in Psychology in particular that people just accept their first interpretation because they think their data (or their stats) have “revealed” something – the very term obscures the “inherently interpretive” nature of empirical investigation.

    To bring this vaguely back on point, it does seem that precisely this kind of confusion is causing the difficulty with students grasping statistical significance. In normal life significant things matter, in statistics they’re just unlikely. In the former case the role of the person’s own viewpoint is usually clear because there’ll be lots of individual variation to contrast it with; in the latter there’s a methodological culture in Psychology (I find at least) that thinks the stats somehow speak for themselves and their interpretation doesn’t involve a prior theoretical or personal stance.

  23. #23 Jon Rowe
    May 10, 2008

    Statistics are also part of Business studies (the division in which I teach) and I’m just glad this is something I don’t have to teach. As I tell my students, I’m more of a verbal person than a math person (i.e., on all of the standardized tests, I’d score significantly higher on the verbal sections).

  24. #24 iddaa
    May 10, 2008

    The problem with reliable is that it means something slightly different, which means you have to explain one more thing. And without the explanation, it’s even more opaque than “significant.”

  25. #25 Jim Thomerson
    May 13, 2008

    I enjoyed some studies on survival of salmon put through hydropower turbines. Particularly the part where the estimated survival rate ranged from 95% to 108%. I like the idea that you can have more salmon coming out of the turbine than you put in. Environmentally friendly; green to the max!

  26. #26 mjkane
    May 16, 2008

    If you’re teaching reasonably strong undergrads, I wonder whether they’d enjoy Abelson’s “Statistics As Principled Argument”, and whether you’d find that the way he presents statistics leads to a more interesting course for you, too. It’s really a wonderful little book.

  27. #27 Pete
    May 30, 2008

    Perhaps “Firm” vs “Nebulous”?
    I like those because they suggest a continuum. You could suggest analogies for various p-levels, from iron-clad through rubbery and spongy to fluff and hot air.

  28. #28 moda
    August 27, 2008

    How about “Better than nothing”? +1 :)

  29. #29 NG
    March 24, 2012

    I brought up this blog to get some further understanding on “Insignificant vs. Non-significant” as a student. The results were not significant :( as I will now continue my unreliable search…sighh
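
For anyone who wants numbers behind comment #5’s complaint about 50 comparisons at a p = 0.05 cutoff, here is a rough Python sketch. It assumes the 50 tests are independent and that every null hypothesis is true, which real data rarely guarantee. The variable names are just for illustration, and the Bonferroni adjustment shown is one common correction, not something anyone in the thread proposed.

```python
alpha = 0.05          # per-test cutoff mentioned in the comment
n_comparisons = 50    # number of comparisons mentioned in the comment

# Chance of at least one false positive if all 50 nulls are true
# and the tests are independent (both are assumptions).
p_any_false_positive = 1.0 - (1.0 - alpha) ** n_comparisons

# Expected number of false positives under the same assumptions.
expected_false_positives = alpha * n_comparisons

# Bonferroni: a per-test cutoff that keeps the family-wise error
# rate at or below alpha.
bonferroni_cutoff = alpha / n_comparisons

print(f"P(at least one false positive): {p_any_false_positive:.3f}")
print(f"Expected false positives:       {expected_false_positives:.1f}")
print(f"Bonferroni per-test cutoff:     {bonferroni_cutoff:.4f}")
```

Under those assumptions, a spurious “significant” result somewhere among the 50 is close to a sure thing, which is why reporting the actual p-values (and how many tests were run) is the safer habit.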