Today's falsehood: Correlation Implies/Does Not Imply Causality

This post has been moved HERE, please go check it out.


[i]And THAT is patent nonsense, isn't it? The stock market goes up all the time. Every year one has a birthday. They are correlated. There is a causal, though trivial, connection: Time goes in one direction.[/i]

No. Whenever we say that there is a causation, that causation is between clearly defined things that we're observing.
If we say that there's a causation between a person's birthday and a stock market movement, the causation must be between those two things; that is, someone's birthday is CAUSING the market to move in a certain direction. Without that person's birthday at that time, the market wouldn't have moved that way.
We can't make random associations and suggest that there is a causation - that's a logical fallacy!

Yeah, some people, especially from the social "sciences" and from a few more scientific fields where real-world experimentation is limited or impossible (e.g. climatology, paleontology, archeology), will always be extremely uncomfortable with this indisputable reality that correlation does not IMPLY causation, and that correlation by itself, or any kind of statistical data, NEVER proves anything.

About warming, sure. Most health reporters should get the phrase tattooed on the backs of their hands, so they can see it the next time they start to write a story about how this or that prevents cancer.

I think there's a certain type of moderately smart person that learns they can parrot phrases like this for some kind of gain without truly understanding what they mean. Maybe it's because of encouragement from peers or a general feeling of intellectual superiority. I don't know. I'm sure I'm guilty of the same behavior in other cases though :)

I wonder, though, when "implies" became part of it. I had always heard it as "correlation is not causation" until relatively recently. "Correlation does not imply causation" is clearly false in general, while "correlation is not causation" is clearly true in general (but not always useful as you point out).

Correlation can suggest causation. Sometimes it has nothing to do with causation and is just coincidence. But one thing is for sure, correlation by itself does not prove causation.

When somebody reports that A and B are correlated, there are four possibilities:
1. A causes B.
2. B causes A.
3. A and B are not causally related to each other, but some other factor C causes both A and B.
4. A and B are not causally related to each other, and the apparent correlation is a statistical fluke.

You can check the last possibility by repeating the test on a larger sample and seeing whether the correlation holds up (you can make the probability of a statistical fluke ridiculously small, although you can never make it go to zero). Proving any of the other possibilities cannot be done with statistics alone. For that, you have to have some kind of model which determines which of the first three categories you are in.
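The point about statistical flukes can be checked with a small simulation. This is only a rough sketch under assumed settings: the sample sizes (10 and 200), the 0.5 cutoff, the seed, and the helper names `pearson` and `fluke_rate` are all illustrative choices, not anything from the comment above. It draws pairs of variables that are independent by construction and counts how often they still look correlated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fluke_rate(sample_size, trials, threshold, rng):
    """Fraction of trials in which two INDEPENDENT variables show |r| > threshold."""
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(sample_size)]
        b = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / trials


rng = random.Random(2011)  # fixed seed so the run is repeatable
small_rate = fluke_rate(sample_size=10, trials=1000, threshold=0.5, rng=rng)
large_rate = fluke_rate(sample_size=200, trials=1000, threshold=0.5, rng=rng)
print(f"n=10:  {small_rate:.1%} of unrelated pairs look correlated")
print(f"n=200: {large_rate:.1%} of unrelated pairs look correlated")
```

With small samples a noticeable share of unrelated pairs clears the cutoff; with the larger sample that share shrinks toward zero, though never exactly to zero in principle.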

For instance, if hail is falling at your location, there is almost certainly lightning in the vicinity. But this is not because the hail causes the lightning, or vice versa; it is because a severe thunderstorm, which causes both of these things, is passing through. Thus it is correct to say that the correlation of hail and lightning does not imply that one causes the other. It is, however, a strong indication that the phenomena are related.
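The hail and lightning case can be sketched as a toy simulation of possibility 3 (a common cause). Every probability below is invented for illustration, as are the variable names; the only structural assumption is the one stated in the comment, namely that the storm drives both hail and lightning and neither drives the other.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable
storm, hail, lightning = [], [], []
for _ in range(20000):
    s = rng.random() < 0.2                     # assumed: a storm on 20% of days
    h = rng.random() < (0.6 if s else 0.01)    # hail depends only on the storm
    l = rng.random() < (0.9 if s else 0.02)    # lightning depends only on the storm
    storm.append(s)
    hail.append(int(h))
    lightning.append(int(l))

# Correlation over all days: hail and lightning move together.
overall_r = pearson(hail, lightning)

# Correlation on storm days only: the common cause is held fixed.
storm_hail = [h for h, s in zip(hail, storm) if s]
storm_lightning = [l for l, s in zip(lightning, storm) if s]
within_r = pearson(storm_hail, storm_lightning)

print(f"all days:   r = {overall_r:.2f}")
print(f"storm days: r = {within_r:.2f}")
```

Across all days the two are clearly correlated, but once the storm is held fixed the correlation largely disappears, which is what one would expect if C causes both A and B.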

By Eric Lund (not verified) on 20 Jun 2011 #permalink

I disagree. This is largely semantics. You want "equals" instead of "imply" - fine. Equals is much stronger. But your premise is still wrong. Imply = strongly suggests, and correlation doesn't strongly suggest. Just because correlation can equal causation doesn't mean it suggests it in any way. You are falling into the trap yourself. At best it introduces the possibility.

Every year, on my birthday, the stock market goes up. My birthday doesn't cause the stock market to go up (equals).

Nor does it imply that my birthday causes the stock market to go up. You can have the strongest correlation possible, and it will have nothing to do with causation.

By John Prof (not verified) on 20 Jun 2011 #permalink

"Imply = strongly suggests, and correlation doesn't strongly suggest."

Imply in formal logic and probably in science does indeed equal strongly suggest. In English, however, it might mean strongly suggest and increasingly means to state something knowing it is not true, or some other weaker version of what it could mean. I do rather prefer to talk about the meaning of a falsehood (and its false-ness) in the context in which it is actually used. But yes, the most common Wikipedia version of the phrase is "correlation does not imply causation" and the word "imply" means to indicate a truth (strongly) indirectly.

So no, my premise is not incorrect ... it is correct ... and I've not fallen into a trap. I'm speaking here of pedantry. Glad you showed up!

Every year, on my birthday, the stock market goes up. My birthday doesn't cause the stock market to go up (equals).

Now, you are condescending. Unnecessary, though if it gives you pleasure ....

you can have the strongest correlation possible, and it will have nothing to do with causation.

And THAT is patent nonsense, isn't it? The stock market goes up all the time. Every year one has a birthday. They are correlated. There is a causal, though trivial, connection: Time goes in one direction.
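The "time goes in one direction" link can be illustrated with two made-up series that share nothing except an upward drift. This is a sketch under assumptions: the drift, the noise level, the series length, and the names `series_a` and `series_b` are hypothetical stand-ins, not real market or birthday data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def drifting_series(steps, drift, noise, rng):
    """Random walk that rises by `drift` per step on average."""
    level, out = 0.0, []
    for _ in range(steps):
        level += drift + rng.gauss(0, noise)
        out.append(level)
    return out


rng = random.Random(7)  # fixed seed so the run is repeatable
series_a = drifting_series(200, drift=1.0, noise=1.0, rng=rng)
series_b = drifting_series(200, drift=1.0, noise=1.0, rng=rng)

# Levels: both series climb with time, so they track each other closely.
level_r = pearson(series_a, series_b)

# Step-to-step changes: the shared trend is removed by differencing.
changes_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
changes_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
change_r = pearson(changes_a, changes_b)

print(f"levels:  r = {level_r:.2f}")
print(f"changes: r = {change_r:.2f}")
```

The levels are strongly correlated because both depend on elapsed time; the period-to-period changes are not, which is one way to see that the shared factor is the trend and nothing more specific.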

Also, your stock market/birthday example is a poorly constructed one; Apples and Oranges and all that. Come back when you've got an apples to apples example and multiple repeated strong correlations with good p-values and tell me about how there is no causal link of some kind.

Trap indeed. Indeed.

Eric, yeah, I'm going to add something like that into the next version of this post.

Oh, and John, yes, it is largely semantics. You totally got that part.

Two variables might show a very strong correlation because they are linked to another variable. A classical example is the correlation between flush toilets and heart attacks in rural _____ (name your developing country). Do flush toilets cause heart attacks? No. Do heart attacks cause flush toilets? No. However, both are also correlated with affluence and especially the diet and sedentary lifestyle that goes with it. So what is the causative factor here? Is it affluence, or diet, or lack of exercise, or economic development, or something else (perhaps a virus that causes arterial lesions and is more likely to spread in denser urban populations, perhaps via municipal water systems contaminated with sewage...from flush toilets)? To me, the safest thing to say about correlation is that "correlation implies correlation" and leave it at that. Causation is much more reliably verified/falsified by controlled experiments, either intentional or accidental, in which one can compare the effects of manipulating a variable with the effects of NOT manipulating that same variable.

Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.
-- Randall Munroe

I mean, really, nobody is thinking that a mere statistical correlation means that two sets of observations have a definitive causal link.

I'm glad that I've just been imagining the antivaccine crowd all these years.

By D. C. Sessions (not verified) on 20 Jun 2011 #permalink

The big problem is that we can't observe causes. The term 'cause' describes a relationship, not a stimulus. Yet I've found a lot of people have a hard time with this concept (including, to be honest, even myself).

Our brains do better assuming that high correlations are synonymous with causations, taking a shortcut to conclude with confidence that a particular observation is the same as a cause until this relationship fails and a better one comes along. It's a pretty useless thing to describe all events as merely series of unconnected occurrences, after all.

Mike, good point. One way to help with this is to square the r to get R-squared, and pretend like that's a percentage. If your model is good, saying "10 percent of the variation in X is caused by variation in Y" (for example) has more meaning than saying that "there's a high/low correlation."
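A small worked example of squaring r, using made-up numbers. The data, the variable names, and the least-squares fit are assumptions added for illustration. Strictly, r squared is the share of variance in one variable that a straight-line fit on the other accounts for; reading that as "caused by" still depends on the model being right, as the comment says.

```python
import math

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Pearson r and its square.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Same quantity from a least-squares line: 1 - (residual SS / total SS).
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared_from_fit = 1 - ss_residual / syy

print(f"r = {r:.3f}, r^2 = {r_squared:.2f}, from fit = {r_squared_from_fit:.2f}")
```

Here r is about 0.77 and r squared is 0.60, so the fitted line accounts for 60 percent of the variation in the second variable, and both routes to that number agree.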

So, when the apple falls from the tree to the ground, what is the cause? Gravity? Or whatever effect detached the apple from its branch?

Although I completely understand your frustration with the misuse of this 'rule', this post leaves me scratching my head for two reasons and the comments just make me want to bang my head on my desk. Not because they are outright wrong, but because most are so convoluted (one of the things that made me scratch my head about the post) that I can't imagine the audience you were hoping to reach actually reading it.

So, first, maybe due to frustration, your attempt at explaining the frustrating problem was convoluted and many statements in it (e.g., "Your correlations imply causation.") would be simply wrong if taken out of context. I think that you're trying to explain causal inference at a depth that requires a lot more than a blog post to understand. Mike's points are well-taken in this regard.

But you could explain the oversimplification problem that frustrated you to begin with much more simply than you have here and in a way that is much more defendable.

Logically, correlation is not sufficient evidence to infer cause, but it is necessary to infer cause.

In other words, all causal relationships are correlational; not all correlational relationships are causal. In other words, correlation alone does not suggest, imply, prove, or equal cause, but without it, we have no evidence that a causal relationship exists.

I hope that is as clear as I think it is. If it is not, I'd like to know so that I can work on it.

But then there's the other thing that bothered me about the post:

I mean, really, nobody is thinking that a mere statistical correlation means that two sets of observations have a definitive causal link.

Are you serious? I'm thinking that you must have few friends or family with IQs below 120 if your experience tells you this. Mine, and the research on reasoning, says otherwise. Most people don't recognize that they have made the assumption, true, and many people misunderstand the 'saying' as you noted, but assuming cause from correlational information is one of the most common forms of overgeneralization that people make.

I think that you're trying to explain causal inference at a depth that requires a lot more than a blog post to understand.

I am definitely NOT trying to explain causal inference at any level at all.

In other words, all causal relationships are correlational;

Not necessarily. Chaotic and emerita systems may not be, for instance. Anyway, your description of how causality and correlation may work is fine, but it really does not speak to the post. I think your frustration with my frustration is that you were looking for, or wanting, something that I never put there.

I can summarize this post very simply: When you hear people say "correlation is not causation" (in one form or another) it is often a denialist ruse, paternalism, or some other sort of distraction from the issue at hand, and the person is generally speaking out of their nether regions.

I could have said that sentence instead of writing the entire post, but there are sometimes reasons to expand on a point.

Are you serious? I'm thinking that you must have few friends or family with IQs below 120 if your experience tells you this.

Again, I think you are looking for something more than intended. Let me rephrase: A statement about a causal link between things made in reference to a statistical observation is not a statement about the statistical observation in the absence of any context that may have led to the discussion, or the conclusion. I think maybe you missed the word "mere."

Thanks for your comments, though! As I noted in a comment, I had originally intended to have a part of this post address correlation and causality rather than the aphorism and its uses and meaning, but I didn't. And, with your comment and others there is a good starting point for that discussion.

Not necessarily. Chaotic and emerita systems may not be, for instance.

When variables are not isolated, correlations may appear to be nonexistent, but causes are always correlated with effects - by definition. This is what makes the topic so difficult to discuss. The caveats pile up in practical use.
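One narrow caveat can be shown in a few lines: the usual Pearson coefficient measures only straight-line association, so it can come out as zero even when one variable is completely determined by the other. The data and names below are a made-up sketch, and the example says nothing about chaotic systems specifically.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y is fully determined by x, but the relationship is symmetric, not linear.
xs = list(range(-3, 4))      # -3, -2, ..., 3
ys = [x * x for x in xs]     # 9, 4, 1, 0, 1, 4, 9

linear_r = pearson(xs, ys)

# Restricting to x >= 0 makes the dependence visible to the same measure.
right_half_r = pearson(xs[3:], ys[3:])

print(f"full range: r = {linear_r:.3f}")
print(f"x >= 0:     r = {right_half_r:.3f}")
```

Over the full range the linear correlation is zero; over half the range it is strongly positive. The dependence is there in both cases, and whether it shows up depends on how the variables are isolated and measured.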

Did I say "emerita?" Damn autocorrect. I wonder how many people went to wikipedia to figure that out..

"Correlation does not mean causality"?

"Correlation does not imply causality"?

"Correlation does not infer causality"?

"Correlation does not prove causality"?

"Correlation does not lead to casualties"?

I prefer my own:

Correlation may suggest causality and that further investigation should be done.

Unfortunately, it's not as succinct or as pithy.


Greg,

I deal with this all of the time in my work as well. I work in researching curriculum and instruction. Often we try to look at some novel instruction in a classroom and look at student understanding of the concepts in the instruction.
When we deal with those "Correlation does not imply causation" folks, it is usually someone with either an alternative instructional method that has not panned out or it is an administrator looking for an out to say we can't afford to teach the students science in this manner. Agenda is often the bane of anyone doing research on living systems.

I generally say, "Correlation does not equal causation."

I agree, as you point out, that seeing a correlation in your data automatically makes you think you have found a cause. A scientist will think, "Hmm, perhaps this correlation is due to causation, let me prove/disprove that with another experiment."

However, the general public (thank you news media) automatically jumps to correlation equals causation. Therein lies the rub.

I mean, really, nobody is thinking that a mere statistical correlation means that two sets of observations have a definitive causal link.

Maybe not in your statistically literate circles, but that claim is made frequently in everyday life, often based on the flimsiest of correlations.

Logically, correlation is not sufficient evidence to infer cause, but it is necessary to infer cause.

Correlation does not prove causation, but lack of correlation does prove lack of causation.

Francis Bacon said, "The general root of superstition is that men observe when things hit, and not when they miss; and commit to memory the one, and forget and pass over the other." I say coincidence does not prove causation. That works for me in my simple world.

This is hogwash. Correlation alone does not even imply causation. To imply or to suggest X is to say it is true without saying it (i.e. to imply it is true). Since there is no relationship between just correlation and causation, no implication or suggestion exists.

It is worth noting that in my above comment, I state "correlation alone" for a very important reason: It is possible to construct a test whose measurements of correlation do imply causation.

Yeah, but did you read anything other than the title of the post?

You seem to have completely ignored the issue of a mechanism. If you can posit a mechanism by which phenomenon A can bring about phenomenon B, doesn't that give much stronger substantiation to the causal relationship between A and B?

By Lois Matelan (not verified) on 05 Jan 2014 #permalink

"Gravity causes the apple to fall to the ground instead of sideways when it detaches from the tree "

Is Gravity "our" explanation for apple falling or did it really "cause" the falling?

How about: "correlation does not necessitate causation"? That doesn't imply the cheap-shot evasion that the usual phrasing so often does.

By Neil Bates (not verified) on 15 Apr 2014 #permalink

Neil, that seems reasonable other than the possible confusion about causality of the causality.

The original problem comes from the use of the word "Imply" in logic vs. "Imply" in vernacular parlance. Like the word "Theory" in science vs. the vernacular they are almost opposites. I imply things in my daily speech that I'm trying to avoid saying, sometimes because I can't really say it. A implies B in logic because if you see A you should bet on B.

The dismissive nature of the term "correlation does not imply causation" as used on the Internet is not only inane, it is not worth the effort it took to type it.

Pointing out smugly that "correlation does not imply causation" is incessant on science forums. We see that comment, and cringe, as it is clearly degrading to people that found correlation and want to continue along with the research until they prove causation, or their theory disproves itself.

It is nauseating to read comments constructed by mental midgets disparaging epidemiologists, statistical analysts, herpetologists, and world renowned disease control scientists with an obscenely misused phrase like "correlation does not imply causation". Save your flimsy knowledge of statistics and science to impress the freshmen during frosh week.

I think Greg made the point well. To put in my own words: correlation does not *necessitate* causation, but it often does, especially when combined with theoretical reasons to suspect that. Also, remember that technically, in philosophical logic it is a "fallacy" if it is not a necessary logical conclusion, even if very often the case (such as, where there's smoke, there's fire - since smoke could come from another source, even bottled up previously etc.) The whole point is to "keep in perspective" and not get hung up on either certainty or be dismissive.

By Neil Bates (not verified) on 23 Jun 2014 #permalink

"...as it is clearly degrading to people that found correlation and want to continue along with the research until they prove causation."

But you can never "prove" causation. You can increase confidence in your hypothesis, but you will never reach 100%, no matter how much research you do.

You can't prove causation because it can never be observed. Even in the case of the apple and gravity mentioned in the blog, we can't be 100% sure that gravity caused the apple to fall. It's possible that one day an apple might shoot sideways instead of falling; it's possible that that might already have happened, but no-one saw it.

To prove causation, you'd have to be omniscient; and we're not.

"Correlation does not imply causation" means there are possibly other unknown/uncontrolled variables that may be causing the effect you are witnessing. Causational experiments have removed all other possible variables while correlational experiments have not, which explains the weaker statement and the inability to infer causation.

Brett, that's a possibly correct post hoc explanation but usually does not apply. Most of the time the conversation is not about experiments at all. Correlation does not suggest uncontrolled variables at all; in a very large number of cases, with high correlations, it would be wrong to assume that there are. So-called "causational methods" don't do anything to identify cause. Having a high correlation and an excellent understanding of physical causality for a system does not get better by using "causational experimentation" which is little more than normal experimentation where you've made an explicit statement about cause. Finding a correlation between two variables by surprise provides only weak suggestion of an underlying cause. For this reason, the only real difference between an approach that is "correlational" and one that is "causal" is the level to which one argues against a proposed cause on the basis of a prior incredulity. That is rarely helpful or impressive.

So no, that's not what it means, not where the term comes from, or how it is ever used except in a few rare instances, and the distinction between correlational and causal (the latter only used in some subfields but with analogs elsewhere) is not a novel experimental or statistical technique, really, but an approach to structuring analysis, which still requires understanding of (and proof of) mechanism.

Greg - I've always known the phrase as, 'Correlation is not causation.' That's a fairly straightforward statement in my mind, basically saying, 'Beware of jumping to conclusions.'

The usage you cite, 'Correlation does not imply causation,' seems quite wrong. It seems obvious to me that correlation *does* imply causation - but it doesn't prove it.

Spurious correlations can be quite fun though :)

By Kevin ONeill (not verified) on 27 Sep 2014 #permalink