I'm all for scientific -- and statistical -- literacy, but sometimes the calls for it exasperate me. Just a little. Not significantly. If you know what I mean. Or you think you know what I mean. Anyway.
Yesterday Wired carried a piece by Clive Thompson, Why We Should Learn the Language of Data. It's a bit presumptuous to say that statistics is "the" language of data (data speak many languages and there are dialects within each), but I'll let that go. And I'm sympathetic when someone has to keep answering charges that if global warming was happening, why was there so much snow? This question was put repeatedly by reporters to a physicist and climate expert who had to explain about short- and long-term trends again and again (although it seems to me that a better explanation is that climate change causes an increase in extreme weather events, but that difference isn't pertinent to this post). Thompson portrayed this Sisyphean task as the result of reporter illiteracy about data. That was no doubt true in some instances, but the real cause was their needing to have an authoritative counter-quote from an expert in the face of the right-wing's anti-science noise machine. But Thompson wanted to make a point that people are data-illiterate, so he goes on:
Or take the raging debate over childhood vaccination, where well-intentioned parents have drawn disastrous conclusions from anecdotal information. Activists propagate horror stories of children who seemed fine one day, got vaccinated, and then developed autism. Of course, as anyone with any exposure to statistics knows, correlation is not causation. And individual stories don’t prove anything; when you examine data on the millions of vaccinated kids, even the correlation vanishes. (Clive Thompson, Wired)
Readers here know I'm not an anti-vaxer, so my disagreement here won't be on that basis. It's the problem that for too many people a little knowledge of statistics -- "any exposure to statistics" as he put it -- is a dangerous thing. I'm not going to argue that anecdotal evidence is sufficient (although sometimes it is: when a baby strangles because the crib's slats are too far apart, that's anecdotal and it's enough to know the crib is dangerous). Instead I'm going to take issue with: correlation is not causation.
It's a commonplace. People know this, and it's repeated all the time: correlation doesn't mean causation. And it makes me crazy to hear it, not because it isn't true, but because it is close to being vacuously true. No empirical evidence, including correlation, is enough to establish causation. Causation is a judgment about an association, not a discoverable property of a relationship that can be empirically demonstrated, as if some associations had readable labels on them that said "causal." It would be nice if they did. But they don't. "Causation" is an inference from empirical data. It must be based on empirical evidence, but it cannot be reduced to it.
So how do we make these judgments? Unfortunately there is no magic test. Barrels of ink have been spilled by philosophers trying to sort this out. Most scientists don't worry about it but have fairly simple, perhaps naive, notions of what it means for "A to cause B": If, whenever we change A, we see a change in B, all other things being equal, we say that A "caused" the change in B. This is not just correlation but active intervention on the part of the scientist when he or she purposely changes A and observes what happens to B, all other things being held constant. Of course whether all other things are held constant is a prickly issue. They are never exactly the same, and we have discussed this from a different point of view in our (incomplete) series on randomization and counterfactuals (see here, here, here, here, here, here, here).
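To make the difference between watching A and B move together and actively setting A concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption, not data: a hidden factor Z drives both A and B, A has no effect on B at all, and the names `pearson` and `simulate` are made up for this example. In the "observational" world A and B are clearly correlated; when the experimenter assigns A by coin flip, the correlation goes away.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Return (observational r, interventional r) for a toy world
    in which A never affects B (an assumption of this sketch)."""
    rng = random.Random(seed)
    # Hidden common cause Z (hypothetical).
    z = [rng.gauss(0, 1) for _ in range(n)]

    # Observational world: Z pushes A and B in the same direction.
    a_obs = [zi + rng.gauss(0, 1) for zi in z]
    b_obs = [zi + rng.gauss(0, 1) for zi in z]

    # Intervention: the experimenter sets A by coin flip, ignoring Z.
    a_set = [rng.choice([0.0, 1.0]) for _ in range(n)]
    # B is generated exactly as before; it still does not depend on A.
    b_int = [zi + rng.gauss(0, 1) for zi in z]

    return pearson(a_obs, b_obs), pearson(a_set, b_int)


if __name__ == "__main__":
    r_obs, r_int = simulate()
    print(f"observed A vs B:   r = {r_obs:.3f}")
    print(f"randomized A vs B: r = {r_int:.3f}")
```

Note what the sketch does not show: the computer "knows" A doesn't cause B only because I wrote the data-generating rules. With real data nobody hands you those rules, which is where the judgment comes in.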
The hoary "correlation does not equal causation" comes up quite frequently in epidemiology in the form of the cliche that epidemiological studies cannot establish causation. Apart from being utter nonsense in the usual sense, in a more restricted sense it is true, but trivially so, since no science can establish causation merely with data. It requires a theoretical framework (possibly contested) and judgment on top of that. Epidemiologists sometimes quote a list of characteristics of associations that are more likely to have the magic property of "causal," first made by an eminent statistician, A. Bradford Hill and mistakenly called the "Hill criteria" or "Hill postulates." I say mistakenly because Hill never called them either, referring to them as "viewpoints." In any event, their use as a checklist in epidemiology to "establish causation" from data is a serious error, a view well expressed in one of the leading textbooks of epidemiology, Rothman and Greenland's Modern Epidemiology 2nd ed. (Little Brown, 1998, p. 24):
If a set of necessary and sufficient causal criteria could be used to distinguish causal from noncausal relations in epidemiologic studies, the job of the scientist would be eased considerably. With such criteria, all the concerns about the logic or lack thereof in causal inference could be forgotten: It would only be necessary to consult the checklist of criteria to see if a relation were causal. We know from philosophy that a set of sufficient criteria does not exist. Nevertheless, lists of causal criteria have become popular, possibly because they seem to provide a road map through complicated territory.
Understanding that correlation and causation aren't the same isn't sufficient. That kind of superficial "understanding" may itself be misleading. The problem is much deeper in a way not addressable by calls for statistical literacy. I agree that more literacy in statistics is a good thing.
But it doesn't guarantee real understanding. Even from people who are highly literate in the language.
And here I was thinking that the error wasn't correlation/causation, but a much simpler logical fallacy: post hoc ergo propter hoc.
One doesn't even need to know anything about statistics to understand that fallacy. All one needs is a sufficiently ridiculous counter-example: "I ate eggs for breakfast, and then my shoes came untied! Eating eggs loosens shoelaces!"
caia: post hoc is just another version of "correlation doesn't equal causation." If I told you that whenever you sat at the kitchen table, your cat played with your shoes, then your scenario wouldn't look so ridiculous. The mere fact of one following the other isn't enough to say one way or another. Causation isn't given by empirical data. It's a judgment.
"x occurs after y" is clearly not just another version of "x is correlated with y." To appreciate this, consider that "x is correlated with y" is compatible with "y is correlated with x," whereas "x occurs after y" is clearly not compatible with "y occurs after x."
I was expecting you to point out that "correlation is not causation" is irrelevant when there is no evidence of correlation.
It drives me crazy too. I see self-styled "smart people" trot out "correlation isn't causation" all the time as an excuse to automatically dismiss the findings of any study they choose, then congratulate themselves on how insightful they are for seeing through shoddy scientists who think they can prove things. It doesn't matter what is controlled for, whether there is other evidence or theoretical support for the idea, how carefully the experiment is designed, etc. "Correlation isn't causation, ergo the study is bunk".
"Correlation isn't causation" is supposed to be a cautionary scientific phrase, but it's often used in an anti-scientific way to suggest that no amount of evidence can have any bearing on supporting causal hypotheses. No, you can't deductively prove causation, but that's different from whether you can reasonably support it.
Yes, but the point is that that one, whatever you call it, doesn't require any familiarity with what statistics does. Other errors do require that you venture out from classical logic into mathy territory.
Vicki: You can have causation without correlation if bias is masking it.
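A minimal sketch of that masking, with made-up numbers (the variable names and effect sizes are assumptions chosen so the two paths cancel): A really does raise B, but a confounder Z makes A more likely and lowers B, so in observational data the correlation between A and B is near zero. Assign A at random and the effect shows up.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def masked_effect(n=20000, seed=1):
    """Return (observational r, randomized r) when A truly raises B
    but a confounder Z hides it (all numbers are hypothetical)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]

    def outcome(a, zi):
        # True causal rule in this toy world: A helps (+1), Z hurts (-2).
        return 1.0 * a - 2.0 * zi + rng.gauss(0, 1)

    # Observational: Z also raises A (e.g. sicker patients get treated more).
    a_obs = [zi + rng.gauss(0, 1) for zi in z]
    b_obs = [outcome(a, zi) for a, zi in zip(a_obs, z)]

    # Randomized: A is assigned independently of Z.
    a_rand = [rng.gauss(0, 1) for _ in range(n)]
    b_rand = [outcome(a, zi) for a, zi in zip(a_rand, z)]

    return pearson(a_obs, b_obs), pearson(a_rand, b_rand)


if __name__ == "__main__":
    r_obs, r_rand = masked_effect()
    print(f"observational: r = {r_obs:.3f}")
    print(f"randomized:    r = {r_rand:.3f}")
```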
I read somewhere that drinking wine is correlated with good health. Clearly I will not invoke any of that "correlation is not causation" nonsense here.
@bob koepp:
Your analogy is fatally flawed because you are oversimplifying correlations to make your analogy.
First off, not all correlations are bi-directional. This means that correlation is not a reflexive relationship, as you are attempting to imply.
Secondly, 'occurred after' is never reflexive, whereas 'is correlated with' is sometimes reflexive.
This is just a lousy analogy.
Finally: analogies aren't arguments. They illustrate arguments. Make the argument, not the analogy.
Unfortunately, there are people who play on the ignorance of statistics by the general public to show correlation and imply causation without explaining in a very convincing manner why they believe there is causation.
A perfect example is AGW, where frequently we are hammered with the observations that CO2 is increasing and so are temperatures, while man's CO2 emissions are increasing. Therefore, man's CO2 is responsible for CO2 increasing and the 0.7 deg C temperature rise over the last 120 years, and models predict temperature increases of 1.5-4.5 deg C (some range, huh) with a doubling of CO2 by the end of the century, while the same models 10 years ago could not predict the absence of warming over the next 10 years.
Obviously, the science behind the hypothesis is much more complex, yet the average person can not understand it, let alone judge if it is believable or not. So they are given the correlation argument and an explanation of the science that should not convince a junior high school student given the assumptions and uncertainty in the data, not to mention our woeful understanding of climate processes, not to mention the failure of models to predict anything with any accuracy. Even their curve fitting exercises of past data fail to explain the cooling from the 50's to 70's and the cooling from around 1900-1920's.
Statistics have been used on the gullible public to mask the uncertainty of the many hypotheses on causation being presented as fact by scientists, economists and government regulators.
They have also been used to deny causation by citing a lack of correlation in studies designed not to find any. Why do you think it took so long for lead and asbestos, and other pollutants to be regulated? They are also used to minimize the risk of BSE, food contaminants, vaccines and drugs. The phrase "We have found no evidence to suggest...." is repeated ad nauseam.
Remember the other old adage, the absence of evidence is not disproof. Sometimes you have to use reason even if the statistics can not support it due to SSS, or even worse, the lack of funding to find the evidence that may be harmful to a certain industry.
The public needs to be reminded, and some are awakening to this already, that statistics is not science, it is simply a tool of science. If science can not be explained without statistics, the science is poorly understood, or worse.
pft: Not at all true. AGW is supported by extensive cross-linking science of all kinds. It is not based on "correlations" but with theory and empirical data. That's the way you argue causation.
Revere, agreed on your main point, but when you say causation is/(means?)correlation plus theory plus judgment, what do you mean by judgment beyond application of theory? The particular point at which one takes X proportion correlating plus theory and say "aha, a cause!"? Or--?
Paula: Judgment enters in several ways. Theoretical choices, how one assembles the evidence (what one includes or doesn't and why) and how one weighs the included pieces. This then produces a narrative or coherent description, one species of which is a causation judgment.
A longer explication of these issues is in "Environment and Health: Vital Intersection or Contested Territory," in the American Journal of Law and Medicine, vol. 30, pp 189-215, 2004. We haven't even gotten to the issue of "statistically significant" associations, a term typically used to buttress causal inference, but which is routinely calculated in non-randomized settings where the frequentist statistics do not apply. And on and on it goes. . .
Sam: I know that article well and agree with it. Thanks for reminding us.
Brian Lynchehaun - You say, "Secondly, 'occurred after' is never reflexive, whereas 'is correlated with' is sometimes reflexive." -- which is sufficient to establish that they are distinct. QED
Your most notable contribution, here, Revere (and I agree with you, by the way), is not the quality of your powers of explication, which are of course indispensable to the art of sound instruction, and your capacity to render even a discussion of this material largely intelligible to your target audience; your real contribution here is in pointing up the fact that no amount of instruction will ever replace, or even enhance, the necessary capacity of the individual to accommodate this information; you cannot compensate for an innate inability to conceptualize. There will always be a breakdown, at some unavoidable point, between "information" and the ability to fully internalize that information (full internalization, in my opinion, always implies the ability to then go beyond this point, and to further incorporate and employ this information, to transcend it intellectually). That is why there is such an abundance of lockstep Sarah Palin devotees swarming over vast expanses of the US political landscape, at any given moment. You are not going to get any appreciable level of statistical literacy inculcated in a population that has proven, relentlessly, that it not only does not embrace education, but actively resists it, with formidable effort, at every opportunity. The last figure that I recall seeing (I don't remember the source of the poll) suggests that roughly 60% of the adult population in this country believes that AGW is a fraudulent product of a scientific community that is attempting to undermine all that is of genuine value in their lifestyle. You have intellectual indifference (at best) compounded by profound emotional resistance to change. Clive Thompson is probably well intentioned, I suppose, but he is also clearly delusional if he really believes that it is in any way whatsoever possible to make even a discernible dent, here.
Couldn't resist landing here, Revere.
To me it sounds like almost a pure Cartesian debate.
First Nations, like many Oriental cultures, have a different perception and coordination of events than cause and effect.
I'd like to give an example to make my point: in the Northern Hemisphere Snowy Owls eat Lemmings, while in Southern Argentina and Chile, huge Lemmings eat Snowy Owls; this is a fact.
I am sure that this autoregulation of Nature was not part of Darwin's or Descartes' concerns, but in Nature we do have so many examples of this.
Despite my bias, I still recognize that tactically the Darwinist or Cartesian approach is quite worthwhile but IMO, they are in the ditch concerning systemic behaviours.
If they were more 'open Mind Scientists' like you, we would at least have the chance to insert a new dimension in the debate, Science would not lose its deliberative powers and Nature its Authority.
Amen at 13, Revere. I trot out that sort of line frequently when colleagues attest that since I use statistical models I don't do anything 'Theoretical' as well as when students write maddeningly of their intent to 'proof' something by using statistics. But perhaps not as clearly as you just did. The whole point of getting your narrative together is that you have a testable hypothesis and have made explicit what evidence you have accepted or are willing to accept in its evaluation. Ergo it becomes testable (again, and again, and again) and revisable.
Thanks for the article Sam - I will look into it!
correlation usually gives evidence for causation.
How much evidence is needed for "establishment" ?
Use the language of numbers.
Use subjective estimates which you may rename as "judgement".
Relevant: http://xkcd.com/552/
Amen
I always counter with: cause implies correlation.
Some correlations can become compelling enough (well-replicated, known third variables controlled for, yet the correlation still exists) that I think the burden of proof should shift to those claiming non-causality.
In other words, far more useful than claiming correlation doesn't imply cause would be to identify the third variable that explains the relationship. Without the third variable, I agree the saying is trivial.
"mistakenly called the 'Hill criteria' or 'Hill postulates.'"
Uh... it's not mistaken to call it a Hill postulate. It is named AFTER him, of course he didn't name it after himself. Don't be a dork. You are mistaken in calling it anything else. In fact this is a great example of the author making exactly a mistake of correlation by assuming that it is a mistake to call this "Hill's Postulate" on the basis that it came after Hill's paper. It is called Hill's postulate precisely BECAUSE it was named after. This is why the author comes off as arrogant.
Nobody believes correlations have NOTHING to do with causation. People use Correlation is not Causation to say that it is not enough, as in common conversation the word "sufficient" is meaningless to most people. The actual phrase is "Implies" which of course in common parlance is nearly the opposite of how it is used scientifically.
The Wired article is speaking plain English, not being a nitpicky full-of-himself "expert" who could have summarized this entire rant as "Correlation is not enough to say there is a cause, but if there is a cause there will be a correlation. In fact looking at correlations often makes us come to false causative conclusions". i.e. Exactly what they say in WIRED.
Nerdbert: you are not correct. Hill had an appropriate name for them, "viewpoints." My comment was not about the eponym but the word "postulate," which is incorrect. "Postulate" requires necessity, which Hill's viewpoints do not. Simple as that.
actually the above work is good
"caia: post hoc is just another version of "correlation doesn't equal causation." If I told you that whenever you sat at the kitchen table, your cat played with your shoes, then your scenario wouldn't look so ridiculous. The mere fact of one following the other isn't enough to say one way or another. Causation isn't given by empirical data. It's a judgment." -reverie
If the cat played with your shoes whenever you sat at the table, you would very likely feel the pawing at some point in time. If you looked under the table you would observe the cat in the act of untying your shoelaces, and discover the empirical evidence needed to draw a conclusion on causation. However, if the cat merely walked under the table every time you sat down, it would be an injustice to implicate poor kitty in the shoelace mystery. This is how the simple concept applies.