A lesson about correlation and causation

Besides yesterday being Mothers' Day, I had a lot of grant stuff to do, which means that this one will be a quickie. On Saturday, a reader sent me a link to one of the most useful sites I've ever encountered. I realize that over the weekend it's spread around the skeptical blogosphere like the proverbial wildfire, which is unfortunate (for me) given that I've made it a personal rule that I don't post on the weekend any more, barring amazing developments. Still, this one tempted me. It's a website called Spurious Correlations, and it is exactly what it claims to be. Its usefulness will become apparent quickly.

One of the key arguments, if not the key argument, made by antivaccinationists is that correlation of the onset of the "autism epidemic" (i.e., the large increase in prevalence of autism and autism-spectrum disorders beginning in the 1990s) with the expansion of the vaccine schedule is strong evidence of causation. In other words, correlation equals causation. Of course, correlation can equal causation, but you need to have a lot of other evidence to show that. Most correlations don't equal causation, and that's where Spurious Correlations comes in. It has thousands of such spurious correlations, complete with graphs, numbers, and even correlation coefficients, many of which are well above 0.9, which is considered a very strong correlation indeed. For example, on the front page right now is a correlation between U.S. spending on space, science, and technology and suicides by hanging, strangulation, and suffocation, and the correlation coefficient (r) is 0.992082 (for a perfect correlation, r=1.0; no correlation, pure randomness, r=0; negative numbers mean a negative correlation in which one variable's increase is correlated with the other variable's decrease):

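[Image: Spurious Correlations chart plotting U.S. spending on space, science, and technology against suicides by hanging, strangulation, and suffocation]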

Amusingly, you can look for your own spurious correlations. I wanted to look for more spurious correlations to autism, but I couldn't find any. So instead I looked for things that correlate with per capita high fructose corn syrup consumption (because HFCS is considered evil in alt-med circles) and found it correlated with (among other things):

Clearly, if I ever go back to doing abdominal surgery I'll have to be careful about operating on people who consume a lot of HFCS, and people who consume a lot of HFCS need to stay indoors during thunderstorms. Meanwhile, HFCS demonstrated strong negative correlations with (among other things):

Hmmm. Perhaps there is something to all that HFCS fearmongering. After all, as HFCS consumption falls, look at how there are more lawyers, more biomedical doctorates, and fewer suicides by hanging!

Ah, you ask, but what doesn't correlate with HFCS consumption? Ask no more! Here they are:

I think that last one was messed up by the spike in deaths in 2005 due (clearly) to Hurricane Katrina. As for the rest, if you want to go hang gliding, work with agricultural machinery, or go canoeing or kayaking, drink those sugary drinks up!

Of course, the reason skeptics are suspicious of correlations is not that correlation never equals causation. It's just that we realize that there really are lots of spurious correlations like this, far more than the average person realizes. A person who hasn't taken the time to understand just how common such correlations are and how easy it is to mine various data to find them will find correlations compelling, particularly if they have a modicum of seeming plausibility, as the vaccine-autism link once did before so many studies demolished it. A skeptic, however, will realize that such a correlation is only a starting point that probably doesn't mean anything but might. Further evidence, in the form of testing other data sets, doing controlled experiments (when they are possible to do), and other means of testing, is essential to determine whether what is observed is just a spurious correlation as opposed to a correlation that really does imply causation. We also know that confounding factors can easily lead to the appearance of correlation, as when two different variables increase steadily over the same time frame but have nothing to do with each other (computer or cell phone use and autism, for example). This sort of spurious correlation can lead to other correlations, such as increasing wifi exposure and autism, given that the increase in computer and Internet use over the last 20 years has led to the proliferation of wifi hotspots and increasing exposure to radio waves from cell phones. The list goes on and on.
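The point about two unrelated variables that both happen to rise over the same period can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two series below are made up (they are not data from the Spurious Correlations site or anywhere else), and the helper names `pearson_r` and `year_over_year_changes` are mine. Each series gets its own upward trend plus independent random noise, so by construction neither has anything to do with the other.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def year_over_year_changes(series):
    """Differences between consecutive values, which strips out a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(20)

# Hypothetical data: two series that each climb steadily, with independent noise.
series_a = [100 + 5 * t + rng.gauss(0, 3) for t in years]
series_b = [40 + 2 * t + rng.gauss(0, 1.5) for t in years]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(
    year_over_year_changes(series_a),
    year_over_year_changes(series_b),
)

print(f"r on the raw yearly values:     {r_levels:.3f}")
print(f"r on the year-over-year changes: {r_changes:.3f}")
```

The raw values should produce an r close to 1 simply because both series go up over time. Once the shared trend is removed by looking at year-over-year changes, what is left is independent noise, and the correlation should be far weaker. Differencing is just one simple way to check whether a correlation is riding on a common trend; it does not by itself establish or rule out causation.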

I really wanted to go into the data and see if, for instance, autism really does correlate with vaccination or whether brain cancer correlates with mobile phone use. More importantly, I wanted to find a bunch of ready-made spurious correlations for those two conditions for me to use to demonstrate the principle that just because there's a correlation does not mean the relationship is causal, such as when I suggest that autism also correlates with Internet usage, home computer ownership, CD sales (at least until around 10 years ago, when CD sales started to take a nosedive and sales of downloadable digital music took off), and a variety of other conditions. Similarly, given that the incidence of brain cancers doesn't appear to be significantly increasing (actually, quite the opposite, it appears to be decreasing), the suggestion that cell phone radiation causes brain cancer seems no more convincing than the long-refuted claim that vaccines cause autism, and that's leaving aside the monumental physical and biological implausibility of the claim on the basis of simple physics alone.

Then I thought: Maybe that's the point. If this website had the data for the sorts of things for which cranks often confuse correlation with causation, like cancer, autism, asthma, and autoimmune diseases (to name a few), then the site would risk going from being a useful tool for teaching critical thinking skills, one that lets readers explore and find the most ridiculous spurious correlations they can and thus demonstrates that correlation is not the same thing as causation, no matter how much the human brain likes to grasp onto such correlations, to being a site that cranks can data mine to support their favorite pet hypotheses. That, of course, would be bad. On the other hand, it would allow people like me to demonstrate all the other things that correlate with autism prevalence, thus allowing me to ask antivaccinationists why they think it's vaccines that are the culprit rather than all the other things. It would also let me show all the other things that correlate with cancer, thus allowing me to ask why it has to be cell phones or any of the other bogeymen on whom cancer is blamed rather than all the other (heh, heh) spurious correlations. Of course, to cranks their correlations can't be spurious and must be strong evidence of causation, while all those other correlations are obviously nonsense.
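To get a feel for how easy it is to data mine one's way to an impressive-looking correlation, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the "target" and the 2,000 "candidate" series are pure random noise of ten values each (I have not checked how the real site stores or searches its data), and `pearson_r` is a hand-rolled helper. Because every series is generated independently, any strong correlation the search turns up is spurious by construction.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rng = random.Random(1)  # fixed seed so the run is repeatable
n_years = 10            # assumed: ten yearly values per series
n_candidates = 2000     # assumed: size of the pile of unrelated series to search

# Hypothetical "variable of interest": nothing but random noise.
target = [rng.gauss(0, 1) for _ in range(n_years)]

# A pile of equally meaningless candidate series, all independent of the target.
candidates = [
    [rng.gauss(0, 1) for _ in range(n_years)]
    for _ in range(n_candidates)
]

# Score every candidate by the strength (absolute value) of its correlation.
scores = [abs(pearson_r(target, series)) for series in candidates]
best_abs_r = max(scores)
n_strong = sum(1 for score in scores if score > 0.7)

print(f"strongest |r| found among {n_candidates} random series: {best_abs_r:.3f}")
print(f"number of random series with |r| > 0.7: {n_strong}")
```

With so few data points per series and so many candidates to choose from, the search should reliably find at least a few series with a strong correlation to the target, even though nothing here is related to anything else. That is the whole trick: search enough unrelated data and something will line up.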

In any case, I think it would be a fun exercise if you, my readers, would play around with the Spurious Correlations site and find the most interesting or bizarre spurious correlations for use in educating people in critical thinking. And, remember, the per capita consumption of cheese in the US correlates strongly with the number of people who died by becoming tangled in their bedsheets (r=0.947091). So, please, people, whatever you do, don't eat cheese right before going to bed. Oh, and if you are unfortunate enough to be confined to a wheelchair, either temporarily or permanently, whatever you do, don't you eat cheese either!


I saw that website last night. I'm still grinning about the graphs. And it's amazing how strong the correlations can be. A lot were over 90%.

By Julian Frost (not verified) on 11 May 2014 #permalink

So, please, people whatever you do, don’t eat cheese right before going to bed. Oh, and if you are unfortunate enough to be confined to a wheelchair, either temporarily or permanently, whatever you do, don’t you eat cheese either!

I knew my aversion to cheese would prove useful someday.

My introduction to this was many years ago when I got my first "scientific" calculator (A TI-35, maybe?). You could put a bunch of points into it and get correlation, least-squares best-fit line through them, etc. No matter how random a collection of points I put in I could never get a correlation below ~.75.

Ever since, I've been very skeptical about "correlations" that weren't >.99. Now it looks like I have to be skeptical of those, too.

By The Very Rever… (not verified) on 12 May 2014 #permalink

The rise in numbers of diagnosed autistics most likely also correlates with the rise in numbers of: fuel efficient vehicles, airport screening personnel and legalised same sex marriages.

Autism rates increased as the number of required vaccinations increased (1980-2000- yes, that equals more mercury) but continued to rise even after these compounds were eliminated ( less mercury). Plot THAT one.

Autism rates rose alarmingly after Pink Floyd stopped recording. They also rose after the demise of Freddie Mercury ( less mercury).

Confusing correlation with causation is the hallmark of quite a few websites which we all know and *love*:
- today John Stone continues to perseverate abour Brian Deer
- and Jake Crosby harasses the Lancet's new administrator about un-retracting AJW's infamous ... study.
- TMR appears to recruiting new TMs as contributors.

The loudness of anti-vax groupies correlates inversely with the amount of time given to them by the media and correlates positvely with the number of years it's been since AJW as regarded as anything but a farce by most ...
Oh wait, those makes sense.

By Denice Walter (not verified) on 12 May 2014 #permalink

Pardon the typos, please.

By Denice Walter (not verified) on 12 May 2014 #permalink

Jake Crosby harasses the Lancet’s new administrator about un-retracting AJW’s infamous … study.

How does a journal retract a retraction without losing credibility? It implies that the journal was unduly pressured into either retracting the paper, or retracting the retraction. Either way, the truth starts to look political. Not that I expect Jake to understand this, but it should be self-evident to a journal editor.

By Eric Lund (not verified) on 12 May 2014 #permalink

I have used the global warming vs number of pirates graph from the Church of the Flying Spaghetti Monster's site to try and illustrate correlation vs causation. Unfortunately, the anti-science folks only read what supports their opinions and dismiss the rest as the gum'mint conspiracy to hold us in bondage to Big Pharma and so on.

By Helen Taffs (not verified) on 12 May 2014 #permalink

@ Eric Lund:

The cited e-mail exchange between Jake and Prof Wedzicha is exactly what you'd expect- priceless.

For some unfathomable reason, he maintains that she is "keeping it retracted for no given reason" DESPITE her quite clear explanation.

Mark 'reading English' amongst Jake's skills in need of immediate attention

By Denice Walter (not verified) on 12 May 2014 #permalink

Helen@7 -- the hard-core denialists are so far down the rabbit hole of conspiracist ideation that no argument will persuade them.

All who follow this issue will rejoice in John Oliver's snippet, linked from HotWhopper today. And if you don't know the HotWhopper blog, you'll probably really enjoy it.

By palindrom (not verified) on 12 May 2014 #permalink

today John Stone continues to perseverate abour Brian Deer

I can scarcely imagine what thought-fragments coalesced to form this production:

"Who knows what is holding the court which originally said it would report back within six months ? ... To the best of my knowledge there is still no date named."

Not. How. It. Works.

For some unfathomable reason, he maintains that she is "keeping it retracted for no given reason" DESPITE her quite clear explanation.

I think we established some time ago that Mr. Crosby rejects our reality and prefers to substitute his own. In his version of reality, there is no valid reason for Wakefield et al. (1998) to have been retracted in the first place, and therefore there cannot be a valid reason for keeping it retracted. It's a perfectly logical position if you take his premises as axioms, which for obvious reasons most of us don't. Therefore I don't expect him to understand the situation.

By Eric Lund (not verified) on 12 May 2014 #permalink

Eric, he is under the impression that the GMC findings "against the Wakefiled et al paper were overturned by the High Court" ( links to Mitting about Walker-Smith).

Jake started on the new ombudsman in April.

Seriously. he's working on a doctorate? HOW?

By Denice Walter (not verified) on 12 May 2014 #permalink

Mark ‘reading English’ amongst Jake’s skills in need of immediate attention

And writing it:

"The judge found only one misleading statement in the paper, but it was not because investigations undertaken were unethical experiments described as gaining ethical approval in the paper according to the now-overturned findings on which the paper’s retraction was based."

@ Narad:

Oh I know.

It appears that he has problems in assessing points of global import as well as their ramifications whenever he interprets material in this realm of inquiry.

He misses the central idea of Mitting.
What a clueless wonder he is!

By Denice Walter (not verified) on 12 May 2014 #permalink

I've been wondering how much weight to give this paper. It looks to me like the conclusions the authors report may be due to the Texas sharpshooter fallacy.

http://www.cumc.columbia.edu/dept/sergievsky/pdfs/contributionofvascula…

It's not that there isn't a plausible reason for cholesterol to be associated in some way with Alzheimer's Disease. The highest genetic risk factor (the APOE4 allele) codes for an apolipoprotein involved with transport of cholesterol and fats. But I'm aware of other studies which seem to show a benefit to high total cholesterol in old age with regard to AD and/or dementia, so I wonder how seriously I should take this contradictory study.

By Mark Thorson (not verified) on 12 May 2014 #permalink

Narad @13: I've read that quoted sentence several times now, and I'm still not sure what Crosby is trying to say. There was (according to the judge, who it goes without saying isn't qualified to evaluate the scientific merit) one false or misleading statement in the paper, and it wasn't the claim that the study had the required ethics approvals (which it didn't). At least one of those two statements must be false. He also seems to think that the retraction was based on the judge's finding, rather than the separate scientific investigation that had already taken place, and that the judicial finding in question applied to Wakefield, not just Walker-Smith (which IIRC was what happened: the appeals court found some technicality that let Walker-Smith off the hook). That's some primo word salad there.

By Eric Lund (not verified) on 12 May 2014 #permalink

“The judge found only one misleading statement in the paper, but it was not because investigations undertaken were unethical experiments described as gaining ethical approval in the paper according to the now-overturned findings on which the paper’s retraction was based.”

There is a moment in the movie "Dracula: Sovereign of the Damned" where the lead characters realize that they can pinpoint the location of Dracula, because if they plot on a map the locations where he's been attacking people, and join the points with lines, it forms the shape of a bat.

I mention this because I suspect that one might get a similar shape by trying to diagram that sentence of Jake's.

By Antaeus Feldspar (not verified) on 12 May 2014 #permalink

I mention this because I suspect that one might get a similar shape by trying to diagram that sentence of Jake’s.

I can never remember whether <pre> works.

(ROOT
  (S
    (S
      (NP (DT The) (NN judge))
      (VP (VBD found)
        (NP
          (NP (RB only) (CD one) (JJ misleading) (NN statement))
          (PP (IN in)
            (NP (DT the) (NN paper))))))
    (, ,)
    (CC but)
    (S
      (NP (PRP it))
      (VP (VBD was)
        (SBAR (RB not) (IN because)
          (S
            (NP
              (NP (NNS investigations))
              (VP (VBN undertaken)))
            (VP (VBD were)
              (NP
                (NP (JJ unethical) (NNS experiments))
                (VP (VBN described)
                  (PP (IN as)
                    (S
                      (VP (VBG gaining)
                        (NP
                          (NP (JJ ethical) (NN approval))
                          (PP (IN in)
                            (NP (DT the) (NN paper))))
                        (PP (VBG according)
                          (PP (TO to)
                            (NP
                              (NP (DT the) (JJ now-overturned) (NNS findings))
                              (SBAR
                                (WHPP (IN on)
                                  (WHNP (WDT which)))
                                (S
                                  (NP
                                    (NP (DT the) (NN paper) (POS 's))
                                    (NN retraction))
                                  (VP (VBD was)
                                    (VP (VBN based))))))))))))))))))
    (. .)))

"The parser exhausted its search space limit (of 20000 passive edges); try non-exhaustive parsing or a shorter (or less ambiguous) sentence."

Narad @13: I’ve read that quoted sentence several times now, and I’m still not sure what Crosby is trying to say.

After extensive lucubrations, I have concluded that it's:

"The judge found the statement about ethics approval to be misleading, but not for the reasons stated in the GMC panel's decision, on which the Lancet's retraction of the paper was based."

the lead characters realize that they can pinpoint the location of Dracula, because if they plot on a map the locations where he’s been attacking people, and join the points with lines, it forms the shape of a bat.

Clearly the inspiration for Borges' "Death and the Compass".

By herr doktor bimler (not verified) on 12 May 2014 #permalink

Graphical tree. Pardon me while I run this past a friend who specializes in this sort of thing.

Typo on the lightning correlation, a dash was hit instead of an equal sign. Normally I'd never comment on a minor typo, but it would imply negative correlation as-is.

"Deaths caused by lightning (r-0.805254)"

That site is a great find, very amusing to this stats geek.

Nice tree.
__________

Lucubrations sleeps with the fishes.***

***(It's a Sicilian message. But it's not directed toward anyone here.)

I'm somewhat alarmed by how many people die by becoming tangled in their bedsheets! Who knew? Reminds me of a show we watched once, Dead Like Me, I think it was called? It was troubling how many ways a person can die in a "freak" accident.

By Jessica S (not verified) on 12 May 2014 #permalink

There is a perfect correlation between giving a Space Shuttle a name beginning with "C" and catastrophic loss of the shuttle.

By Ivan Ilyich (not verified) on 12 May 2014 #permalink

Although it didn't seem to be present on the site, I've always felt that the relationship between autism prevalence and the length of basketball shorts has never been properly explored.

Typo on the lightning correlation, a dash was hit instead of an equal sign. Normally I’d never comment on a minor typo, but it would imply negative correlation as-is.

Nonsense. You're being pedantic, and regular readers know what I think of pedants who nitpick typos. (Hint: It isn't good.)

Ivan -- it gets creepier.

Not only did both C-named Shuttles perish, but NASA has a bizarre streak of bad luck in the month of January:

* The first crewed Apollo flight was AS-204, scheduled for February 27, 1967. However, a launch dress rehearsal with the capsule fully pressurized on January 27 resulted in a fire that killed all three crewmen.
* STS-51L, the final flight of Challenger, launched January 28, 1986 after numerous scrubs and other delays, and exploded 73 seconds into the flight killing all hands.
* STS-107, the final flight of Columbia, launched January 16, 2003 and was destroyed on reentry a couple of weeks later.

There have been a few other January flights that went off just fine, but even staunchly rational engineers talk sometimes of a "January curse".

By Calli Arcale (not verified) on 12 May 2014 #permalink

STS-107, the final flight of Columbia, launched January 16, 2003 and was destroyed on reentry a couple of weeks later.

In February.

Yes, but that actually makes it even creepier, Narad. If you look at the dates, January 27, January 28, and February 1 are within the space of just one week. Every year, NASA has a day of remembrance at the end of January or the beginning of February. It is perhaps somewhat convenient that the anniversaries of these deaths are so close to each other, but I'm told it makes for a very somber mood at NASA on that day.

By Calli Arcale (not verified) on 13 May 2014 #permalink

You are too nasty about friendly notes on typos. It isn't pedantic when the meaning could be altered. You come off like a spoilt brat. Yes, yes, it's YOUR bloggy and you can do whatever you want with it. Good for you.

Why not just say, "oops, fixed it". Many commenters correct their own typos, especially when meaning is affected (effected?--nah) as in when DW left the "g" off of "grapes"--although in that case the context was hilarious. Point is, she acknowledged it.

By rancidbrainmatter (not verified) on 13 May 2014 #permalink

This site is all over the web. I think the deeper story here is gullibility to spurious statistical 'facts': Do you really think over 800 people died from getting tangled in their bedsheets in the U.S. in 2008, increasing 150% since 2000? It cites the CDC as the source of this factoid. However, the CDC website and the downloads I searched show nothing on the topic. If it were true, such deaths would make up over 1/2% of all accidental deaths, equaling or exceeding the number of maternal deaths in childbirth and the number of deaths from firearm accidents.

There are many other dubious plots at that site, spreading faster than real news.
Oh well, such is statistics in the age of Karl Rove. As Steven Wright (?) said, "67.3% of statistics are made up on the spot."

By Sander Greenland (not verified) on 19 May 2014 #permalink