Not all causal relationships are created equal

You might have already seen this chart relating obesity to time spent eating in The New York Times:

[Chart: obesity versus time spent eating]

The commentary accompanying the chart goes like so:

On Monday, in posting some of the data from the Organization for Economic Cooperation and Development's Society at a Glance report, I noted that the French spent the most time per day eating, but had one of the lowest obesity rates among developed nations.

Coincidence? Maybe, maybe not.

Jim Manzi dug deeper into the data and found something very interesting:

I recreated the original analysis (minus the inclusion of the OECD average as a data point in the regression, for what I assume are obvious reasons). I get pretty much the same picture, and using a log regression form, get what looks to be the same trend line. The R-Squared on the regression (not noted in the original post, as far as I could see) is 26%. Without the U.S. and Mexico, it goes to about 6%, and becomes statistically insignificant.

But what was really interesting is that there are five other time categorizations provided at the source website. Here's the same data plot, but using "Time Spent Doing Unpaid Work" instead of "Time Spent Eating and Drinking":

...

Huh. This relationship, produced from the same data source, is about twice as strong (R-Squared = 52%) as the one that was reported. It took me literally five minutes of work to discover it. Why do you think that one was reported but not the other? This appears to be a textbook example of the human tendency to accept correlations as "not definitive, but part of the overall picture of evidence for causality" when such data serves to confirm pre-existing beliefs, and to ignore it otherwise.

R-squared here refers to the proportion of variation of Y explained by the variation in X. The problem with dredging through data is that you selectively pick out the relationships of "interest" and dismiss the ones you don't want to highlight as less interesting, or as simplifying the "underlying complexities." More generally, it is always an interesting verbal experience dealing with someone who is the king of nuance and subtle shadings when making a negative case against a hypothesis, but who becomes a forceful advocate of black & white inferences when making a positive argument.
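For readers who want to poke at this themselves, here is a minimal sketch in Python of the two effects discussed above. It is not Manzi's analysis and it does not use the OECD data: every number is made up for illustration, and the function names (`r_squared`, `dredge`) are hypothetical. It fits a plain straight line rather than the log form he mentions. The first part shows how two extreme points can manufacture a high R-squared; the second shows what happens when you try several unrelated predictors and keep the best one.

```python
import random


def r_squared(xs, ys):
    """R^2 of a least-squares line y = a + b*x, computed as 1 - SS_res / SS_tot."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Part 1: made-up points (NOT the OECD data). Four points with no trend
# at all, plus two extreme points far out along the diagonal.
cluster_x, cluster_y = [1, 1, 3, 3], [1, 3, 1, 3]
extreme_x, extreme_y = [10, 12], [10, 12]

r2_all = r_squared(cluster_x + extreme_x, cluster_y + extreme_y)
r2_without = r_squared(cluster_x, cluster_y)
print("R^2 with the two extreme points:   ", round(r2_all, 3))
print("R^2 without the two extreme points:", round(r2_without, 3))


# Part 2: dredging. The outcome and every candidate predictor are pure
# random noise, so any relationship found is spurious by construction.
def dredge(n_points, n_predictors, seed):
    rng = random.Random(seed)
    ys = [rng.gauss(0, 1) for _ in range(n_points)]
    scores = []
    for _ in range(n_predictors):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        scores.append(r_squared(xs, ys))
    return scores


# Hypothetical sizes: 20 "countries", 6 candidate time categories.
scores = dredge(n_points=20, n_predictors=6, seed=0)
print("R^2 of the first predictor tried:", round(scores[0], 3))
print("R^2 of the best of six:          ", round(max(scores), 3))
```

The best of several tries can never score lower than the first try, which is the whole point: if only the winner gets reported, the reader has no way of knowing how many candidates were examined.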

Well, I think we should consider two main aspects:
- WHAT we eat,
- and HOW we eat.
Those countries above the line mostly have bad eating habits, such as consuming junk food (not to mention great amounts of it...) and eating irregularly. Moreover, meals are not chewed properly, which results in digestive problems.
Another problem is practicing sport, but that's another pair of shoes.
What surprised me is the correlation between obesity and "Time Spent Doing Unpaid Work". I wonder what is the background of it?

"R-squared here refers to the proportion of variation of Y explained by the variation in X." But only in the technical, statistician's sense of "explained", not in the sense of the philosopher or the man in the pub.

By bioIgnoramus (not verified) on 11 May 2009

I'm a pub philosopher and get the meaning of R squared perfectly clearly. Can bioIgnoramus tell us what some other men or women in pubs, philosophers or no, see (or not) in this clear relationship?

Apropos the first poster, I have direct experience of the eating styles and substance in the UK and Japan.

In the UK we generally sit down to a one-plate meal, usually consisting of meat & 2 veg. Meat can be mince, sausages, chicken, pork, beef - in descending order of frequency (in my experience). This is occasionally 'spiced up' by having a nice curry/Chinese instead.

In Japan we sit down to a meal that is a mix of things: bowl of rice, miso soup, cooked fish, seafood, a variety of cooked vegetables - some being battered (tempura-style). Chicken, pork and beef make occasional inroads into this type of meal. Alcohol is usually consumed throughout by the males of the household - usually spirits, but often beer.

In contrast to the UK, workers usually work very long hours, smoke a lot (though this is decreasing), and have a tradition of getting up early - often to perform household tasks that cannot be accomplished at what would be a normal time for a Westerner due to the length of the working day.

Or to paraphrase: that graph really is a pile of shite

Because in France we do not eat the same shit! We have some shit rich in Omega 3, some decaffeinated shit, some shit without phosphate, some organic shit, some shit with no added sugar, etc.

"Explain" implies cause. Correlation doesn't. You're not a very good pub philosopher, are you?

By bioIgnoramus (not verified) on 12 May 2009

What's wrong with caffeinated shit?

Don't make me fat, just makes me awake.

By Sandgroper (not verified) on 12 May 2009