Great Moments in Deceptive Graphs

This morning, via Twitter, I ran across one of the most spectacular examples of deceptive data presentation that I've ever seen. The graph in question is reproduced in this blog post by Bryan Caplan, and comes from this econ paper about benefits of education. The plot looks like this:


This is one panel clipped out of a four-part graph, showing the percentage of survey respondents reporting that they are satisfied with their current job. The horizontal axis is the years of schooling for different categories of respondents.

So, I looked at that, and said "Wow, people with more education are significantly happier with their jobs." Then I read the post, where Caplan talks about how small the effect is, and I said "What the hell?" and looked closely at the axis labels. Which actually span a tiny, tiny range of responses. A totally honest version of the black bars in the plot would look like this:


That is, there's very little difference between the four groups, with only a tiny shift up as you go to higher education. The fraction of people with post-graduate education who are satisfied with their jobs is only about 7 percentage points higher (9% of the total value) than the fraction of those who never completed high school who are satisfied with their jobs.

By carefully choosing their vertical axis to start just barely below their minimum value, though, the authors have managed to create the impression that the post-graduate cohort is about 25 times more satisfied than the non-high-school cohort (based on counting pixels in the vertical bars). Which is really impressive-- even the Excel auto settings do a better job, starting the vertical axis at around 0.74. That still exaggerates the effect, but isn't anywhere near as far into "How to Lie With Statistics" territory as the published graph, which is a marvel of axis-limit deception.
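For the record, the magnitude of that distortion is easy to reproduce. Here's a quick Python sketch; the 0.80 and 0.87 satisfaction fractions are illustrative stand-ins consistent with the roughly-7-point gap described above, not values read off the paper:

```python
def apparent_ratio(low, high, baseline):
    """Ratio of bar heights as actually drawn when the y-axis starts at `baseline`."""
    return (high - baseline) / (low - baseline)

# Illustrative values only: ~0.80 (no high school) and ~0.87 (post-graduate).
low, high = 0.80, 0.87

print(apparent_ratio(low, high, 0.0))    # honest axis: bars look ~1.09x apart
print(apparent_ratio(low, high, 0.74))   # Excel auto-scale: ~2.2x
print(apparent_ratio(low, high, 0.797))  # axis just below the minimum: ~24x
```

The closer the baseline creeps to the minimum value, the more the apparent ratio diverges, which is exactly the lever the published graph pulls.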

And economists wonder why they have a hard time getting physicists to take them seriously...


Also, some idea of standard deviation would be nice. I mean, if the effect really is highly significant (even though it's small) -- let's say the numbers are 0.800 ± 0.003 and 0.873 ± 0.002 -- then I can see cutting out some of the bottom bits of the graph, or (better) using graphics-fu to indicate a break in the graph from 0 to the interesting bits between 0.8 and 0.9. But if the effect is borderline, then I'd be far less inclined to cut out the bottom.

Which is pretty arbitrary, but if you want to convey 'these are different' versus 'these aren't that different', the two graphs make that distinction quickly, even if the first one distorts the magnitude of the effect. It's genuinely hard to show a small yet significant change while keeping both details visible to the person who is just skimming your figures.

By Becca Stareyes (not verified) on 10 Jul 2011 #permalink

Darrell Huff's How to Lie with Statistics is an excellent primer on misleading data, including bad graphs. Inspired by Huff, I created a bad graph that turns a graph with periodicity into a straight line.
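The comment doesn't spell out the construction, but a plausible Huff-style version of the trick is aliasing: sample a periodic signal exactly once per period, so every sample lands at the same phase and the plotted points form a flat line:

```python
import math

period = 2 * math.pi  # the period of sin(x)

# Sampling sin(x) once per period: every point sits at the same phase,
# so the periodic signal plots as a perfectly straight (flat) line.
samples = [math.sin(n * period) for n in range(10)]
# all values are ~0, up to floating-point noise
```

Any sampling interval that is an integer multiple of the period produces the same effect, which is why undersampled periodic data can look deceptively trendless.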

If your goal is to mislead on this issue, you don't need statistics. That's a sledgehammer where a slight tap will do it.

Everything about the survey is misleading. It confounds correlation with causation. If people who self-selected for low educational attainment are happy in their job, that doesn't mean that everybody would be equally happy in those jobs.

Just another proof that figures don't lie but liars figure.

I think the assumption should be that the readers of an academic article will check out the labels on the axes. The second graph does not allow us to see the differences between groups and is actually less informative!

As to causation vs. correlation, it is discussed in the paper - they have twin and sibling studies and quasi-natural experiments.


I thought the same thing. Bad enough that they fudged the y-axis scale, but they don't give any indication whether the differences are even statistically significant.

I clicked through to the full paper. The online appendix indicates that the data comes from a survey of 6811 people, but I couldn't find any stats on anything.

Besides, even if the differences are statistically significant - and they might well be - that doesn't mean that such small differences have any practical significance.

For these data, the right thing to do is probably a table, rather than any kind of graph. That would also allow you to put the uncertainties directly with the numbers, showing you the significance of the results as well. The differences are so small that any way you want to graph them that shows a clear difference will distort the picture.

I'd probably be OK with the Excel auto-scale, which started at 0.74. That shows the difference pretty clearly, but also makes it more clear that the difference is small. Even for a venue where people will look at the axes, though, this level of magnification of the effect is completely ridiculous.

And economists wonder why they have a hard time getting physicists to take them seriously...

Things are rough all over. So long as physicists continue to undervalue limousines, estates in the Hamptons, and $2K afternoon escorts, economists will continue to suffer in this way.

By Pierce R. Butler (not verified) on 10 Jul 2011 #permalink

You said, "...between the four groups..." but the correct phrasing is "...among the four groups..." since you are comparing more than two things.

Maybe I'm paranoid, but the first thing I look at is the axes. There are a lot of graphs like this out there; I don't necessarily think the aim is deception. The ranges provide the context... but yes, you have to look at them. That's why they're labelled.

A suppressed zero is used all the time in the social sciences, but it is particularly common in economics and business, not to mention politics. Newspaper articles are thick with misleading graphs.

It is also the case that default axis choices can be a problem in an intro lab when a signal isn't expected. Examples would be the amplitude dependence of the frequency of a mass on a spring or the x-coordinate of projectile motion. The data look a lot less bizarre if you make the scale more normal.

Comments @5 and @3 should look at the re-analysis in that blog where the full info from Likert scale was used rather than grouping a large range of answers (5, 6, and 7) into a single category. The conclusion changes.

By CCPhysicist (not verified) on 10 Jul 2011 #permalink

Must proofread. They grouped 1, 2, and 3 of the 1-7 Likert scale.

By CCPhysicist (not verified) on 10 Jul 2011 #permalink

I'd probably be OK with the Excel auto-scale, which started at 0.74

You want to be fooled, but more subtly?

Haha! Multilol!

You like to exaggerate, but not so much? Don't move the origin if you don't want to fool yourself and your readers - it's as simple as that.