Cognitive Daily gets a lot of complaints about graphs, mostly from readers who say the graphs are useless without error bars. My response is that error bars are confusing to most readers. But perhaps I’m wrong about that. Now I’m going to put my money where my mouth is.
Take a look at this graph. It represents a fictional experiment where two different groups of 50 people took a memory test. The mean score of each group is shown, along with error bars showing standard error:
Based on this graph, can you tell if there is a significant difference (p<.05) between the scores of the two groups? Let's make this a poll (for the sake of accuracy, please respond as best you can even if you don't know what the error bars represent).
Below I’ve included a similar graph, again testing two different groups of 50 people but using a different type of error bar:
Again, based on this graph, can you tell if there is a significant difference (p<.05) between the scores of the two groups?
I’ll give the correct answers later today (after plenty of folks have had a chance to respond), but I’ll wager now that we will get a large number of incorrect responses to each poll, even though many of our readers are active researchers.
Here’s my wager. I say that fewer than 50 percent of our readers can accurately answer the poll questions without cheating or looking up the information elsewhere first. If we get more than 300 responses to each poll, and accuracy is better than 50 percent for each, then I’ll add error bars to every graph I produce for Cognitive Daily from here on out (as long as the researchers publish enough information for me to generate them — and as long as the error bars are statistically relevant [more on that later]). If not, then I get to link to this post every time a commenter complains about Cognitive Daily not putting error bars in its graphs.
Update: Okay, I think we’ve now gotten enough answers to demonstrate that most of our readers don’t understand error bars. I win! (I probably shouldn’t be too happy about that though…)
I’ll post the answers below — you’ll need to scroll down to see them so people can still answer the poll without seeing the answer first.
Now, the answers. For Graph 1, the correct response is “too close to call.” Since the error bars show standard error, the bars need to be separated by at least half their total length (a gap of about one standard error) for the difference to be significant at p<.05. I’ll give partial credit for a “yes” answer, since these bars are separated by exactly half their length — right at the borderline. (For more explanation, see this post.)
For Graph 2, the correct response is “yes.” Since the error bars are 95% confidence intervals, they can overlap by as much as about 25% of their total length and still indicate a significant difference. These bars just barely overlap, so the difference is significant.
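If you want to check these two rules of eye for yourself, here’s a minimal sketch using a two-sample z-test on hypothetical numbers (the means, the standard error of 1.0, and the group size of 50 are my own stand-ins, not values from the graphs above). With equal standard errors, the standard error of the difference between two means is √2 times the standard error of each mean, which is where both rules of thumb come from:

```python
# Sketch (hypothetical numbers): checking the two error-bar rules of eye
# with a two-sample z-test. Assumes two groups with equal standard error.
import math

def p_value_for_diff(mean_a, mean_b, se):
    """Two-sided p-value for the difference of two means,
    each with standard error se (large-sample z approximation)."""
    se_diff = math.sqrt(2) * se                      # SE of the difference
    z = abs(mean_b - mean_a) / se_diff
    # 2 * (1 - Phi(z)), with Phi computed from the error function
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

se = 1.0

# Graph 1's rule: SE bars (each spanning mean +/- 1 SE) separated by half
# their total length means a gap of 1 SE, i.e. means differing by 3 SE.
print(round(p_value_for_diff(10.0, 13.0, se), 3))   # just under .05

# Graph 2's rule: 95% CI bars (arms of ~1.96 SE) overlapping by 25% of
# their total length (0.98 SE) means means differing by 3.92 - 0.98 SE.
print(round(p_value_for_diff(10.0, 10.0 + 2 * 1.96 * se - 0.98 * se, se), 3))
```

Both cases land just under p = .05, which is exactly why these configurations are the borderline ones: any wider gap (or smaller overlap) makes the difference clearly significant. Note the rules are approximations — for small groups a t-test, not a z-test, is the right tool.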
Any comments on the polls? I tried to make the instructions as clear as possible, but I’m open to hearing any claims as to how my test may have slanted the results.