Take a look at this graph showing population distribution by county in a fictional U.S. state:

How do you read such a graph? Is this the ideal way to depict this sort of information? If you wanted to know which part of the state was most populous, how would you go about figuring it out? Researchers have developed conflicting models to explain how it’s done. One model suggests that people reading this kind of graph must cycle between the different parts in order to understand it. This makes some sense: to answer our question about population, you’d have to look back and forth between the legend and the colors on the map.
Another model says that how we read graphs like this depends on the question. We’d answer “What’s the population of Knox County?” differently than “How is the population distributed across the state?” The first question just asks graph readers to extract information from the graph, while the second demands that readers integrate information from the entire graph, building an understanding of relationships between its parts.
Integration is clearly the more complex task, and the one that graph-makers are probably most interested in. But how do we identify the relevant parts of the graph, and what else are we doing to integrate the information on a graph? A team led by Raj Ratwani showed viewers several different graphs like the example above, asking them to talk their way through answering a variety of questions about each graph.
As you might expect, their reasoning was indeed different based on the type of question being asked. For questions like “What is the population of X county?” most people just gave an answer, perhaps searching the graph for a bit before responding. For integrative questions, very few respondents could answer right away, and instead spent time rethinking their responses, getting more information, and building a pattern before offering a response.
In a second experiment, the researchers used an eye-tracking device to see where respondents were looking while they answered questions about the graphs. They also simplified the graphs by removing the county names and replacing them with single letters, like this:

So where were they looking? Once again, it depended on the type of question being asked. As you can see from the figure above, counties can be grouped into clusters based on population. These clusters are defined largely by their outer boundaries. So the researchers counted how frequently viewers looked at these outer boundaries versus the interior boundaries of counties within the same cluster. This graph shows the results:

The chart shows how frequently the viewers fixed their eyes on the outer boundaries of a cluster. As questions became more complex, a significantly larger portion of fixations were on the outer boundaries. Progressively fewer fixations focused on interior boundaries between counties.
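To make the measure concrete, here is a minimal sketch of how such a proportion could be computed. It is not the researchers' actual analysis code: the fixation records, the question-type labels, and the function name `outer_boundary_share` are all made up for illustration.

```python
from collections import Counter

# Hypothetical fixation records (not data from the study). Each record is
# (question_type, region), where region is "outer" for a cluster's outer
# boundary or "interior" for a boundary between counties in the same cluster.
fixations = [
    ("extraction", "interior"),
    ("extraction", "interior"),
    ("extraction", "outer"),
    ("extraction", "interior"),
    ("integration", "outer"),
    ("integration", "outer"),
    ("integration", "interior"),
    ("integration", "outer"),
]


def outer_boundary_share(records):
    """Return {question_type: fraction of fixations on outer boundaries}."""
    totals = Counter()
    outer = Counter()
    for question_type, region in records:
        totals[question_type] += 1
        if region == "outer":
            outer[question_type] += 1
    return {q: outer[q] / totals[q] for q in totals}


shares = outer_boundary_share(fixations)
for question_type, share in shares.items():
    print(f"{question_type}: {share:.0%} of fixations on outer boundaries")
```

With these invented records, the share of outer-boundary fixations is higher for the integrative question than for the extraction question, which is the direction of the pattern the chart reports.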
The researchers also analyzed the transitions between fixations — not just where viewers were looking, but the paths they took to get there. Here are those results:

As questions became more complex, viewers spent more time moving their eyes from cluster to cluster, and less time looking from a cluster to the legend.
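A transition analysis of this kind can be sketched as counting consecutive pairs in the sequence of fixated areas. The sketch below is an illustration under stated assumptions, not the method from the paper: the scanpath is invented, the area labels are hypothetical, and treating a look in either direction between a cluster and the legend as one "cluster-legend" transition is my own simplification.

```python
from collections import Counter

# Hypothetical scanpath: the ordered areas one viewer fixated during a trial.
# "A", "B", and "C" stand for population clusters; "legend" is the map legend.
scanpath = ["A", "A", "legend", "A", "B", "C", "legend"]


def classify_transition(source, target):
    """Label a move between two fixated areas, or None if the eye stayed put."""
    if source == target:
        return None  # repeated fixation on the same area is not a transition
    if "legend" in (source, target):
        return "cluster-legend"
    return "cluster-cluster"


def count_transitions(path):
    """Count each transition type across consecutive fixations."""
    counts = Counter()
    for source, target in zip(path, path[1:]):
        label = classify_transition(source, target)
        if label is not None:
            counts[label] += 1
    return counts


counts = count_transitions(scanpath)
print(dict(counts))
```

Comparing these counts across question types would show whether viewers shift from checking the legend toward comparing clusters with one another.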
Ratwani’s team says this all suggests that viewers are using a more complicated process to read these graphs than previous research suggested. First they must integrate the graph visually, that is, determine which cluster goes with which data. Then they cognitively integrate: they figure out the relationship between the clusters.
The researchers offer a few suggestions on how to make better graphs based on their research:
Visual Integration
- Make the boundaries of clusters of data more obvious (for example, by making the lines between similar groups bolder)
- Use color schemes that make the differences between groups clear: Use lots of different colors, not just shades of gray
- Remove extraneous markings like the county names on the maps
Cognitive Integration
- Make the relationship between the legend and the items on the graph obvious (for example, by using consistent colors throughout a paper or presentation)
- Don’t use too many different colors (decrease the total number of clusters)
To this, I’d like to add a few of my own. First of all, the tradition in journals and books of labeling figures with numbers and placing them far away from their descriptions in the text should be scrapped. I’d rather see a partial page or even a blank page in a book if it meant that the figure actually appeared next to its description in the text. Online journals should place figures inline, not make you click to view them separately.
A related quibble: Books and journals have a tradition of *never* placing figures before their textual description, even if the textual description would appear on the same page (or two-page spread) as the figure. Once again, there’s no reason for this. It’s much better for a figure to appear a half-page before the textual reference than five pages after it.
Finally, figures should be clearly marked. Don’t use abbreviations or shortcuts in the legend. Say what you mean! Researchers often use abbreviations as shortcuts while they’re doing preliminary data analysis in the lab. This doesn’t mean you have to use those same shortcuts when reporting your data to the public.
Raj M. Ratwani, J. Gregory Trafton, & Deborah A. Boehm-Davis (2008). Thinking graphically: Connecting vision and cognition during graph comprehension. Journal of Experimental Psychology: Applied, 14(1), 36-49. DOI: 10.1037/1076-898X.14.1.36