Think Before You Plot

By drorzel on November 17, 2008.

There's a link in today's links dump to a post from Pictures of Numbers, a rarely-updated blog on the visual presentation of data (via Swans On Tea, I think). There's some really good stuff there about how to make graphs that are easy to read and interpret.

I would like to dissent mildly from one of their points, in the Better Axes post, specifically the advice about not starting at zero. In many cases, this is good advice, but like most rules of thumb, it shouldn't be followed too closely.

Take, for example, this graph from one of my metastable xenon papers:

A strict application of the presentation rules promoted by people like Edward Tufte (which are generally quite similar to the things Pictures of Numbers says) would say that there's too much white space in this graph, particularly at the bottom. The variation in the data would be much more obvious if the vertical axis started at 0.5 instead of 0.

The problem is, that would be a deceptive presentation of the data. The whole point of this graph, in the context of the research, is that there's actually very little variation in the data. The points show the relative collision rates for various different isotopes of xenon, and they're all more or less the same. Blowing up the axis would make what differences there are more obvious, but that would distort the point of the graph.

This comes up a lot in intro labs, in which we frequently ask students to make null measurements-- to measure how the period of a pendulum depends on the mass, for example, or how the period of a mass oscillating on a spring depends on the amplitude. Students will carefully measure a half-dozen points, pop them into Excel, and make a graph that is auto-scaled so that the data points span the full vertical range of the plot. Then they'll fit a trend line to the data, and declare that the period increases linearly with the mass. And if you look at the plot, it looks for all the world as if the trend line slopes impressively from one corner to the other.

Of course, the period of a pendulum doesn't depend on the mass, so what they're fitting is just noise. The dramatic sloping trend lines are dramatic only because the scale is blown up so much. If the vertical scale went all the way to zero, it would be clear that there's no variation worth mentioning.
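To put rough numbers on this, here is a minimal sketch in plain Python. The masses and periods are invented for illustration (they are not real lab data), and the helper names are hypothetical. It fits a trend line to six noisy points, then asks how much of the vertical axis that line climbs across when the axis is auto-scaled to the data versus when it starts at zero.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fraction_of_axis_spanned(xs, ys, y_min, y_max):
    """Fraction of the vertical axis height that the fitted line rises
    (or falls) across the measured range of x."""
    slope, _ = linear_fit(xs, ys)
    rise = abs(slope) * (max(xs) - min(xs))
    return rise / (y_max - y_min)


# Made-up pendulum data: the period should be about 2 s whatever the mass.
masses = [50, 100, 150, 200, 250, 300]            # grams (assumed values)
periods = [1.98, 2.01, 1.99, 2.02, 2.00, 2.03]    # seconds (assumed values)

# Auto-scaled axis: runs from the smallest to the largest data point.
auto_scaled = fraction_of_axis_spanned(masses, periods, min(periods), max(periods))

# Axis that starts at zero.
from_zero = fraction_of_axis_spanned(masses, periods, 0.0, max(periods))

print(f"auto-scaled axis: trend line spans {auto_scaled:.0%} of the plot height")
print(f"axis from zero:   trend line spans {from_zero:.1%} of the plot height")
```

With these made-up numbers the same fitted line covers most of the plot height on the auto-scaled axis and only a percent or two of it when the axis goes to zero. The fit hasn't changed at all; only the framing has.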

I'm not saying that you should totally abandon the principles Tufte and Pictures of Numbers suggest-- for most cases, their advice is good advice. It's important to remember that all these rules have exceptions, though, and to think carefully about what you're trying to show before you plot your data.


Hey Chad - Are you sure you're interpreting Tufte correctly? My impression from his discussion of "graphical integrity" is that he would consider a plot with a zoomed-in vertical scale (massively overemphasizing minute variations) to be an example of a significant "lie factor". That is, a true 1% variation in the data would show up as, e.g., a 50% variation in the vertical position of the curve in the graph.
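For reference, Tufte defines the lie factor as the size of the effect shown in the graphic divided by the size of the effect in the data. A tiny sketch of the arithmetic for the example in this comment follows; the function name and the choice to express both effects as percentages are just assumptions for illustration.

```python
def lie_factor(effect_shown_pct, effect_in_data_pct):
    """Tufte's lie factor: size of the effect shown in the graphic
    divided by the size of the effect in the data."""
    return effect_shown_pct / effect_in_data_pct


# The example from the comment: a true 1% variation in the data drawn
# as a 50% variation in the vertical position of the curve.
print(lie_factor(50, 1))
```

A value near 1 means the graphic represents the data proportionally; the example above is a long way from that.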

I'm going off my recollection of The Visual Display of Quantitative Information (the book is in my screaming-infant-free office at work), which goes on at some length about the need to minimize white space by choosing the axis scale to maximize the spread of points, and so on.

The "lie factor" thing I've mostly seen applied to bar graphs and the like, not scatter plots.

One thing this emphasizes is how you can manipulate your plots to emphasize what you want. In choosing a scale, keep in mind what is considered large/small variability in *your* field. If you are showing stability of numerical methods, and "no variation" should be near machine precision, and your axis is like 10^-3, your graph is misleading. In this case, how big is the difference between a rate of 1 and 2? I think that should be more of a guide in how to scale, rather than the amount of white space.

Why not plot both? First an absolute scale that includes zero and all of the data, which instantly shows the variation in proportion to the central tendencies, and then a relative scale which details the variation?

Label the one 'full scale' and the other 'detail', and people will quickly get used to the scheme and start liking the pairing.

In this case, how big is the difference between a rate of 1 and 2?

Pretty big. What's plotted is the ratio of the collision rate for each sample to the collision rate for a reference sample. A factor of two difference between isotopes is possible, but would be highly noteworthy.

In a lot of cases, there are tight space restrictions-- Physical Review Letters requires all papers to fit within four journal pages, including the figures. You can't really justify putting in two figures if you can get the essential information into one.

You see a lot of inset plots in PRL for this reason. People who want to do the full-scale/detail thing will put one figure inside the frame of the other.

I would include zero if I wanted to show that the points are away from zero, but in this case it sounds like an arbitrary value.

What I would do (or at least try) would be to make the points stand out more by making them larger: the lines are all very heavy. I would also play around with the vertical size: you could simply chop out the 0 to 0.5 part, making the whole graph smaller. It makes the journal's production staff happier, too!

Hmm, actually I think this plot would still show little variation if you kept the same sized figure but started at 0.5. The key is that you've got error bars, so they're filling the plot up: they're showing that the data is dominated by noise.

Could always plot variation from the theoretically predicted value?

(Gah. I've spent too much time with engineers, if plotting the residual is becoming my default)

Mike from Pictures of Numbers here.

I think starting that graph from zero is just fine. I don't think there's too much white space, and I don't think Tufte would think so either. You're quite right that trimming the white space would unbalance the data and magnify noise.

If I were to make any changes, I'd reconsider the assortment of symbols you're using in the key. Perhaps arrange the five isotopes in numerical order, and use a sequence of symbols that change in size or shape? If they were less random, it would be easier to look for patterns.

Also I would probably gray out the dashed line a little, label it directly with the word "Theory", and get the key out of the way: it's encroaching on the data a bit.

Anyway, thanks for taking the time to discuss this.
