There have been a bunch of interesting things written about education recently that I’ve been too busy teaching to comment on. I was pulling them together this morning to do a sort of themed links dump, when the plot at the right, from Kevin Drum’s post about school testing, jumped out at me. This shows test scores for black students in various age groups over time, but more importantly, it demonstrates one of my pet peeves about Excel.
If you look at the horizontal axis of this plot, it shows regularly spaced intervals. If you actually read the labels, though, you’ll see that they’re anything but regular: the first few points show test scores at four- or five-year intervals, then there’s a cluster of two-year intervals, then some more long gaps.
These points are plotted at regular intervals because Kevin used Excel to make the graph (the ugly colors are another dead giveaway), and did what looks like the reasonable thing to do when you want to plot lines connecting your data points, namely choosing “Line Plot” from the plot options Excel offers. As I tell my intro physics students over and over, though, “Line Plot” is never the right choice, because of this exact problem. If you have both x and y values (say, test scores and the years in which those scores were reported), you want “Scatter Plot,” which will properly space the data on the horizontal axis. “Line Plot” always plots points as evenly spaced, because Excel is designed for middle managers who deal with sales data that occur only at regular intervals (and who secretly hate and fear real math).
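To make the difference concrete, here is a minimal Python sketch of what the two chart types do with the horizontal axis. The years are invented to mimic the pattern described above (long gaps, then two-year gaps, then a long gap), not copied from the report, and the function names are my own labels rather than anything Excel exposes.

```python
# Hypothetical survey years with uneven gaps (illustrative, not the report's data).
years = [1971, 1975, 1980, 1984, 1988, 1990, 1992, 1994, 1999]

def line_plot_positions(xs):
    """Category-style line chart: ignore the x values, use 0, 1, 2, ..."""
    return list(range(len(xs)))

def scatter_positions(xs):
    """Scatter plot: place each point at its actual x value
    (shifted here so the first point sits at 0)."""
    return [x - xs[0] for x in xs]

def gaps(positions):
    """Horizontal distance between each pair of neighboring points."""
    return [b - a for a, b in zip(positions, positions[1:])]

print(gaps(line_plot_positions(years)))  # every gap comes out the same
print(gaps(scatter_positions(years)))    # gaps follow the real intervals
```

The first list is all ones no matter what years you feed in, which is exactly the problem: the spacing information in the data never reaches the axis.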
What difference does it make? Here’s the data for 17-year-old students, plotted properly in the original report:
In the “Line Plot” version, it looks like something incredibly dramatic happened three points into the data series, with scores shooting way up overnight, then abruptly flattening out. The interval between those first few points is twice the interval between the next few, though, so they ought to be twice as far apart horizontally. The original plot gets that right, and as a result, the big step up looks far less dramatic, and the flattening-out much less abrupt. There’s still a fairly clear step, but with the points properly spaced out, this looks a lot more like a small fluctuation above a general slow increase than a one-time dramatic jump in scores.
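You can put a rough number on how much the even spacing exaggerates the step. The sketch below uses three made-up (year, score) points, with a four-year gap followed by a two-year gap, and compares how steep each segment looks under the two layouts. The numbers are purely illustrative and the helper name is hypothetical.

```python
# Invented data for illustration only: a four-year gap, then a two-year gap.
years = [1980, 1984, 1986]
scores = [240, 252, 254]

def segment_slopes(xs, ys):
    """Rise over run for each pair of neighboring points."""
    return [(y1 - y0) / (x1 - x0)
            for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]

# "Line Plot" layout: points sit at 0, 1, 2 regardless of the year.
even = segment_slopes(list(range(len(years))), scores)

# Scatter layout: points sit at their actual years.
true = segment_slopes(years, scores)

# How many times steeper the jump looks than the segment after it.
print(even[0] / even[1])
print(true[0] / true[1])
```

With these numbers, the jump looks six times steeper than the following segment when the points are evenly spaced, but only three times steeper when they are spaced by year. The step is still there, it just stops shouting.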
That doesn’t mean there isn’t something interesting going on in the early 1980s (that’s still a significant increase in a short period), but when the plot’s done right, you get a more accurate picture of what’s going on. Which is to say, a muddled and slightly ambiguous picture that would really benefit from somebody with a time machine going back and getting some data from 1982 and 1986 so we have a clearer idea about what test scores really did in that period. Get on that, social scientists.
And the rest of you, remember: Excel was written for (and possibly by) monkeys, and no matter how reasonable it might seem, “Line Plot” is never the right choice.