Applied Statistics

Yesterday I posted this graph, a parallel-coordinates plot showing health care spending and life expectancy in a sample of countries:

[Figure: parallel-coordinates plot of health care spending and life expectancy]

I remarked that a scatterplot should be better. Commenter Freddy posted a link to the data, so, just for laffs, I spent a few minutes making a scatterplot containing all the same information. Here it is. (Clicking on any of the graphs gives a larger version.)


[Figure: healthscatter.png, scatterplot of the same data]

How do the two graphs compare? There are some ways in which the first graph is better, but I think these have to do with that graph being made by a professional graphic designer–at least, I assume he’s a professional; in any case, he’s better at this than I am! He also commented that he removed a few countries from the plot to make it less cluttered. Here’s what happens if I take them out too:

[Figure: healthscatter2.png, scatterplot with a few countries removed]

(Unlike the National Geographic person, I kept in Turkey. It didn’t seem right to remove a point that was on the edge of the graph. I also kept in Norway, which was the highest-spending country on the graph, outside the U.S. And I took out Sweden and Finland–sorry, Jouni!–because they overlapped, too. Really, I prefer jittering rather than removing as a solution to overlap, but here I’ll go with what was already done in this example.)
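For anyone who hasn't tried jittering, here is a minimal sketch in base R. The numbers are made up for illustration and are not the health care data; the variable names and the jitter amount of 0.1 are arbitrary choices of mine, not what was used for the graphs above.

```r
# Made-up points for illustration only (NOT the health care data).
# The first two points coincide exactly, so one would hide the other.
set.seed(1)                       # make the random jitter reproducible
x <- c(3, 3, 5)                   # hypothetical spending values
y <- c(80, 80, 78)                # hypothetical life expectancies

# jitter() adds a small uniform random offset; `amount` bounds its size.
xj <- jitter(x, amount = 0.1)
yj <- jitter(y, amount = 0.1)

# Every point moves by at most 0.1, and the overlapping pair separates.
max(abs(xj - x))
xj[1] != xj[2]
```

The jitter has to be small relative to the differences you care about; otherwise you are trading overlap for misleading positions.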

What the scatterplot really made me realize was the arbitrariness of the scaling of the parallel coordinate plot. In particular, the posted graph gives a sense of convergence, that spending is all over the map but all countries have pretty much the same life expectancy–look at the way the lines converge to a narrow zone as you follow the lines from the left to the right of the plot.
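To see how much the apparent convergence depends on the axis limits, here is a rough sketch in base R. It assumes each parallel-coordinates axis places a value by linear rescaling between the axis limits. The helper `axis_position` and the numbers are hypothetical, invented for this illustration and not taken from the original graph.

```r
# Position of each value along an axis of unit length, given the axis limits.
# (Hypothetical helper; assumes simple linear rescaling.)
axis_position <- function(x, limits) {
  (x - limits[1]) / (limits[2] - limits[1])
}

life <- c(74, 77, 79, 82)   # made-up life expectancies, NOT real data

tight <- axis_position(life, range(life))  # limits hug the data
wide  <- axis_position(life, c(0, 100))    # limits run from 0 to 100

# Fraction of the axis that the line endpoints occupy under each choice.
spread_tight <- diff(range(tight))
spread_wide  <- diff(range(wide))
```

The same four values fill the whole axis under the first choice and bunch into a narrow band under the second, so "convergence" on a parallel-coordinates plot is partly a statement about the axis limits.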

Actually, though, once you remove the U.S., there’s a strong correlation between spending and life expectancy, and this is super-clear from the scatterplot.
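A single extreme point can hide a strong correlation among the rest. Here is a small sketch in base R, using invented numbers chosen only to mimic the pattern (one very high spender with middling life expectancy). They are not the actual country data, and the variable names are mine.

```r
# Invented numbers for illustration only; NOT the actual country data.
spend <- c(1, 2, 3, 4, 5, 10)              # hypothetical spending, arbitrary units
life  <- c(76, 77.5, 78, 79.5, 80, 77)     # hypothetical life expectancy, years
outlier <- 6                               # index of the high-spending point

r_all  <- cor(spend, life)                      # correlation using every point
r_rest <- cor(spend[-outlier], life[-outlier])  # correlation with the outlier dropped

round(c(all = r_all, without_outlier = r_rest), 2)
```

With these made-up values the correlation is weak when every point is included and strong once the outlier is dropped, which is the kind of pattern the scatterplot makes visible at a glance.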

The only other consideration is novelty. The scatterplot is great, but it looks like lots of other graphs we’ve all seen. This is a plus–familiar graphical forms are easier to read–but also a minus, in that it probably looks “boring” to many readers. The parallel-coordinate plot isn’t really the right choice for the goal of conveying information, but it’s new and exciting, and that’s maybe why one of the commenters at the National Geographic site hailed it as “a masterpiece of succinct communication.” Recall our occasional discussions here on winners of visualization contests. The goal is not just to display information, it’s also to grab the eye. Ultimately, I think the solution is to do both–in this case, to make a scatterplot in some pretty, eye-catching way.

P.S. I never know how much to trust these purchasing-power-adjusted numbers. Recall our discussion of Russia’s GDP.

P.P.S. The code I used to make the graphs is here.

Comments

  1. #1 Russell
    December 30, 2009

    What does the size of the circles represent?

  2. #2 anon
    December 30, 2009

    looks like number of doctor visits

  3. #3 qetzal
    December 30, 2009

    Interesting that a higher number of doctor visits per year appears to trend with lower health care spending.

  4. #4 Andrew Gelman
    December 30, 2009

    Qetzal: Doctor salaries in the U.S. are super-high. When we take our kids to see the doctor, we often barely even get time with the nurse.

  5. #5 Leo Martins
    December 30, 2009

    It worked great, thanks! My only change was to use transparency in the background color:

    symbols(expend, life, circles=doctor, inches=.6, add=TRUE, fg=color, bg=rgb(0,0,1,0.4))

    It only works with some devices like pdf and svg though. The analysis I’ll leave for next year…

  6. #6 Don Light
    August 23, 2010

    The underlying high correlation is not between health care spending and life expectancy but between income and life expectancy, isn’t it? Then health care spending is highly correlated with income. Do you agree?
