Don't believe everything you see on a graph

This graph that Brendan Nyhan posted the other day got some attention from my coblogger John Sides and others.

[Graph: plot of optimal classification scores for senators in the 111th Congress, with axes labeled as economic and social dimensions]

For example, Kevin Drum describes the chart as "pretty cool" and writes, "I think I'm more interested in the placement of senators themselves. Democrats are almost all bunched into a single grouping, with only four outliers. Republicans, by contrast, are spread through considerably more space on both the economic and social dimensions."

Matthew Yglesias also labels the chart as "cool" and answers Drum by describing the pattern as "an illustration of the importance of setting the agenda. The Democratic leadership has only brought to a vote bills that unite the overwhelming majority of Democrats. . . ."

Yglesias may well be right on this point, but before going further I'd like to stand athwart history and yell "Stop" for a moment.

My first reaction on seeing the above graph was: Huh? It doesn't look right to me. The graph seems to imply that Dems and Reps have a huge overlap on social issues, with the median positions of the two parties being virtually identical (and a Democratic senator in Vermont being quite a bit more socially conservative than Republican senators in Indiana, Tennessee, and two senators in Arizona). Can this really make sense?

I asked Brendan, who responded:

The graph is an auto-generated plot of the Lewis-Poole optimal classification scores for the 111th congress generated by Royce Carroll, one of Poole's students. So the important thing to keep in mind is that it's only being run on part of one Congress (rather than say, DW-NOMINATE, which Poole runs on all the Congresses as a batch), so the estimates may be screwy depending on the set of available votes. In this case, there apparently haven't been a lot of votes dividing the Senate Dems internally so their estimated ideal points are tightly clustered, whereas GOP divisions on the votes to date have caused their ideal point estimates to spread in two dimensions (this is not true for the House, where the Dems have had more internal division). Also, the second dimension that's being recovered for the last ten months in the Senate may or may not be the "social issues" dimension that Carroll labels it. The second dimension is always an interpretive mess in the post-civil rights period, and it's even worse for <.5 of one Congress. . . . This is my best guess at what's going on, and I don't know exactly what Carroll and Poole are doing behind the scenes.

OK, this makes sense. My take-home message here is that we should ignore the second dimension of the above graph, at least until someone can come up with some interpretation of it. The problem isn't simply an artifact of sparse data or agenda-setting; more fundamentally, we have to know the meaning of a variable before we start talking about it! A key step in any statistical analysis is to connect the inferences back to what is already known about the underlying system (in this case, the positions of senators on social issues).

Or, to put it another way, don't believe everything you see on a graph.
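One rough way to make that kind of sanity check concrete is sketched below. It compares the party medians on a dimension and measures how many members of one party sit beyond the other party's median, then flags the dimension if the result contradicts what we think we already know. Everything here is an assumption for illustration: the function names, the 0.25 overlap threshold, and the made-up scores are mine, and this is not what Carroll or Poole do behind the scenes.

```python
from statistics import median


def overlap_fraction(group_a, group_b):
    """Fraction of group_a whose score lies above the median of group_b."""
    m = median(group_b)
    return sum(1 for x in group_a if x > m) / len(group_a)


def sanity_check(scores_by_party, expected_lower="D", expected_higher="R",
                 max_overlap=0.25):
    """Check a dimension against a prior expectation about party ordering.

    Assumes higher scores mean "more conservative" and that we expect
    `expected_higher` to sit above `expected_lower`. The max_overlap
    threshold is arbitrary and chosen only for this sketch.
    """
    lower = scores_by_party[expected_lower]
    higher = scores_by_party[expected_higher]
    gap = median(higher) - median(lower)
    frac = overlap_fraction(lower, higher)
    return {
        "median_gap": gap,
        "overlap": frac,
        "plausible": gap > 0 and frac <= max_overlap,
    }


if __name__ == "__main__":
    # Hypothetical scores, invented for illustration (not real estimates).
    first_dim = {"D": [-0.8, -0.6, -0.5, -0.4, -0.2],
                 "R": [0.2, 0.4, 0.5, 0.7, 0.9]}
    second_dim = {"D": [-0.1, 0.0, 0.1, 0.2, 0.6],
                  "R": [-0.3, -0.1, 0.1, 0.2, 0.4]}
    print(sanity_check(first_dim))
    print(sanity_check(second_dim))
```

A dimension that fails a check like this is not necessarily wrong; it may simply be measuring something other than what its label says, which is the point of asking what the variable means before interpreting it.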

P.S. I don't intend this as some sort of devastating critique of Carroll's work. I've presented enough mistaken graphs on my website that I certainly can't blame others for posting things without making a sanity check first. Actually, posting stuff quickly on the web is a great way to get others to find your mistakes! And I hope that this post and others will be helpful to Carroll as he continues his research (and also helpful to me once I receive the inevitable corrections of whatever mistakes I'm making here).


Until a few days ago, very few people had seen these images. The labels were adopted only for consistency with informal conventions in earlier visualization work on Congress and were not designed to interpret this set of votes. Let me be clear: the second dimension in question does not in any way specifically isolate behavior on what we would think of as major "social issues." The variation in actual voting behavior on these issues is mainly captured by the first dimension. Moreover, it only reflects a small amount of variation and is derived only from the current session. So the meaning is subject to recent events only and the nature of that voting agenda. The purpose of these plots is only to allow others to view the votes as they occur in the context of the current session and interpret them how they would like. In line with this purpose, as soon as I became aware that there had been confusion about the labels, I substituted more generic labels. I hope this helps make these images more useful as long as they remain up.

By Royce Carroll (not verified) on 09 Nov 2009

Maybe it's time to get back to Deming and Shewhart.

To understand variation, we must have a process for discerning between systemic/common causes of variation and special/assignable causes of variation.

Control charts do a better job of avoiding the two errors (thinking systemic/common when it is special/assignable or thinking special/assignable when it is systemic/common). Eyeballing the data is just notoriously bad at making the discernment.

By John D Kromkowski (not verified) on 11 Nov 2009

Where are the graphs with the new, more generic labels?