Now that we’ve apparently elected Nate Silver the President of Science, there is some predictable grumbling about whether he’s been overhyped. If you’ve somehow missed the whole thing, Jennifer Ouellette offers an excellent summary of the FiveThirtyEight saga, with lots of links, but the Inigo Montoya summing up is that Silver runs a blog predicting election results, which consistently showed Barack Obama with a high probability of winning. This didn’t sit well with the pundit class, who mocked Silver in ways that made them look like a pack of ignorant yokels when Silver’s projected electoral map almost perfectly matched the final results.
This led to a lot of funny Internet shenanigans, like people chanting “MATH! MATH! MATH!” on Twitter, and IsNateSilveraWitch.com. The Internet being the Internet, though, the general adulation couldn’t last forever, so there have been a bunch of people talking about how Silver isn’t really All That, and certainly not a bag of chips as well.
First out of the gate, surprising basically no one, was Daniel Engber at Slate, who argues that Silver didn’t do much with his complex statistical model that you couldn’t have done by just averaging polls.
This is true, as far as it goes– a large number of the races Silver called correctly weren’t all that competitive to begin with, so it’s not too impressive that he got those right. And if you only look at the close races, he didn’t really pick against the poll average, and the one race where he clearly did, he missed. Of course, that’s probably not too surprising. Polls are, after all, trying to measure the exact same thing that the election will eventually decide, and statisticians have been refining the process of public opinion polling for something like a hundred years. You would hope that they would do pretty well with it by now– it’s been a long time since there was a “Dewey Defeats Truman” type debacle. Any model, even a complex one, is going to necessarily line up with the results of good public opinion polls in most cases. In physics terms, Silver’s sort of doing the relativistic version of the Newtonian physics of public polling– the two will necessarily agree the vast majority of the time, and only differ when some of the conditions take extreme values. (For extra credit, calculate the velocity Montana must’ve been moving at for Silver’s model to miss that one race by the amount it did…)
Matt Springer at the reanimated Built On Facts says similar things, and also makes the important point that Silver’s full algorithm isn’t public, making it difficult to reproduce his findings. That matters– if you’re not sharing the methods you used to get your results, allowing other people to replicate them, you’re veering closer to alchemy than science.
The most interesting attempt to critique Silver’s results is from the Physics Buzz blog, which tries to argue that Silver’s very success is a problem, because the chances of all the probabilities he quotes coming out exactly right are tiny. Even though Silver gave Obama a high probability of winning each of a bunch of swing states, the chance that he would win all of them, in a very simple approximation, ought to be the product of all the individual probabilities, which is a much smaller number. So, the fact that Silver “called” every one of those states correctly is somewhat surprising.
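To make that "product of probabilities" argument concrete, here is a minimal sketch. The state labels and win probabilities are made-up placeholders, not Silver’s actual forecasts, and the calculation assumes the races are completely independent, which is exactly the simple approximation the Physics Buzz post uses.

```python
import math

# Hypothetical win probabilities for five swing states.
# These are illustrative numbers only, not Silver's published forecasts.
state_probs = {"A": 0.9, "B": 0.8, "C": 0.85, "D": 0.7, "E": 0.95}

def prob_all_correct(probs):
    """Chance that every favorite wins, ASSUMING the races are independent."""
    return math.prod(probs)

p_all = prob_all_correct(state_probs.values())
print(f"Chance of calling all five correctly: {p_all:.3f}")
```

Even though every individual call in this toy example is at least a 70% favorite, the chance of getting all five right under the independence assumption comes out well under 50%.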
This sort of thing always reminds me of an experience I had in grad school, relating to the seemingly irrelevant graph at the top of this post (it’s the “featured image,” so if you’re reading via RSS you’ll need to click through to see it). The graph shows the rate at which two metastable xenon atoms of a particular isotope colliding at a particular temperature will produce an ion as a result of the collision. This was the result of months of data collection and number crunching, and the first time I showed this at a group meeting, I was very proud of the result, and of the fact that all of the points were close to the theoretical prediction.
Then Bill Phillips, my advisor, went and burst my bubble by saying “I think you did the error bars wrong.” In the original figure, the error bars on the points were slightly bigger than in this one, and stretched over the theory line for every single point. As Bill pointed out, though, a proper uncertainty estimate would be expected to miss a few times– roughly one-third of the points ought to fall farther from the line than the error bars. So I had to go back and recalculate all the uncertainties, ending up with this graph, where at least a couple of points miss the line (two, rather than the 5-ish that would be the simplest estimate, but this is a small sample). I had, in fact, been too conservative in my uncertainty estimates (in technical terms, I had reported the standard deviation of a bunch of individual measurements that were grouped together to make these points, rather than the standard deviation of the mean).
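The difference between the two kinds of error bar is easy to see in a quick simulation. This is only a sketch with invented numbers (Gaussian noise, ten measurements per plotted point), not a reconstruction of the xenon data: it draws many fake "points," each the mean of a group of noisy measurements, and counts how often the true value lands inside the error bar when the bar is the standard deviation of the group versus the standard deviation of the mean.

```python
import random
import statistics

def coverage(n_points=2000, n_per_point=10, true_value=0.0, sigma=1.0, seed=0):
    """Return the fraction of points whose error bar reaches the true value,
    for standard-deviation bars and for standard-deviation-of-the-mean bars."""
    rng = random.Random(seed)
    hits_sd = 0
    hits_sem = 0
    for _ in range(n_points):
        sample = [rng.gauss(true_value, sigma) for _ in range(n_per_point)]
        mean = statistics.fmean(sample)
        sd = statistics.stdev(sample)      # spread of the individual measurements
        sem = sd / n_per_point ** 0.5      # uncertainty in the mean itself
        hits_sd += abs(mean - true_value) <= sd
        hits_sem += abs(mean - true_value) <= sem
    return hits_sd / n_points, hits_sem / n_points

frac_sd, frac_sem = coverage()
print(f"SD bars cover the true value:  {frac_sd:.1%}")
print(f"SEM bars cover the true value: {frac_sem:.1%}")
```

With the standard deviation as the error bar, nearly every point overlaps the "theory" value, which is the too-good-to-be-true situation Bill spotted. With the standard deviation of the mean, roughly a third of the points miss, as they should.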
This is a subtle but important point that even scientists sometimes struggle with. It’s too easy to think of “error” in a pejorative sense, and feel that “missing” a prediction here or there represents a failure of your model. But in fact, when you’re doing everything correctly, a measurement that is subject to some uncertainty has to miss the occasional point. If it doesn’t, you’re misrepresenting the uncertainty in your results.
Of course, that’s largely irrelevant to the Nate Silver story (though it provides a nice hook for a personal anecdote), because as people point out in the comments to that Physics Buzz post, this isn’t the only kind of uncertainty you can have. The post is correct that you would expect a low probability of correctly calling all the races if you consider them as completely independent measurements, but they’re not. The state-level results are part of a national picture, and many of the factors affecting the vote affect multiple states. Silver himself was pretty up-front about this, noting repeatedly in his “swing state” reports that if Romney were to outperform his polls by enough to win, say, Wisconsin, he would most likely also win Ohio, and so on. The different state measurements are in some sense correlated, so treating them as independent events doesn’t make sense. This isn’t adequately captured by individual state probabilities, though, which is another subtle statistical issue that even a lot of scientists struggle with.
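A toy Monte Carlo shows how much the correlation matters. Everything here is an assumption for illustration: the six polling leads, the 2.5-point total polling error, and the split of that error into a shared national shift plus independent state-level noise. It is not Silver’s model, just the simplest way I can think of to mimic "if Romney beats his polls in Wisconsin, he probably beats them in Ohio too."

```python
import random

def simulate(leads, shared_sd, local_sd, n_trials=20000, seed=1):
    """Simulate elections where each state's result is its polling lead plus
    a national shift common to all states plus independent local noise.
    Returns (per-state win fractions, fraction of trials with a clean sweep)."""
    rng = random.Random(seed)
    wins = [0] * len(leads)
    sweeps = 0
    for _ in range(n_trials):
        shift = rng.gauss(0.0, shared_sd)  # same error in every state
        results = [lead + shift + rng.gauss(0.0, local_sd) > 0 for lead in leads]
        for i, won in enumerate(results):
            wins[i] += won
        sweeps += all(results)
    return [w / n_trials for w in wins], sweeps / n_trials

# Hypothetical polling leads (in points) for the favorite in six swing states.
leads = [2.0, 2.5, 3.0, 1.5, 2.0, 3.5]
total_sd = 2.5  # assumed total polling error per state, in points

# Case 1: all of the error is local, so the states are independent.
per_state_indep, sweep_indep = simulate(leads, shared_sd=0.0, local_sd=total_sd)

# Case 2: most of the error is shared; the total per-state error is unchanged.
shared_sd = 2.0
local_sd = (total_sd ** 2 - shared_sd ** 2) ** 0.5
per_state_corr, sweep_corr = simulate(leads, shared_sd=shared_sd, local_sd=local_sd)

print(f"Sweep probability, independent states: {sweep_indep:.2f}")
print(f"Sweep probability, correlated states:  {sweep_corr:.2f}")
```

The individual state probabilities are essentially the same in both cases, since the total error per state is held fixed, but the chance of the favorite sweeping all six is noticeably higher when the errors are shared. That is why multiplying the individual probabilities together understates the odds of calling everything correctly.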
So, anyway, what’s the takeaway from all this? Is Nate Silver a witch, or a charlatan, or what?
I think that to a large extent Silver’s personal fame is an accident– a happy one, for him. The real conflict here was a collision between willful denial of polling on the part of conservatives who really wanted Romney to be winning and media figures who really wanted the race to be close to drive their ratings and so on. He wasn’t the only one aggregating polls to get the results that he did, but being at the New York Times made him the most visible of the aggregators, which made him both the target for most of the pundit ire and the beneficiary of most of the glory when the results came in. What was really vindicated in the election was the concept of public opinion polling and poll aggregating; Silver just makes a convenient public face for that.
Does that mean his fame is undeserved? Not entirely. He was lucky to be in the right place, but he also writes clearly and engagingly about the complicated issues involved in evaluating polls and the like. He drew a big audience– something like 20% of visitors to the Times’s political pages visited his blog– but that was largely earned by good writing. Which is not an easy thing to do.
The other question that a lot of people have been asking is what does all his complicated statistical folderol give you that just averaging polls wouldn’t? That is, isn’t his model needlessly baroque?
Yes and no. One of the things Silver’s doing, as I understand it, is trying to construct a model that will fill in where polling breaks down– that will let you predict the results of elections in states that aren’t “important” enough to get lots of public opinion surveys, by taking what polls you do have and factoring in the results of higher quality polls in demographically similar states, and that sort of thing. That’s what allowed him to even make predictions in a lot of the smaller state races– high-quality national polling firms don’t put a lot of resources into congressional races in Montana, so it’s a little harder to make predictions there.
This model wasn’t perfect, by any means, but like anything in science, it’s an iterative process– having gotten the data from this election, Silver will presumably tweak the weights that he applies to various factors before trying to predict the next round of elections, and hopefully refine the model to make better predictions. In an ideal world, this sort of analysis would lead to reasonably accurate forecasts even where polling data are sparse, and that’s pretty cool if it works.
Another thing his data collection efforts can do is to help people assess the reliability of specific polls in a more systematic way. This includes things like the “house effects” that he talks about– Rasmussen polls tend to skew Republican by a couple of points, something he tries to take into account with his model. In a sense, you can invert his process, and use the corrections he applies to assess the polls themselves. Of course, since he doesn’t make all that stuff public (yet, anyway– who knows if he will eventually), this is a little tricky, but you can get some idea by comparing individual polls to his aggregate predictions.
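As a rough illustration of that "compare individual polls to the aggregate" idea, here is a minimal sketch. The firms, states, and margins are all invented, and the method (average each firm’s deviation from the simple per-state poll average) is just the crudest possible estimate, not the correction Silver actually applies, which as noted isn’t public.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical polls: (firm, state, margin), where margin is the
# Democratic lead in points. All values are made up for illustration.
polls = [
    ("Firm A", "X", 3.0), ("Firm B", "X", 1.0), ("Firm C", "X", 2.0),
    ("Firm A", "Y", 5.0), ("Firm B", "Y", 2.0), ("Firm C", "Y", 5.0),
]

def house_effects(polls):
    """Estimate each firm's lean as its average deviation from the
    simple mean of all polls in the same state."""
    by_state = defaultdict(list)
    for _, state, margin in polls:
        by_state[state].append(margin)
    state_avg = {state: fmean(margins) for state, margins in by_state.items()}

    deviations = defaultdict(list)
    for firm, state, margin in polls:
        deviations[firm].append(margin - state_avg[state])
    return {firm: fmean(devs) for firm, devs in deviations.items()}

effects = house_effects(polls)
for firm, lean in sorted(effects.items()):
    print(f"{firm}: {lean:+.1f} points relative to the state averages")
```

In this toy data set, a negative number means the firm’s polls run more Republican than the average of all firms polling the same state, which is the sort of lean a "house effect" correction is meant to capture.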
(Of course, this is a moving target, as the pollsters are presumably doing the same sort of thing, in an effort to improve their models. I doubt Rasmussen is deliberately trying to be off by a couple of points in a Republican direction, for example, and it would be interesting to see if their bias shrinks over time as they tweak their models to get closer to the actual results.)
There’s also the question of the disconnect between the national polls and the state-level polls, a topic Silver wrote about a lot during the last few weeks. For a long time, the electoral picture looked a lot better for Obama than the national picture, in that “swing state” polls suggested he was doing better there than you would expect from polls of the nation as a whole. This was clearly a vexing question, and Silver basically bet that the state-level polling was a better indicator of the final outcome than the national polls. The success of his model may shed some light on the question of how and why that difference existed– it’ll be interesting to see what he posts about this in the next few weeks.
Finally, on a sort of meta-level, I think Silver was really useful for political culture as a whole. While the national poll numbers were always a little troubling, and the media did their best to whip up some drama, Silver’s aggregation provided a valuable corrective, in that for all the frantic horse-race chatter, the numbers barely moved. It’s a reminder that the vast majority of what you see on political blogs and cable chat shows is ultimately pretty unimportant. Thanks in part to the steadiness of the predictions of Silver and other poll aggregators, the liberal-ish crowd I follow on social media was a good deal calmer than they were back in 2008, which has benefits for everybody’s sanity.
So, in the end, like most such things, I suspect that both the initial hype and the subsequent backlash against Silver miss the mark a little. He’s lucky in addition to good, but really did provide some valuable services. And if you haven’t read his book (which he plugged on Twitter late on election night, in what might’ve been the greatest humblebrag ever), check it out, because he writes really well about a topic that in lesser hands would be crushingly dull.