If you’re not reading the Columbia University stats blog, Statistical Modeling, Causal Inference, and Social Science, you’re missing a lot of great stuff. For example, today’s post by Andrew Gelman discusses the paper “Forecasting House Seats from Generic Congressional Polls” by Bafumi, Erikson, and Wlezien. From the paper:

This paper is intended to provide some guidance for translating the results of generic congressional polls into the election outcome. Via computer simulation based on statistical analysis of historical data, we show how generic vote polls can be used to forecast the election outcome. We convert the results of generic vote polls into a projection of the actual national vote for Congress and ultimately into the partisan division of seats in the House of Representatives. Our model allows both a point forecast–our expectation of the seat division between Republicans and Democrats–and an estimate of the probability of partisan control.

Based on current generic ballot polls, we forecast an expected Democratic gain of 32 seats with Democratic control (a gain of 18 seats or more) a near certainty. (Emphasis in original.)

To arrive at these predictions, they used the results of the last 15 midterm elections, and generic polling data from within 30 days of those elections. From this data, they produced a regression equation that predicts the vote-share for Democrats based on generic polling. Here’s the equation:

Dem Vote Share = 24.38 + 0.51 * Dem Poll Share – 1.07 * Presidential Party^{1}
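If you want to play with the equation yourself, here is a minimal Python sketch of it. The coefficients are copied from the equation above; the function name and the example inputs are my own, and the numeric coding of the Presidential Party variable isn't given in this post, so check the paper before plugging in real values.

```python
# Coefficients copied from the regression equation quoted above.
INTERCEPT = 24.38
POLL_COEF = 0.51
PRES_PARTY_COEF = -1.07


def predict_dem_vote_share(dem_poll_share: float, presidential_party: float) -> float:
    """Return the predicted Democratic vote share, in percent.

    dem_poll_share: Democratic share in the generic polls, in percent.
    presidential_party: numeric code for the president's party. The coding
        is NOT stated in this post, so treat any value passed here as an
        assumption until checked against the paper.
    """
    return (
        INTERCEPT
        + POLL_COEF * dem_poll_share
        + PRES_PARTY_COEF * presidential_party
    )


# Hypothetical inputs, chosen only to show the arithmetic:
# 24.38 + 0.51 * 50 - 1.07 * 0
example = predict_dem_vote_share(dem_poll_share=50.0, presidential_party=0.0)
print(round(example, 2))  # 49.88
```

Note that the poll coefficient is about one half, so a one-point change in the generic polls moves the predicted vote share by only about half a point.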

They then stuck this year’s polling data (from October 8 until today) into the equation, and got a predicted Democratic vote share of 55% (95% confidence interval = 51.3 to 58.7). Using simulations (described on p. 3 and in the appendix) that treat open and incumbent seats differently, they translated the vote share into the number of congressional seats and the probability of Democrats taking control of the House. The simulations yield the data summarized in this graph (p. 4):

As you can see, when you simulate the Democratic vote shares from within the confidence intervals (the graph goes from 50 to 60%, on the x-axis), the Democrats get at least 218 seats, and the probability that they’ll have control of the House of Representatives goes up to almost .9. In his post, Andrew Gelman argues that they might be overstating the confidence of this prediction. I’ll let the statisticians argue about that, but I still think the whole thing is really cool, especially since it predicts the outcome I want!
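For a rough sense of how much uncertainty that vote-share interval implies, here is a back-of-the-envelope Python sketch. It is not the authors' simulation: it assumes the 95% interval (51.3 to 58.7) is symmetric and normal, backs out the implied standard error, and asks how likely a Democratic vote share above 50% would be. That is a statement about votes, not about seats or control of the House.

```python
import math

# Figures quoted in the post: point estimate and 95% interval, in percent.
POINT_ESTIMATE = 55.0
CI_LOW, CI_HIGH = 51.3, 58.7

# Assumption: the interval is a symmetric normal interval, so its
# half-width equals 1.96 standard errors.
std_error = (CI_HIGH - CI_LOW) / 2 / 1.96


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Probability that the Democratic vote share exceeds 50% under that assumption.
prob_vote_majority = 1.0 - normal_cdf(50.0, POINT_ESTIMATE, std_error)

print(f"implied standard error: {std_error:.2f}")
print(f"P(Dem vote share > 50%): {prob_vote_majority:.3f}")
```

Under these assumptions the probability comes out above 0.99, but turning votes into seats adds its own uncertainty, which is what the paper's simulations are for.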

^{1}In case you’re not familiar with regression, before the values are put in, the equation looks like this:

V_{j} = α + β_{1} Generic Poll_{jt} + β_{2} Presidential Party_{j}

from this paper by the same authors. The equation is basically that for a line (y = mx + b, where m is the slope and b is the intercept), so α (24.38 in the final version) is the intercept, and the two β’s are the regression coefficients (0.51 and -1.07 in the final version), which take the place of m, or the slope, in the equation for a line. Each β is the slope for its own predictor.