UPDATES: Diebold effect explained. (previous: 1, 2, 3, 4, 5, 6 (a nonlinear approach), 7)
In contrast to pre-election polls, the final vote tally from the NH Democratic primary showed a surprise victory for Hillary Clinton. People quickly noticed an anomaly in the tallies that seemed to show an advantage for Hillary conferred by the use of Diebold machines.
However, there was an easy explanation: towns with Diebold machines are more urban on average, and Hillary was always thought to have more support in urban areas. So, like many others, I was supremely irritated by the lack of analyses which statistically controlled for this obvious factor.
So I got a copy of the vote counts and, thanks to Brian London at BlackBoxVoting, the demographic information for each town (most notably, the % holding bachelor’s degrees, the median household income, and the total town population). Now, Mike LaBonte at BlackBoxVoting has provided estimates of the square mileage of each district, allowing for the calculation of population density.
To my complete (and continuing) amazement, the “diebold effect” on Hillary’s votes remains after controlling for any and all of those demographic variables, with a p-value of <.001: that is, a difference this large would turn up less than 1 time in 1,000 through chance alone if the machines had no real association with her votes, and that's after adjusting for variability in Hillary's votes due to education, income, total population, and population density.
While this "diebold effect" varies in magnitude depending on the exact covariates used, it seems to center around an additional 5.2% of votes going to Clinton from Diebold machines. The same analysis shows a Diebold effect of about -4.2% for Obama, significant at p<.001, using the same covariates.
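To make "controlling for" concrete, here is a minimal sketch of the idea in plain Python. It is not my actual analysis and it does not use the NH data file: the towns are synthetic, the variable names and the planted 5-point effect are assumptions chosen for illustration, and the `ols` helper is a hypothetical hand-rolled least-squares fit. It only shows how a raw Diebold-vs-hand-count gap can be inflated by population density, and how putting density in the model recovers the planted effect.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def mean(values):
    return sum(values) / len(values)

random.seed(0)
TRUE_EFFECT = 5.0  # planted in the SYNTHETIC data; not an estimate from the NH file
X, clinton = [], []
for _ in range(1000):
    density = random.uniform(0, 10)   # hypothetical density scale
    pct_ba = random.uniform(10, 60)   # hypothetical % holding bachelor's degrees
    # Denser towns are more likely to use machines, which creates confounding.
    diebold = 1.0 if random.random() < 0.1 + 0.08 * density else 0.0
    share = (30 + 1.0 * density - 0.1 * pct_ba
             + TRUE_EFFECT * diebold + random.gauss(0, 1))
    X.append([1.0, diebold, density, pct_ba])  # intercept, machine dummy, covariates
    clinton.append(share)

# Raw difference in mean vote share between machine and hand-count towns.
naive_gap = (mean([s for r, s in zip(X, clinton) if r[1] == 1.0])
             - mean([s for r, s in zip(X, clinton) if r[1] == 0.0]))
# Coefficient on the machine dummy after adjusting for density and education.
adjusted_gap = ols(X, clinton)[1]
print(f"naive gap: {naive_gap:.2f}, adjusted gap: {adjusted_gap:.2f}")
```

In this toy setup the naive gap overstates the planted effect because dense towns both use machines more often and vote differently. The puzzle in the real data is that the gap does not shrink away once the covariates are added.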
Due to the cooperative grassroots nature of this effort, I cannot guarantee the accuracy of the data file: the information has come from a variety of sources and I won't claim to have verified it all. Furthermore, I'm not a statistician; I'm waiting on the Social Science Statistics Blog (Harvard) and the Statistical Modeling Blog (Columbia) to weigh in. However, my analysis seems in line with this paper about the 2004 NH Democratic primary.
NONETHELESS … the general conclusion is buttressed by the following analyses, all of which have come to similar conclusions:
- Election Archive’s analysis
- This one by an econ professor at Dartmouth.
- The European Tribune reviews the case, with a variety of analyses
- An analysis using R
- BrFox’s analysis
As you can see, something appears to be highly amiss. There may be an unmeasured third variable (it’s probably not urban vs rural) or there may be something more nefarious.
Draw your own conclusions. Here are all the data files:
- The correct list of NH precincts using Diebold machines
- Mark Shauer’s List of Votes in NH precincts, Brian Fox’s data of the same, and Semmelweiss’s data of the same
- NH town square mileage, for calculating population density
- My “mega file” with all demographic information, square mileage, and voting information (UPDATED: now also with county membership)
- NEW: Latitude and longitude for each NH precinct, for use in spatial autocorrelation models
Track the ongoing developments at BlackBoxVoting.
Also look out for updates from the Election Defense Alliance.
UPDATE 1: Mike Dunford suggested controlling for geography, which I did in a repeated-measures ANOVA with all the covariates and the county membership of each precinct as between-subjects factors and Clinton & Obama’s votes as within-subject factors. The diebold effect remains significant at p<.001. I've updated the file with county membership.
UPDATE 2: Someone on reddit suggested controlling for which precincts were most highly-contested. I measured this as the absolute value of the difference between Obama & Clinton’s votes. The diebold effect remains with this and all other covariates at p<.001.
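For anyone who wants to rebuild that covariate, a minimal sketch follows. It assumes the votes are expressed as percentages of each precinct's total (raw counts would work the same way), and the precinct names and numbers are hypothetical placeholders, not rows from the data file.

```python
def contestedness(clinton_pct, obama_pct):
    """Absolute Clinton-Obama gap; smaller values mean a closer race."""
    return abs(clinton_pct - obama_pct)

# Hypothetical precincts, for illustration only (not rows from the data file).
precincts = [
    {"name": "Town A", "clinton": 41.0, "obama": 35.0},
    {"name": "Town B", "clinton": 33.5, "obama": 38.0},
]

for p in precincts:
    # Store the covariate alongside the other per-precinct columns.
    p["contested"] = contestedness(p["clinton"], p["obama"])
    print(p["name"], p["contested"])
```

Because the absolute value throws away which candidate was ahead, this covariate measures only how close the race was, not who was winning it.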
UPDATE 3: Mike Dunford’s new matching analysis (omitting statistics, on the assumption that demographics don’t “explain all of an election result”). Mike doesn’t think anything is awry, based on the fact that votes simply seem discrepant above/below the 2,000-vote cutoff value. However, including this cutoff as a categorical covariate in the model along with age, income, education, etc. leaves the Diebold effects on both Hillary’s & Obama’s votes significant with p’s<.001.
UPDATE 4: I used this list of precincts with Clinton campaign offices as a covariate for “campaigning presence”, and the diebold effect is still significant at p<.001 controlling for that and all other covariates mentioned above. In fact, controlling for this variable improves the significance and magnitude of Obama's diebold disadvantage...
UPDATE 6: Neoplastic Icicle uses a nonlinear regression technique known as “random forests” and comes up with a minuscule Diebold effect (+.82% for Clinton). Keep in mind that Icicle’s argument that linearity assumptions are to blame may contrast with “T”’s analysis that we’re not dealing with a functional form problem… Pending a response from Icicle, however, it appears to the relatively untrained eye that there’s a large multicollinearity (multicononlinearity?) problem: Icicle has included all other candidates’ vote percentages as predictors, from which Clinton’s votes can theoretically be predicted nearly perfectly. This was elegantly addressed.
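The collinearity worry is just arithmetic, and a small sketch shows why. It assumes every candidate's share is included and that the shares sum to 100% in each precinct; the four-candidate field and the random numbers are synthetic, not NH results. Under that assumption Clinton's share is determined by the others, so any model given those predictors has nothing left for a Diebold variable to explain.

```python
import random

random.seed(1)

def random_shares(n_candidates):
    """Synthetic vote shares (in %) that sum to 100 within one precinct."""
    raw = [random.random() for _ in range(n_candidates)]
    total = sum(raw)
    return [100 * r / total for r in raw]

# 50 synthetic precincts; column 0 stands in for Clinton (an assumption).
precincts = [random_shares(4) for _ in range(50)]

# "Predict" Clinton's share from the other candidates' shares alone.
errors = [abs(shares[0] - (100 - sum(shares[1:]))) for shares in precincts]
max_err = max(errors)
print(f"largest prediction error: {max_err:.2e}")
```

With real returns the fit would be "nearly" rather than exactly perfect whenever some candidates or write-ins are left out of the predictors.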