A Diebold Component? A Principal Components Analysis of Demographics

Update: Diebold Effect explained.

Here's a unique approach to understanding the Diebold effect: S. Walker has dealt with a potential multicollinearity problem among the predictors by taking the principal components of a variety of demographic variables.

My brief rejoinder: the residuals of a logistic regression predicting the presence of Diebold machines from Clinton campaign presence, median age, % holding bachelor's degrees, per-capita income, and population density are themselves correlated with the residuals of a regression predicting Clinton's votes from the same predictors (R=.306, p<.001).

In other words, removing all of the variance in Clinton's votes due to demographic factors, and all of the variance in the presence of Diebold machines due to the same factors, still leaves a "leftover" Diebold effect connecting the two.
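This residual-on-residual check can be sketched as follows. Synthetic data and a linear probability model stand in for the real NH dataset and the logistic fit; all variable names are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 300  # stand-in for the number of NH towns

# Synthetic demographic predictors (stand-ins for median age, % BA, etc.)
X = rng.normal(size=(n, 5))

# Both outcomes depend on demographics plus a shared non-demographic component
shared = rng.normal(size=n)
diebold = (X @ rng.normal(size=5) + shared + rng.normal(size=n) > 0).astype(float)
clinton = X @ rng.normal(size=5) + shared + rng.normal(size=n)

def residuals(y, X):
    """OLS residuals of y after regressing out X (plus an intercept)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return y - Xd @ beta

# Correlate what demographics leave unexplained in each outcome
r, p = stats.pearsonr(residuals(diebold, X), residuals(clinton, X))
print(f"residual correlation r={r:.3f}, p={p:.3g}")
```

Because the simulated outcomes share a non-demographic component, the residual correlation comes out positive and significant, mirroring the structure of the claim above.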


I downloaded your data and have three quick thoughts.

1. Using Abadie and Imbens's (2002) bias-corrected matching estimator doesn't change things, so it's probably not a functional-form problem. You can confirm this by looking at the data too (although taking logs of some variables is important).

2. The fact that the effect gets smaller when controls are added is very telling. If you use Altonji et al.'s (2002) idea of equal selection on observed and unobserved variables, the entire effect is gone. Here's the intuition: adding all your controls increases the R2 by about .2, leaving around .7 unexplained. If we have the same degree of selection for this remaining .7, then the 1-point drop we get from explaining .2 should really be about a 4-point drop if we could explain 100% of the variance. This leaves .01 for the Diebold effect. Now, this isn't conclusive, since you could argue that there is probably much more selection from the observed variables, which were chosen purposively, than from the unobserved ones.
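The extrapolation in point 2 is just arithmetic. A sketch using the round figures quoted in the comment (these are the commenter's numbers, not values re-estimated from the data):

```python
# Round figures quoted in the comment above
r2_from_controls = 0.2    # rise in R2 when the demographic controls are added
r2_unexplained = 0.7      # variance still unexplained after the controls
drop_from_controls = 1.0  # points the Diebold effect shrank when controls went in

# Equal selection on observables and unobservables: scale the drop in
# proportion to the variance each set of variables would explain.
extra_drop = drop_from_controls * r2_unexplained / r2_from_controls
total_drop = drop_from_controls + extra_drop
print(f"implied total drop if 100% of variance were explained: {total_drop:.1f} points")
```

That implied drop of roughly 4 points is what wipes out essentially the entire observed effect under the equal-selection assumption.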

3. I haven't seen anybody account for spatial autocorrelation using a spatial error model. The idea that districts are independent is obviously wrong, as you can see from any map of voting shares. I suspect that if you correct for this, the standard errors will get much bigger.
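Point 3 can be probed even without a full spatial error model. A quick diagnostic is Moran's I; here synthetic coordinates and a kernel-smoothed outcome stand in for the real precinct locations and vote shares:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
coords = rng.uniform(size=(n, 2))  # stand-in precinct locations

# Pairwise distances; smoothing white noise over space makes nearby
# points take similar values, i.e. induces spatial autocorrelation
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
y = np.exp(-d / 0.1) @ rng.normal(size=n)

# Row-standardized distance-decay weights with a zero diagonal
W = np.exp(-d / 0.1)
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

# Moran's I (with row-standardized W the usual n/S0 factor equals 1)
z = y - y.mean()
I = (z @ W @ z) / (z @ z)
print(f"Moran's I = {I:.3f}")
```

A value well above the null expectation of -1/(n-1) means neighboring precincts resemble each other, which is exactly the independence violation being flagged; a significant Moran's I on the actual residuals would motivate the spatial error model.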

Just so everyone knows, T's email address reveals he is someone who definitely knows what he's talking about.

Is the assumption behind Altonji's idea that we have an equivalent number of important unobserved variables? Do we need to know whether each of the observed variables we're including actually predicts voting outcomes before examining the R2 drop?

Also, what if the source of the Diebold effect is a stochastic programming error in the Diebold machines (unintentional)? In that case there would always be some error in the Diebold effect, and the R2 could never get to 1. Or, more likely, what if there is unintentional stochastic noise in the hand counting, leading to a "true" R2 for the Diebold variable with an upper bound well below 1?
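The attenuation worry is easy to illustrate: if the counts carry stochastic noise, even a perfect predictor of the true signal faces an R2 ceiling below 1. A sketch with made-up numbers, not estimates from the NH data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
signal = rng.normal(size=n)       # the "true" vote signal
noise = 0.5 * rng.normal(size=n)  # stochastic counting error, sd = 0.5

observed = signal + noise  # what a noisy count would actually record

# Even predicting the signal perfectly leaves the noise unexplained
ss_res = np.sum((observed - signal) ** 2)
ss_tot = np.sum((observed - observed.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"R2 ceiling with a perfect predictor: {r2:.2f}")
```

Here the ceiling is Var(signal)/(Var(signal)+Var(noise)) = 1/1.25 = 0.8, so no amount of additional controls could push the R2 to 1, and the equal-selection extrapolation would be correspondingly limited.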

As for point 3, I've tried including county membership as a variable in some models and it doesn't appear to change things, but I don't know how to turn that into a spatial autocorrelation model. If you have pointers, I'll try running that.

Commenter #1, "T", has expressed interest in continuing the analysis; most recently he's observed a strengthening of the Diebold effect when using the preferred "MARKCLINTON" variable in the dataset, contrary to his comment above. We're corresponding through email, and the folks at BBV (Bill Bowen in particular) have been kind enough to provide latitude and longitude for each NH precinct so "T" can do a proper spatial autocorrelation analysis.

Will post an update when I get one from "T"...

Chris, I've created an ordination of the demographic data over at my blog, overlaid where Diebold machines were used versus hand counting, and also shown where Hillary won versus Obama on the same ordination.

The graphs support what I've suggested: that the Diebold machines are in areas where the demographics predict Hillary winning.
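For anyone who wants to reproduce the idea, a PCA ordination of this kind can be sketched as follows; synthetic correlated columns stand in for the actual demographic table:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# Synthetic stand-ins for correlated demographic columns (age, income, % BA, density)
demo = rng.normal(size=(n, 4)) @ rng.normal(size=(4, 4))

# PCA by hand: center, eigendecompose the covariance, keep the top two axes
centered = demo - demo.mean(axis=0)
cov = centered.T @ centered / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
scores = centered @ eigvecs[:, order[:2]]  # 2-D ordination coordinates

print(scores.shape)  # one (PC1, PC2) point per town, ready to color by machine type
```

Plotting the two score columns and coloring points by machine type (or by winner) gives exactly the kind of overlay described above.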

Cheers,
Sean

Sean, You seem to be missing (or ignoring?) the point. If it's the former, I'm sorry for the following harsh tone.

OF COURSE the demographic data are correlated both with the presence of Diebold machines and with the election outcome - we have known this from the beginning, and the only interesting question is whether variation in demographics explains more of the election result than voting method does.

My, "T"'s, and others' "fancy statistical" analyses continue to suggest this is NOT the case, pretty graphs be damned.

The lack of "p-values and fancy statistics" in your graph makes it both meaningless and dangerous, because it plays into people's confirmation bias without a rigorous methodology.

No offense, but I think your post actually decreases the signal-to-noise ratio on the Diebold effect: you strongly imply that everything can be explained by demographics while not commenting on previous work that comes to the opposite conclusion.

Most people will fail to understand our analyses, due to the sophistication of the stats, but will latch on to your simplistic graphs.

Hello,
I don't understand all the details of the sophisticated statistical tools here, BUT:
- The more analyses come out, the stronger the suspicion of a Diebold effect; denialists and apologists then have an exponentially harder case.
- Chris, can you correlate against voting patterns from prior NH elections to narrow down the limits of demographic variability even more strongly? Gender, race, spatial variability?
- Is there any historic example of a pre-election 12% lead turning into a 3% loss overnight?
- Which is more probable, given Iowa and up-to-date sociopolitical expertise? 1) That the rural 39-36 for Obama would be replicated in the urban Diebold areas? 2) That a statewide pre-election 42-29, given the rural 39-36, would correlate to a 43-28 urban outcome?
- Chris, can you somehow single out Hillsborough spatially?
- Can you single out wards or precincts?
- We have two fraud hypotheses: A) a flip of the Hillsborough Diebold count in case 1); B) a flip of the total Diebold count in case 2).
- If you run the hypothesis of a 7,000-vote flip in Hillsborough, do all the statistical anomalies disappear?

By Elling Disen (not verified) on 19 Jan 2008 #permalink

More intuitive comments..

In calibrating spatial variability, would the Election Analysis help? They conclude that Romney's numbers are adequately explained by precinct size and population density.
Would the presumably unmanipulated Republican voting pattern determine the normal geographical gradients of politics?

By Elling Disen (not verified) on 19 Jan 2008 #permalink

I'm going to write another post on this but I'd like to respond to the p-value remark.

Why wouldn't they be important? To me, a p-value indicates whether the observed data are weird relative to the null model/hypothesis. I would not say that they are unimportant, but that they're only part of a bigger picture that includes estimates of the effect sizes and graphical analysis of what the impact is.

If you have a large enough sample size, you'll have enough precision to reject the null (a significant p) even if the effect size is not that large and may not be important in the real world.
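That point is easy to demonstrate with a toy simulation (illustrative numbers, nothing to do with the NH data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
effect = 0.02  # a tiny, practically unimportant difference in means
ps = {}
for n in (100, 1_000_000):
    # Two groups differing by the same tiny effect, at two sample sizes
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    t, ps[n] = stats.ttest_ind(a, b)
    print(f"n={n:>9}: p={ps[n]:.3g}")
```

The effect size never changes, but the huge sample drives the p-value toward zero: significance measures precision, not real-world importance.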

I don't want you to think I don't like p-values or don't think they're important. That's not the case at all. I just think you have to look at the effect sizes, as well as the underlying assumptions of the methods used to get the p-value.

I'll put my money where my mouth is regarding the problems of sorting out how demographics influence the vote. It will take me a bit to put it together.