# "The Diebold Effect": Hillary's Votes Higher From Diebold Machines Even Controlling for Demographics (education, income, population, etc)

UPDATES: Diebold effect explained. (previous: 1, 2, 3, 4 5 6 (a nonlinear approach) 7)

In contrast to pre-election polls, the final vote tally from the NH Democratic primary shows a surprise victory for Hillary Clinton. People quickly noticed an anomaly in the voting tallies which seemed to show an advantage to Hillary conferred by the use of Diebold machines.

However, there was an easy explanation: towns with Diebold machines are more urban on average, and Hillary was always thought to have more support in urban areas. So, like many others, I was supremely irritated by the lack of analyses which statistically controlled for this obvious factor.

So I got a copy of the vote counts, and thanks to Brian London at BlackBoxVoting, the demographic information from each town (most notably, the % holding bachelor's degrees, the median household income, and the total town population). Now, Mike LaBonte at BlackBoxVoting has provided estimates of the mileage for each district, allowing for the calculation of population density.

To my complete (and continuing) amazement, the "diebold effect" on Hillary's votes remains after controlling for any and all of those demographic variables, with a p-value of <.001: that is, there are less than 1 in 1000 odds for this difference occurring through chance alone, and that is after adjusting for variability in Hillary's votes due to education, income, total population, and population density.

While this "diebold effect" varies in magnitude depending on the exact covariates used, it seems to center around an additional 5.2% of votes going for Clinton from Diebold machines. The same analysis shows a Diebold disadvantage for Obama of about -4.2%, significant with a p<.001 using the same covariates.
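
To make the kind of model concrete, here is a minimal sketch of a regression that includes a 0/1 machine indicator alongside a demographic covariate. It is not the analysis behind the numbers above: the towns are synthetic, the helper names (`solve`, `ols`) are hypothetical, only one covariate is used, and no p-values are computed. The coefficient on the Diebold flag plays the role of the "Diebold effect" in percentage points after adjusting for the covariate.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients for y ~ 1 + columns of rows, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic towns, NOT the New Hampshire data:
# (% holding a bachelor's degree, 1 if counted by Diebold machine else 0)
towns = [(20, 0), (25, 0), (30, 1), (35, 1), (40, 1), (22, 0), (28, 1), (33, 0)]

# Clinton's share is constructed with a known 5-point machine effect and no noise
clinton = [30 + 0.2 * edu + 5.0 * d for edu, d in towns]

intercept, b_edu, b_diebold = ols(towns, clinton)
print(round(b_diebold, 3))  # estimated machine effect, in percentage points
```

Because the synthetic shares are built with a 5-point machine effect and no noise, the fitted coefficient recovers those 5 points; with real town data the estimate would come with a standard error and a p-value.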

Due to the cooperative grassroots nature of this effort, I cannot guarantee the accuracy of the data file - the information has come from a variety of sources and I won't claim to have verified it all. Furthermore, I'm not a statistician - I'm waiting on the Social Science Statistics Blog (Harvard) and the Statistical Modeling Blog (Columbia) to weigh in. However, my analysis seems in line with this paper about the 2004 NH democratic primary.

NONETHELESS ... the general conclusion is buttressed by the following analyses, all of which have come to similar conclusions:

- Election Archive's analysis
- This one by an econ professor at Dartmouth.
- The European Tribune reviews the case, with a variety of analyses
- An analysis using R
- BrFox's analysis

As you can see, something appears to be highly amiss. There may be an unmeasured third variable (it's probably not urban vs rural) or there may be something more nefarious.

Draw your own conclusions. Here are all the data files:
- The correct list of NH precincts using Diebold machines
- Mark Shauer's List of Votes in NH precincts, Brian Fox's data of the same, and Semmelweiss's data of the same
- NH town square mileage, for calculating population density
- My "mega file" with all demographic information, squaremileage, and voting information (UPDATED: now also with county membership)
- NEW: Latitude and longitude for each NH precinct, for use in spatial autocorrelation models

Track the ongoing developments at BlackBoxVoting.

Also look out for updates from the Election Defense Alliance

UPDATE 1: Mike Dunford suggested controlling for geography, which I did in a repeated-measures ANOVA with all the covariates and the county membership of each precinct as between-subjects factors and Clinton & Obama's votes as within-subject factors. The diebold effect remains significant at p<.001. I updated the file with county membership.

UPDATE 2: Someone on reddit suggested controlling for which precincts were most highly-contested. I measured this as the absolute value of the difference between Obama & Clinton's votes. The diebold effect remains with this and all other covariates at p<.001.
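
A minimal sketch of that covariate, assuming it is computed as the absolute Clinton-Obama gap in percentage points of a town's Democratic vote. The function name and the example counts are hypothetical:

```python
def contest_level(clinton_votes, obama_votes, total_votes):
    """Absolute Clinton-Obama gap in percentage points; smaller means more hotly contested."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return abs(clinton_votes - obama_votes) * 100.0 / total_votes


# Made-up town: 400 Clinton votes, 350 Obama votes, 1000 Democratic ballots cast
gap = contest_level(400, 350, 1000)
print(gap)
```

The value is symmetric in the two candidates, so it measures how close a town was without saying who led.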

UPDATE 3: Mike Dunford's new matching analysis (omitting statistics, on the assumption that demographics don't "explain all of an election result"). Mike doesn't think anything is awry, based on the fact that votes simply seem discrepant above/below the 2000 vote cutoff value. However, including this as a categorical covariate in the model along with age, income, education, etc. leaves the Diebold effects on both Hillary & Obama's votes significant with p's<.001.

UPDATE 4: I used this list of precincts with Clinton campaign offices as a covariate for "campaigning presence", and the diebold effect is still significant at p<.001 controlling for that and all other covariates mentioned above. In fact, this variable improves the significance and magnitude of Obama's Diebold disadvantage...

UPDATE 6
Neoplastic Icicle uses a nonlinear regression technique known as "random forests" and comes up with a minuscule Diebold effect (+.82% for Clinton). Keep in mind that Icicle's argument that linearity assumptions are to blame may contrast with "T"'s analysis that we're not dealing with a functional form problem... Pending a response from Icicle, however, it appears to the relatively untrained eye that there's a large multicollinearity (multicononlinearity?) problem: Icicle has included all other candidates' vote percentages as predictors, from which Clinton's votes can theoretically be predicted nearly perfectly. - this was elegantly addressed.


Have you looked at the actual interaction process? The butterfly-ballot problem in Florida had nothing to do with SES, and everything to do with layout. Confused voters may be consistently making the same mistake.

In articles about the surprise New Hampshire outcome, one explanation offered for the pollster error was less-educated Whites, assumed to lie to pollsters. A counter to that hypothesis, looking at exit polls, was that the groups that seemed to have changed their vote were the educated, and the likelihood that the vote changed from Obama to Clinton increased with educational attainment.

By James Igoe (not verified) on 16 Jan 2008 #permalink

Hello Chris,

Given that the electionarchive.org analysis suggests a swap of Clinton and Obama Diebold votes, have you considered running your regression with such a swap? I'm curious about what happens to the voting method correlation, and if other correlations such as college education become stronger.

@Ringo: I assume you mean whether the order of votes was counterbalanced. I will try to include that in my next analysis.

@James: A keen observation, given that (to many people's surprise) % holding Bachelor's degrees is indeed a powerful predictor of Clinton's votes. However, the diebold effect pervades even that one.

@John: My earliest analysis used Obama's votes as a covariate, and examined the interaction with method, under the assumption that a thrown election would swap Obama's votes for Hillary's only among precincts with Diebold machines. This assumption may be unnecessarily loaded so I've dropped it from the analysis. But including Obama's votes as a predictor does not substantially change the results, if that's what you mean. I've also tried controlling for the level of contest - i.e., the absolute difference in percentage points between Obama & Clinton's votes - thinking that there may be something different going on in hotly contested elections. No luck - diebold effect is still significant.

I wonder what the paper ballots looked like. Where on the page was Obama's name versus Hillary's name? Is this consistent across all paper ballot districts? And is that placement the same as the placement on the Diebold screen? I know this is probably a stretch, but I have heard that the order of candidates' names can have a modest effect on election outcomes.

Wow! I believe the hand recount requested by Kucinich starts today. Shouldn't that sort out this mess?

By Don George (not verified) on 16 Jan 2008 #permalink

Tel, you are exactly right and this is priority #1 for me - I just don't know how to get this information. No one on the BBV forum thread seems to know.

@Don: yes, unless you're inclined towards conspiracy theories; here is a section of Bev Harris's (of BBV) most recent update about how the ballots are being handled in NH for the recount (the "chain of custody"):

"VERDICT: New Hampshire is unable to document its chain of custody properly, lacks written procedures, its secretary of state has said he doesn't know where its memory cards are, and LHS has been encroaching on state elections with near-total control. I'll be preparing a Special Report when I return from New Hampshire with documents and video to support this assessment. "

I read that there were 18 names on the ballot, with Hillary's near the top and Obama's near the bottom. Perhaps the voting machines only display a few at a time? Perhaps a voter wanting to vote for Obama needed to scroll through a long list? That would be an explanation for this phenomenon.

By Tractarian (not verified) on 16 Jan 2008 #permalink

I don't understand why it is assumed Clinton would do better in the urban areas of NH when it was Obama, not Clinton, that did better in urban areas of Iowa, see data at:

Hit the "Margin of Victory" tab, Obama won by a large margin in all of the areas around cities and towns, link:

http://politics.nytimes.com/election-guide/2008/results/states/IA.html

I am sure you have better data sources, but I think the simple visual graphic makes the point clear.

By eyesonthestreet (not verified) on 16 Jan 2008 #permalink

Have you controlled for outright bribery? That might affect the results without really implicating the Diebold machines. I doubt such a technique would be beyond the Clinton campaign.

Tel -- FYI, all ballots are paper ballots in NH; there is no Diebold "screen." The Diebold machines are optical scanners. I'm glad the hand-recount is going forward; I hope it clears up what went on here.

Mike, thank you. If it's true that ordering was the same between hand counted ballots & those fed into Diebold's "accuvote" optical scan, this cannot explain the discrepancy and need not be entered in the analysis.

Hi Tom - As I understand it, the problem with the publicly available and most recent exit poll data is that it is "corrected" as the primary goes on to match the existing data. This is because the purpose of exit polling is to show how the demographics correlate with voting preferences. Read the BBVforums thread for more information.

So for our purposes, the existing (updated) exit polling data is meaningless. The raw and uncorrected data, which I believe was available earlier in the race, is more germane, but no one has released that to the public in a precinct-by-precinct format. I'm trying to get a PGP key or some other method of email encryption in case someone with access to it wants to send it along... but I can't figure out what's the best encryption method. Any recommendations?

Again, I would refer you to the comments Daniel Merkle makes in the article I linked to, which refers to uncalibrated data. I can only tell you that the race was close, which is why it took some time to "call."

If you left a hundred trillion dollars laying on the street unguarded, do you think someone might try to steal some of it?

But rig an election by counting votes with machines that have proprietary software? Oh no, why would anyone want to do that?

Tom, I will strike out that text if you think it detracts from the legitimacy of this story, but the fact is: exit polls show something different than the final vote tallies: final tallies show a Hillary victory, whereas exit polls do not (according to you they show a "close race" and according to various others they show something else). The discrepancy remains, although to you I assume it's uninteresting due to margin of error etc.

I agree with Tom Webster, above. Perhaps you mean to say, "In contrast to pre-election polls..."?

On this point:
"As you can see, something appears to be highly amiss. There may be an unmeasured third variable (it's probably not urban vs rural) or there may be something more nefarious."

Couldn't there be another option in between a missed variable and "something more nefarious," which I take to imply fraud?

Namely: Mechanical error in a number of the optical scanners?

@Along & @Tom: Fine. Stricken and revised.

now that I see your response to Tom I see the exit poll discrepancy you were making reference to.

Let's get rid of voting altogether. Polling is obviously the most accurate method of measuring voter intent, and is never ever wrong. All hail the exit poll!

I've been making this point since the primary results came in: even if the Diebold results are verified in the hand-recount, New Hampshire's lax same-day registration requirements (see http://www.democracyfornewhampshire.com/node/view/3016) make it possible for out-of-staters to drive into New Hampshire, vote, and drive back to their home state. Yes, identification is requested at the polls, but it is not required to submit a vote. Votes submitted without an ID are followed up on weeks after the polls close (if ever), but because the results are reported the same day to CNN, ABC, and all the rest, the momentum of the campaigns has already been affected and the damage to the election has been done.

It is entirely possible that the New Hampshire primary was decided by over-zealous out-of-staters who decided to give their chosen candidate a little bump. If you examine the districts that defied the poll expectations, they were all in the southern part of the state, right along I-93. Easy in, easy out.

The legality of these votes that the electronic scanning machines are counting (or not counting) needs to be verified before we simply recount all the possibly illegal ballots cast.

Look for the third variable in the exit poll methodology. What were the ages of the exit pollsters doing the interviewing? Were they primarily from Obama's key demographic? Were they distributed randomly or based on population density?

@tel and CHCH: the ABC News blog wrote about Obama's placement on the ballot. http://abcnews.go.com/PollingUnit/Decision2008/story?id=4107883&page=1

Here are the money grafs:

"Without a doubt, a big source of the discrepancy between the pre-election surveys and the election outcome in New Hampshire is the order of candidates' names on the ballot and in the surveys.

"Our analysis of all recent primaries in New Hampshire showed that there was always a big primacy effect - big-name, big-vote-getting candidates got 3 percent or more votes more when listed first on the ballot than when listed last."

"This year, the secretary of state changed the procedure so the names were alphabetical starting with a randomly selected letter, in all precincts."

"The randomly selected letter this year was Z."

"As a result, Joe Biden was first on every ballot, Hillary Clinton was near the top of the list (and the first serious contender listed) and Barack Obama was close to last of the 21 candidates listed."

@OnShakeDown: the effect of ballot placement is independent of the vote counting method used, as the ballots are the same whether they are counted by hand or machine-scanned.

My problem with the Diebold-diddit theory was that I don't think Hillary has that kind of pull with Diebold. A friend of mine cleared up my confusion: Diebold is just getting a little more subtle in tampering with the general election. Hillary is innocent of tampering, she is the unwitting, temporary beneficiary of Diebold's decision to throw the primary to the Democratic candidate most likely to lose. (well, ok, that would be Kucinich, but nobody would ever believe it if he won New Hampshire).

By trilobite (not verified) on 16 Jan 2008 #permalink

This is a conspiracy theory, this is denialism. Haven't you read the Hoofnagles' "denialists' deck of cards"? That's what I'm seeing here. It's a slippery slope, these wily accusations lead to 9/11 conspiracy buffoonery etc. Stop now. I'm sure the Hoofnagle brothers will not be happy about this blog post and will be posting a scathing rebuttal to this nonsense.

Why is it that every scandal or problem with counting votes always involves the Clintons or Democrats. I wonder? Yeah right!!!!

I would not put this past the clintonistas to cheat. That is what they are all about. Oh wait, could this be a "fairytale".

Every time I see Hillary or Bill I feel that I must take a shower. They are horrible people.

Good work on the statistical analyses! Now can you tell us what are the odds that the percentages from the hand-counts and machine-counts are EXACTLY reversed? That must be very highly improbable! As someone who has done programming before, it looks to me like a programmer at LHS must have inadvertently (supposedly) mixed-up the variables (i.e. the candidate identifiers). In other words, machine votes for Clinton were counted as Obama votes and machine votes for Obama were counted as Clinton votes. Is there any way to verify that this is what happened?

By Ben Scrude (not verified) on 16 Jan 2008 #permalink

So where does this stand right now? Are there possibly other statistical factors for which one might be able to control, or are we simply waiting on the recount?

As a voter in this election in a city with Diebold equipment I can say that something could easily go wrong. When I was putting my paper ballot into the machine it "rejected" it and spit it back out. I had to put it back in a second time and then it went in. Now I was a registered Republican so my vote wasn't part of the Democratic discrepancy, but if my ballot had problems how many more could have been "accidentally rejected" and really double counted or not counted at all?

Chris - thanks for your work. I am wondering, is it possible that there were some problems in certain scanners that were "flipped" in terms of how they were programmed that might have caused the mirror effect, or some version of it, between Obama and Clinton?

Take a look at the ballot

http://www.sos.nh.gov/Dem%20ballot.pdf

and you will see that Clinton is 4th from the top and Obama is 4th from the bottom. Is it possible that these scanners - some of them - may have read these the wrong way if the ballots were fed in upside down or some such thing?

Might some of the Obama checked ballots read as Clinton, and therefore affect just Clinton and Obama?

Anyway, just a random thought.

How have you excluded the possibility that people lied in the exit interviews?

By Caledonian (not verified) on 16 Jan 2008 #permalink

So are you accusing Clinton of rigging the machines or Obama of bribing the vote counters?

Beyond the snark: what about controlling for liberal-ness of precinct? The theory being that more liberal = likely to resist Diebold = more likely to vote for Obama.

@Dave: I would like to include records from how each precinct voted in 2004, but have not yet contacted the authors of the 2007 electoral research paper I linked to. It's next on my list.

@Caledonian, these analyses do not take into account exit polling data, but if I get that I certainly will do something to try to correct for the supposed Bradley effect

@Vik and Ben: the "flipping" idea has been addressed by others and I believe is no longer accurate, but I'll try including obama's votes as a predictor of clinton's. should be highly significant if a flip is to blame.

@Kazanir: the next covariates should be 2004 district voting records, race, gender, and hopefully exit polling data.

The problem with the Democratic "recount" is that they are not demanding a chain of custody accounting of the memory cards AND the original paper ballots. BTW, this isn't a "recount", this is the first time these ballots would have been counted by anything other than a Diebold machine. Other notable issue is that Diebold has admitted a 1% accepted machine error when an automatic "recount" only takes place with a .5% discrepancy!

Add all the anomalies above and this election- and maybe all our elections are fraudulent...

By Michael g (not verified) on 16 Jan 2008 #permalink

"Also note that Romney got a boost.

National polls show he would be the easiest for Dems to beat"

So having allegedly rigged both the 2000 and 2004 elections for Bush, Diebold's owners (who are big-time Republican donors) are now risking prison and bankruptcy to get Hillary elected?

If you take any large data set and look hard enough and long enough you will find statistically unlikely patterns in it.

By Ian Gould (not verified) on 16 Jan 2008 #permalink

What about the possibility that this is merely noise in the system due to post-hoc analysis of the data? Isn't it fairly easy to find data anomalies after the fact, but much harder to predict them before hand? 1 in 1000 events, after all, happen all the time, don't they? The odds of dying in a car crash are less than 1 in 1000, but people die in car crashes every day.

@Tim & Ian: In this case, the problem of multiple comparisons is minimized, since the Diebold effect was a hypothesis of interest before any statistics were run. In fact, predicting Clinton's votes from Diebold machines was the first model I calculated, and the fact that no covariates have been discovered which reduce this effect suggests it is NOT merely noise.

Besides, people deal with the issue of multiple comparisons all the time by correcting their p-values. In this case we can increase the p-value by around 100 times and still have a result that is significant by statistical standards.
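
For concreteness, the simplest such correction is Bonferroni's: multiply the p-value by the number of comparisons and cap the result at 1. The sketch below assumes that particular correction and uses made-up p-values; the post only reports the bound p<.001, so how many comparisons the result survives depends on the actual p-value.

```python
def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value: multiply by the number of comparisons, capped at 1."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return min(1.0, p_value * n_comparisons)


# Hypothetical raw p-values, each corrected for 100 comparisons
for raw in (0.001, 0.0004):
    adjusted = bonferroni(raw, 100)
    print(raw, adjusted, adjusted < 0.05)
```

A raw p-value sitting exactly at .001 would adjust to about 0.1 after 100 comparisons; the adjusted value stays under .05 for 100 comparisons only when the raw p-value is below .0005.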

Have you verified the demographic data you got from BlackBoxVoting? As I understand it, the "mysteriously switched percentages" thing came from erroneous data.

@Harold: I am using the most updated vote tallies and demographic information.

Another variable may be what I call the "Herd Effect".

A person is more likely to think highly of (vote for) someone they hear positive things about from their friends and loved ones. Since Hillary took the urban areas, that would mean that she would get a bigger bump in votes from the herd effect than Obama did when taking the less populated areas.

Has anyone looked if the actual buttons (or identification numbers) for Hillary & Obama matches with some winning or losing candidates of previous elections?

By Henk Poley (not verified) on 17 Jan 2008 #permalink

"Let's get rid of voting altogether. Polling is obviously the most accurate method of measuring voter intent, and is never ever wrong. All hail the exit poll!"

Why, then, is it that exit polls are never off by more than 0.1 % in First World countries, with the only exception of the last few elections in the USA? Why?

Also, don't you remember that the Serbian, the Ukrainian, and the Georgian Revolution were triggered by discrepancies between the official results and the exit polls?

For me as a European, it is obvious that, in spite of comment 28, comments 17, 31 and 49 have a point.

"New Hampshire's lax same-day registration requirements [...] make it possible for out-of-staters to drive into New Hampshire, vote, and drive back to their home state. Yes, identification is requested at the polls, but it is not required to submit a vote."

What the fuck.

"This is a conspiracy theory, this is denialism. [...] I'm sure the Hoofnagle brothers will not be happy about this blog post and will be posting a scathing rebuttal to this nonsense."

"Why is it that every scandal or problem with counting votes always involves the Clintons or Democrats. I wonder? Yeah right!!!!"

Memory hole much? This is the first since Nixon vs Kennedy.

"Every time I see Hillary or Bill I feel that I must take a shower. They are horrible people."

Why do you talk like this about the best Republican president you've ever had...? :-}

"How have you excluded the possibility that people lied in the exit interviews?"

How many people would be so ashamed of wanting to vote for Clinton that they'd lie on an anonymous poll?

"Other notable issue is that Diebold has admitted a 1% accepted machine error when an automatic 'recount' only takes place with a .5% discrepancy!"

May I repeat myself: what the fuck.

Why is democracy treated with such incredible neglect in, of all places, the USA??? I mean, this isn't the More or Less Democratic Republic of Congo, is it?

-----------

Let me put it cynically: if Clinton's campaign cheats, that means that if she wins the nomination her fight against the Reptilian nominee will be fair.

By David Marjanović (not verified) on 17 Jan 2008 #permalink

A cynical, non-conspirationist explanation:

Diebold machines are complicated and confusing, and sometimes actually entering your vote is not a trivial matter (see Shawn's comment above).

Therefore, urban, educated, tech-savvy types (supposedly more likely to vote for Clinton) had more success in entering their vote than people with lower "tech-smarts" (supposedly more attracted to Obama's charismatic, instinct-driven character).

@toto: As far as I know, the ballots counted by hand were identical to those scanned optically by the Diebold Accuvote machines. So in this way, the NH primary is the best opportunity we have to detect machine-based discrepancies in the tally that are uncontaminated by human-computer interaction confounds.

Here's an important question no one seems to be asking: how old, exactly, are the Diebold machines that New Hampshire used? I understand that there is a certification process, which takes about a year.

By Maureen Lycaon (not verified) on 17 Jan 2008 #permalink

"Memory hole much?"

I meant the first one involving Democrats. (And even the Nixon vs Kennedy one didn't matter in terms of results, because Kennedy would have won anyway.) That's what I tried to allude to when I talked about the exit polls.

By David Marjanović (not verified) on 17 Jan 2008 #permalink

The Owners WANT Hillary to win. No, not the nomination, the whole enchilada. It's the Dems' turn to win. Someone has to clean up the mess AND be blamed for the crash of the dollar. If you still think there is any difference in the two parties, ask yourself why nothing has been done in the last year, despite the will of the country to end the war and bring back our Bill of Rights. After at least two stolen elections, why do we still have these Diebold machines? The same "anomalies" occurred on the Rep ticket. Diebold machines run by a cocaine dealer cannot be trusted nor verified. The American people should not have to go through this at every single election. We must get rid of every single unverifiable voting method. Until then every winning candidate will be suspect. WAKE UP!

9/11 WAS an inside job and anyone who believes otherwise is just not using their brain. The government is the only entity capable of pulling that off. IMPEACH CHENEY FIRST!

"How many people would be so ashamed of wanting to vote for Clinton that they'd lie on an anonymous poll?"

Ask rather how many people would be ashamed of not wanting to vote for Obama?

By Ian Gould (not verified) on 17 Jan 2008 #permalink

My point exactly, election fraud crankery leads to 9/11 crankery. The Hoofnagles would never tolerate this nonsense, one must wonder why the blogowner peddles such conspiracy theories, that lead to even more absurd conspiracy theories.

Fell: I don't censor comments, so anything anyone wants to say gets posted.

"election fraud crankery"? I am merely pointing out a statistical anomaly. You don't want to be lumped in with the denialists, do you? ;)

The lunacy surrounding election fraud conspiracy theories is very similar to 9/11 crankery. Like a good lawyer, you only point out "anomalies"; a 9/11 crank could do the same thing, just pointing out "anomalies" like the collapses of the buildings, and deny what they are obviously implying.

By pointing out "anomalies" which are probably not anomalies at all, you ARE implying that election fraud might have taken place. This is denialism and pseudoscience and is just as absurd as saying 9/11 was an inside job. How many people would be involved in this conspiracy to rig the machines, and no one would blow the whistle? It's all paranoid nonsense.

Fell, you reveal your failure to understand the nature of the statistics by saying these anomalies are "probably not anomalies at all." There are less than 1 in 1000 odds that this occurred through chance alone; there is some other causal factor.

I am *not* saying that causal factor is election fraud - reread the post if you like - I believe it is probably a demographic issue. But your thoughtless knee-jerk reactions do nothing to help prove that.

If you are skeptical of the election fraud explanation, as I am, you could be more productive by trying to supply me with the appropriate covariate (race, gender, etc) or making guesses as to what it might be.

I suggest you are the denialist here, because you willfully ignore the statistics which show something is definitely anomalous. You are denying a half-century of statistics that established the basis of significance testing simply by saying "oh, it's probably nothing because one potential explanation is unlikely." (Or unpalatable?)

In short, you are a troll, and I'm not going to feed you anymore...

Do you think there may be a problem with multicollinearity (e.g. your predictors are correlated with each other; http://en.wikipedia.org/wiki/Multicollinear) with the data? I just downloaded your giant spreadsheet and haven't had a lot of time to play around with the data.

In addition, there may be demographic predictors of the vote that are correlated with an Obama or Clinton vote. If those factors are correlated with the presence of the "machines" that would seem to suggest that what you're seeing is actually the result of some other pattern that coincidentally matches up the presence of machines. Maybe the pirates and global warming are a good example of this.

Oh one more thing.

P values are the probability of seeing data at least this extreme given that the null hypothesis is true, not the probability that things are due to random chance.

Fell:
In 2000, there was a poorly designed "butterfly ballot" that was used in a certain county in Florida. The design was confusing, which led many people who probably intended to vote for Al Gore to inadvertently vote for Pat Buchanan. This was not intentional fraud or a conspiracy, but it does seem to have introduced an artificial bias.

I have no idea whether these Diebold machines could be having some unintentional bias, but the bias doesn't have to be an intentional one.

After following the link in #12, which says:

"From the WaPost:

The New Hampshire ballot rules may also have played a role. In previous contests, the state rotated candidate names from precinct to precinct, but this year the names were consistently in alphabetical order, with Clinton near the top and Obama lower down. Stanford professor Jon A. Krosnick, a survey specialist, has estimated the impact of appearing high on the New Hampshire ballot at three percentage points or greater.

I have no reason--other than patriotism(!)--to think this isn't true! Or to put it another way, if this is true, what does it say about at least three percent of our country's voters? "

I'm leaning toward unintentional bias due to voter carelessness (or cluelessness) and the fact that Clinton's name was alphabetically advantaged, while Obama's name was alphabetically challenged. I think the order should be rotated for each voter to eliminate the possibility of such bias. (Just a hypothesis.)

I reanalyzed things over at my blog. The demographic data also strongly predict where the diebold machines were used. Based on this I'm not sure if there's going to be a good way of figuring out if there was any funny business.

Sorry for the flurry of comments, it was this or go work on course syllabi. This seemed much more interesting.

@S.Walker: Very impressive! I agree there is a collinearity problem, but I have dealt with that by predicting Clinton's votes from all demographic variables and saving the residuals. The residuals are still predicted by vote method.

I am not familiar enough with PCA to say whether or not what you've done is meaningful, but it is very interesting. Do you think my "residualized regression" procedure would not deal with the collinearity problem?
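One way to see what the residualizing step does is a small simulation. The sketch below uses made-up data (the variables `z`, `d`, `y` and every number in it are assumptions, not the New Hampshire file). It compares (A) residualizing only the outcome before regressing on vote method with (B) residualizing both the outcome and vote method on the covariate, which by the Frisch-Waugh-Lovell result reproduces the vote-method coefficient from a single multiple regression.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope_intercept(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def residuals(x, y):
    """Residuals of y after a simple OLS regression on x."""
    a, b = slope_intercept(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

random.seed(0)
n = 200
# Synthetic stand-ins, NOT real data:
# z = a demographic covariate, d = 1 if machine-counted, y = candidate share.
z = [random.gauss(0, 1) for _ in range(n)]
d = [1.0 if zi + random.gauss(0, 1) > 0 else 0.0 for zi in z]  # method depends on z
y = [40 + 3 * zi + 5 * di + random.gauss(0, 2) for zi, di in zip(z, d)]

y_res = residuals(z, y)
d_res = residuals(z, d)

# (A) residualize the outcome only, then regress the residuals on vote method
_, coef_outcome_only = slope_intercept(d, y_res)
# (B) residualize both outcome and vote method, then regress one on the other
_, coef_both = slope_intercept(d_res, y_res)

# Share of the variance in vote method that the covariate does NOT explain
var_d = sum((di - mean(d)) ** 2 for di in d)
unexplained = sum(r ** 2 for r in d_res) / var_d

print(coef_outcome_only, coef_both, unexplained)
```

In this setup the outcome-only estimate equals the full-model estimate scaled by the share of variance in vote method that the covariate leaves unexplained, so it shrinks toward zero as demographics predict machine use more strongly. Neither version protects against a misspecified demographic model.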

Simple questions.

Does anyone here believe that there is any chance of voter fraud in that there was a conspiracy to get Hillary elected?

Does anyone here realize that a good majority of people who believe this also believe 9/11 was an inside job (Amy above)? You got some major crank company, which is predictable since the same weak arguments are used. On that note, who here actually believes any of the 9/11 drivel Amy mentioned above?

If you answered "yes" to any of these questions, you are a bona fide denialist, and this type of crankery and rhetorical deception is clearly outlined on Dr. Hoofnagle's blog in their opus "A Denialist's Deck of Cards."

I suggest you people read it, and realize peddling these conspiracies is nothing but pure propaganda tactics. Dr. Hoofnagle doesn't tolerate this nonsense, and neither should you people.

A small correction:

"P values are the probability that, given the data, the null hypothesis is true not that things are due to random chance."

P values are the probability of observing data that is as extreme as the observed data given the null hypothesis. That's a different thing.

By Mark Frank (not verified) on 17 Jan 2008 #permalink

@S.Walker
P values are the probability that, given the data, the null hypothesis is true not that things are due to random chance.

Not quite (and one of the problems with p-values in general): it's the probability of obtaining the data/results/test statistic, IF the null-hypothesis is true (i.e. p(data|H0), not p(H0|data) you're describing, unless you're not talking about the standard p-value of 'Fisherian' Null-hypothesis testing.)
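To make the p(data|H0) reading concrete, a permutation test computes it almost literally: assume the null (the machine/hand labels are irrelevant), reshuffle the labels many times, and count how often the reshuffled difference is at least as extreme as the observed one. The sketch below is illustrative only. The two lists of vote shares are invented, and `permutation_p_value` is a hypothetical helper, not anything used in the post's analysis.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    k = len(group_a)
    pooled = list(group_a) + list(group_b)
    observed = abs(sum(group_a) / k - sum(group_b) / len(group_b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # under H0 the group labels are exchangeable
        diff = abs(sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Invented vote shares (percent) for illustration, NOT real town data.
machine_towns = [41.0, 43.5, 39.0, 44.0, 42.5, 40.5]
hand_towns = [36.0, 38.5, 35.0, 37.5, 39.5, 34.5]

p = permutation_p_value(machine_towns, hand_towns)
print(f"p = {p:.4f}")
```

The returned number is the fraction of label shufflings that look at least as extreme as the data, given the null. It says nothing directly about the probability that the null itself is true.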

Anyways, this is fascinating stuff. I'll have to play around with these data and my newly acquired knowledge of logistic regression. I like your PCA approach too, S.Walker.

And don't forget kids: correlation is not causation! Voting is not a controlled experiment with random sampling and all the other important aspects.

Ian Gould, a respectable comment writer I've read at Deltoid, wrote: "Ask rather how many people would be ashamed of not wanting to vote for Obama?"

That explanation has been convincingly rejected. According to the pollsters themselves, the only surprise in the polls was that Hillary got more votes than expected. Obama got about what they had predicted. So did Edwards, and all the Republicans, of course. Exit polling goes a long way to explain this: more people from Hillary's key demographics did indeed turn up. The question is why.

I don't think the recount will change who won. That's not the important bit anyway. What matters is the integrity of the voting machines, and whether there are any systematic errors (even though, as I said, these errors probably aren't decisive).

But back on topic: Race-gap polling as an explanation doesn't work, and it creates far more questions than it answers (why didn't it happen in Iowa, for instance?).

Ian Gould, if you're the same Ian Gould who posts at deltoid, I know you can do far better than just coming up with dismissive explanations without looking at the evidence. It's an important issue. If Chatham is wrong, it's important that you show him how. If he's right, we all owe it to ourselves to defend him. Standing up for it, even though it invites praise for 9/11, ridicule from "sensible" bloggers, and other unpleasant experiences, is mighty great of Chatham if his calculations are correct.

Ask rather how many people would be ashamed of not wanting to vote for Obama?

Oh. Good point.

My point exactly, election fraud crankery leads to 9/11 crankery.

What do you mean, "leads to"? Just because Amy is ignorant ("the government is the only entity that could pull this off" my ass) doesn't mean she's thinking logically, let alone that her premises are correct. Yes, Cheney should be impeached first, but for different reasons (Iraq for a start).

How many people would be involved in this conspiracy to rig the machines, and no one would blow the whistle?

Not the machines. The proprietary software that the machines use. Or the proprietary software that is used to tabulate the results -- for reasons of sheer stupidity, in American elections the results are sent to central places and counted there, instead of being counted on the spot and then being sent to a central place along with the ballots, so, for example, Ohio's votes were counted on Kenneth Blackwell's desktop.

Maybe the pirates and global warming are a good example of this.

Incidentally, the number of pirates is on the rise again, while global warming continues unimpressed...

P values are the probability that, given the data, the null hypothesis is true not that things are due to random chance.

And the null hypothesis here is that the supposed anomaly is a fluke -- random, in other words. No?

By David Marjanović (not verified) on 18 Jan 2008 #permalink

Sort of. By doing that, what you've done is parse out all the variation due to demographics and then analyze the impact of the Diebold machines separately.

However, what it doesn't deal with is that you can predict the use of the Diebold machines based on the demographics and also predict the vote outcome based on the demographics. This is problematic because the demographic variables are really different in places where the machines were used compared to where they weren't. So, for this data, the approach that both you and I have taken is to use a multiple regression model (which, if we made having Diebold machines categorical, would be similar to an Analysis of Covariance).

Now, when you do an experiment and you know you're going to have to deal with multiple predictors of the outcome in addition to your experimental manipulation, you hope that your 'extra' predictors aren't influenced by the experimental manipulation and that having these extra predictors really helps you to reduce the amount of error in your estimates of the effect of the manipulation. When the manipulation impacts the 'extra' predictors, you can have serious problems with the analysis.

In this case there's no experiment, and I have no real idea of why or how the demographics influence the vote. However, there are huge differences in the demographic variables between where the machines were used and where they were not, and a priori I don't know what kind of model (whether it's linear, whether all the right variables are used, etc.) you'd use to parse out the 'demographic' effects and then look at only 'machine' effects. To me, given this problem, there's not going to be a good answer regarding whether any 'tampering' was done, because there's no way to know that you've specified the correct model and accounted for the important demographic differences between the machine-counted and hand-counted areas.

Also delete my p-value post above. I wrote that in haste and it's sort of correct but terribly written and confusing. The definition is P(Data|Null Hyp) which should read the probability of the data given the null is true.

Good to see such a fascinating post and interesting discussion.

A LR of 1:1000 isn't all that persuasive, even less so if you consider that the underlying assumption regarding demographic influence may be significantly flawed.

CJ, the p is well below 1:1000 from what I understand; SPSS just doesn't report p-values below that (AFAIK). Do you have suggestions on how to fix the "flawed assumption" underlying demographic influence?

"with respect to Hillary Clinton's surprise victory in the Democratic Primary and the notable differences across vote tabulation technologies in Clinton's and others' levels of support, our results are consistent with these differences being due entirely to the fact that New Hampshire wards that use Accuvote optical scan machines typically have voters with different political preferences than wards that use hand counted paper ballots."

Wow, what a surprise.

By truth machine (not verified) on 22 Jan 2008 #permalink

The ease with which Diebold machine votes can be changed has been well documented and remains uncorrected. That certainly makes it possible a Hillary supporter with computer hacking skills could have changed some votes. But scientists have to take care when following the evidence to not turn in a single direction for the conclusion prematurely.

If the analysis shows a discrepancy between Diebold machine counts and other voting system counts, and the other variables were truly controlled for, that is not sufficient to conclude it was the Diebold machine votes which were changed.

While the pre-vote polls favored Obama, there were 2 incidents which preceded the voting and occurred after the polls. One was the teary episode reported unusually positively in the major broadcast media and the other was a couple hundred point fall on the Dow which the broadcast news media announced with its ever growing enthusiasm for the contrived coming disasters they seem to relish reporting. In addition, there were many undecideds in the polls. One cannot conclude the pre-vote polls should have predicted the actual vote.

In addition to the data analysis here, one must also consider where in the system all the data could have been accessed. Were there consistent discrepancies in all Diebold results or were there local discrepancies? Reports come in by district, so where are those results tallied? I believe in Ohio in 04 there was a central location where machines or tabulation could have been altered by a single person.

If there is no central point of vote tabulating or access for machine tampering, then you have to explain how minions of vote machine hackers were recruited with no whistle blowers in the ranks.

But while one looks for the feasibility of vote tampering to explain the data, and variables overlooked to explain the discrepancy, consider the fact the poll results, with all the undecideds, could have been wrong. Then you may be seeing tampering of the paper ballots, not the machine ballots.

http://www.blackboxvoting.org/ which was responsible for much of the investigation into the Ohio Diebold problems had the following criticisms of the security of the paper ballots in New Hampshire.

http://www.bbvforums.org/cgi-bin/forums/board-auth.cgi?file=/1954/71456…
Holes hidden in plain sight

http://www.bbvforums.org/cgi-bin/forums/board-auth.cgi?file=/1954/71404…
Ballot boxes found slit; NH ...

It appears that adding a couple thousand paper votes for Obama over Hillary is an equally possible explanation for the discrepancy in the data.

I am not suggesting either of these scenarios. I am, however, urging those people who want to follow the evidence to not be misled into the wrong conclusion because on the surface one hypothesis is easier to swallow than the other.

By skeptigirl (not verified) on 22 Jan 2008 #permalink

A new analysis is in from Michael C. Herron, Walter R. Mebane, Jr., & Jonathan N. Wand

http://blog.wired.com/27bstroke6/files/NH2008HMW.pdf

"We find no significant relationship between a ward's use of vote tabulating technology and the votes or vote shares received by most of the leading candidates who competed in the 2008 New Hampshire Presidential Primaries. Among Clinton, Obama, Edwards, Kucinich and Richardson in the Democratic primary and Giuliani, Huckabee, Paul, Romney and McCain in the Republican primary, we observe a significant average effect of using PBHC technology on the wards that used PBHC technology only in the votes counted for Edwards, and that difference is small. The effects for Edwards also do not appear to be significant when a regression-based bias adjustment is applied.

The particular set of variables used for the matching analysis in this study does not exhaust the range of observable ward attributes. It is possible that another set of matching variables and matched pairs of wards would produce even better balance among observables across technologies than we have found. It is also possible that some observables we have not examined in this study remain imbalanced, contributing to bias in our estimates of the average treatment effects. The observable features we have examined, however, include variables that measure many aspects of the preceding primary elections in the state, as well as many demographic features of wards in the state."

The recount tallies do not reveal any significant discrepancies either.

http://www.sos.nh.gov/recountresults.htm

I wonder from this what will stick in people's minds a year from now? "I hear Hillary cheated on the New Hampshire primary." I'd love to see an analysis of facts vs beliefs over time.

One thing is certain, however. We cannot get rid of paper ballots needed for recounts. That would indeed be a mistake.

By Skeptigirl (not verified) on 24 Jan 2008 #permalink

Skeptigirl & Chris,

I'm a statistician. I went over the Herron/Mebane/Wand matching study carefully and found the following:

1) Their study design doesn't have enough power to detect the difference in question, even at the 90% level.

2) They started with a large number of variables and narrowed the field to 8 without systematic or stepwise procedures.

3) Two of their final model variables are linearly related. One variable (wpkerry04pd) is the standard normal transformation of another (pkerry04pd). I'm surprised they were able to get convergence.
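Point 3 is easy to illustrate. If one variable really is a standardized copy of another, as described above, then it is an exact linear function of the original, and a model containing both (plus an intercept) has a singular cross-product matrix. The sketch below uses invented values standing in for a variable like `pkerry04pd`; the list `share` and the helper `standardize` are hypothetical and do not come from the study's data.

```python
import statistics

def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return [(v - m) / s for v in values]

# Invented values for illustration only, NOT the study's data.
share = [0.31, 0.35, 0.38, 0.42, 0.47, 0.51, 0.55]
share_z = standardize(share)

# Center the original variable (the intercept absorbs the mean).
m = statistics.mean(share)
centered = [v - m for v in share]

# 2x2 cross-product (Gram) matrix of the two centered predictors.
sxx = sum(c * c for c in centered)
szz = sum(z * z for z in share_z)
sxz = sum(c * z for c, z in zip(centered, share_z))

# Determinant relative to its largest possible value; 0 means singular.
relative_det = 1.0 - (sxz * sxz) / (sxx * szz)
print(relative_det)
```

A relative determinant of zero (up to floating-point rounding) means the two coefficients cannot be separately estimated, which is why software would normally drop one variable or fail to converge.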

I contacted them asking for clarification and Dr. Mebane responded politely, saying:

"We'll be issuing a revised paper in a few weeks with some new analysis based on augmented data." He went on to say their variable selections were based on their research of past NH primaries, and that they couldn't do much about the power given the small number of matches. I didn't press them further.

Jake, thank you. I didn't want to go out on a limb since I'm not a statistician, but I could tell that their matching estimator was low on power, and that they didn't run a power analysis. I find it a little disturbing that the "experts" are so quick to quell concerns of such importance without (apparently) checking their work with stats 101 methods (e.g., power analysis).
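For anyone who wants to try the kind of power check being discussed, here is a rough sketch. It is not the study's method: it uses a normal approximation for a test on the mean of matched-pair differences, and the effect size (5 points, echoing the difference discussed in the post), the standard deviation of the pair differences, and the pair counts are all assumed numbers chosen for illustration.

```python
import math
from statistics import NormalDist

def power_paired_z(effect, sd_diff, n_pairs, alpha=0.05):
    """Approximate two-sided power for the mean of n_pairs paired differences.

    Normal approximation that treats sd_diff as known; with few pairs a
    t-based calculation would give somewhat lower power.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n_pairs) / sd_diff
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# All inputs below are assumptions for illustration, not values from the study.
assumed_effect = 5.0   # percentage points
assumed_sd = 8.0       # SD of the matched-pair differences
for n_pairs in (10, 20, 40, 80):
    print(n_pairs, round(power_paired_z(assumed_effect, assumed_sd, n_pairs), 3))
```

With a true effect of zero the function returns alpha, and power rises with the number of matched pairs, so a small set of matches can easily miss a real difference of this size.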