I promise this is not a politics post. It just uses some vote totals for some fun math!
The Minnesota Senate race has so far produced a total of 2,422,811 votes between the two leading candidates, with a margin of just 477 separating them. It's about as close to a 50:50 split as we've seen this cycle, and probably in the last several cycles.
Now let's say you have a perfectly fair coin that has a probability of 1/2 of landing heads and 1/2 of landing tails. For any given string of coin tosses you won't expect to have an exactly even number of heads and tails. Flip the coin 10 times and you wouldn't be totally shocked to get something like 8 heads and 2 tails. In fact, the probability is a little over 4%. The way we know this is by invoking the binomial distribution. Here it is, specialized for the case of p = 0.5: P(k) = C(n, k) / 2^n.
That gives you the probability of flipping k heads in n flips. For fairly small n and k this works fine. But let n be more than two million, as in this case, and those factorials become intractable. There are some approximations that can help, but they only do so much before they get swamped too. And we're going to be interested in computing standard deviations, which means summing over all the possibilities... no way in the world. Too hard. We need a better approximation.
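As an aside, a quick sketch (in Python, not from the original post) shows both sides of this: `math.comb` evaluates the small cases exactly, and working in log space via the log-gamma function sidesteps the intractable factorials even for n in the millions.

```python
from math import comb, lgamma, log, exp

def binom_prob(k, n):
    """Exact probability of k heads in n fair-coin flips: C(n, k) / 2^n."""
    return comb(n, k) / 2**n

def log_binom_prob(k, n):
    """The same quantity computed in log space, safe for huge n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) - n * log(2)

print(binom_prob(8, 10))              # ~0.0439, the "8 heads in 10 flips" case
n = 2_422_811
print(exp(log_binom_prob(n // 2, n))) # even the single most likely split is rare
```

Still, summing millions of these terms for a standard deviation is exactly the chore the normal approximation below avoids.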
Fortunately as n gets large the binomial distribution gets closer and closer to the normal distribution in shape. In particular for large n, the binomial distribution looks like a normal distribution with a mean as expected of np. The standard deviation is np(1-p). And with a tiny bit of calculus that's pretty easy to work with.
But we don't have to work with it; we just want the standard deviation. That standard deviation is a measurement of the average spread, so to speak. Flip a coin ten million times and you expect to get five million heads on average. But you would be very surprised to get five million exactly. In practice there will be some spread. The spread is less than the standard deviation about 2/3 of the time. So for those ten million coin flips, most of the time the number of heads will be within (it turns out) 1581 of that expected 5,000,000 number. The larger the number of flips, the larger the spread, but the smaller the spread is as a percentage of the total number.
For the number of votes in the Minnesota contest, the spread would be 778. In other words, if every single voter picked a candidate at random with a truly 50:50 chance, there would on average be about a 778-vote gap between the candidates. Yet in the actual election the spread is even closer than that. Such a close election wouldn't be very likely even if the electorate were truly exactly evenly split.
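Both of the standard deviations quoted above can be checked in a line or two (a sketch, assuming the fair-coin model of the post):

```python
from math import sqrt

def coin_sd(n, p=0.5):
    """Standard deviation of the number of heads: sqrt(n * p * (1 - p))."""
    return sqrt(n * p * (1 - p))

print(round(coin_sd(10_000_000)))  # 1581, the ten-million-flip spread
print(round(coin_sd(2_422_811)))   # 778, for the Minnesota vote total
```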
I think you're a bit off with standard deviations here. Just because the election spread is less than the standard deviation doesn't mean it is any less likely. Any spread within a standard deviation is highly likely, with exactly 50:50 being most likely. So a vote split of 477 isn't unlikely at all.
The standard deviation is Sqrt(np(1-p))
I thought you said you weren't going to do politics again today?
Chris P
Actually, it's a 42:42:15 split. From a purely mathematical point of view, I can't see any objection to simply ignoring that 15%, but if you want to derive real-world conclusions about voters' preferences, it's clearly important.
In my congressional district (VA 05), the vote for the House of Representatives was even closer. Perriello (D) received 157,460 votes while Goode (R) received 157,407. There were 315,044 total votes cast, meaning that there were more write-in votes (177) than the difference between the two candidates (53).
(All my numbers are from https://www.voterinfo.sbe.virginia.gov/election/ )
A press article gives 2,422,809 votes between the two. If the chance of getting any particular vote is 50%, then from one candidate's perspective, he should get 1,211,404.5 votes with a standard deviation of Sqrt(np(1-p)), or 778. One candidate got +0.305 sigma and the other got -0.305 sigma. The probability of a normally distributed random variable being between -0.305 and +0.305 standard deviations is about 24%.
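That 24% figure can be reproduced with the standard normal CDF, expressed via the error function (a sketch using the commenter's vote total; the erf identity is standard, not from the comment):

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 2_422_809                  # total votes, per the comment
sigma = sqrt(n * 0.25)         # ~778
z = (477 / 2) / sigma          # each candidate sits ~0.305 sigma from the mean
print(std_normal_cdf(z) - std_normal_cdf(-z))  # ~0.24
```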
Re: Ian and Tim, the main thing is that since the vote is within the standard deviation for a 50:50 split, for all practical purposes the outcome is random chance. The winner will win not because he convinced a higher percentage of the electorate but because those coin flips just happened to go in his favor.
Thad, you're right. It should be variance in that equation in the main post.
Fortunately as n gets large the binomial distribution gets closer and closer to the normal distribution in shape. In particular for large n, the binomial distribution looks like a normal distribution with a mean as expected of np. The standard deviation is np(1-p).
I don't think the binomial/normal approximation is buying you anything you need. These facts hold for the binomial case to begin with, don't they?
But you're equating voters with fair coins in this instance and voters are anything but fairly arbitrary.
In this case you should probably be comparing the 2008 results with Coleman's 2002 results, then compare expected values and standard deviations. If Coleman won in a landslide in 2002 and he wins in a squeaker today, I would argue that a high percentage of the electorate has in fact changed its mind. However, if both totals are within the variance - i.e., it was a close race both times - you would have a better case that Franken won purely due to statistical noise.
The voters are definitely not random, but then neither is a fair coin. Voters go the polls and coins follow Newton's laws for predetermined results. But in the statistical aggregate the result is described as though it were random.
The electorate seems to be pretty much exactly evenly split, and so it just came down to which side had the bad luck of having more people with the flu, or flat tires, or whatever else sent them toward or away from the polls.
The voters are definitely not random, but then neither is a fair coin.
When you toss a fair coin a million times, you tend to get approximately 50% heads, and if you repeat this process many, many times you get a binomial distribution for the ensemble.
- It is true that polling many, many voters gives SOME distribution, but I doubt binomial is the way it goes
- More important, poll one generic person in the electorate a million times and you don't get about 50% D and 50% R. It depends on the voter - among democrats p(R) is tiny while among republicans p(D) is tiny. There simply isn't a single underlying p(D) the way there is a single underlying p(H) or p(T), rather there is some underlying distribution of p(D) or p(R)
- Finally, these underlying distributions are ever changing, particularly for swing voters, and the changes are *highly* correlated among groups of voters. I really don't think this is a useful model for understanding how often small victory margins are to be found
"I don't think the binomial/normal approximation is buying you anything you need. These facts hold for the binomial case to begin with, don't they?"
Variance is still np(1-p) under the binomial case. The normal approximation justifies the claim that the result lands within one sd about 2/3rds of the time.
I think one interesting idea for this post would be whether anyone has figures on error rates when counting votes. For a normal distribution of errors, what's the probability that the losing side actually won but lost out due to errors?
Counting errors certainly do happen--remember the 2004 WA governor's race, in which Rossi was ahead after the initial count but the recount flipped the race to Gregoire. The error rate depends on what kind of equipment is used: touch screens historically have had high error rates (which is one reason why they are banned in New Hampshire, where I live), optical scanners are generally but not always somewhat more accurate, and hand counted paper ballots are also somewhat more accurate.
In Minnesota, as in many other states, recounts are mandatory whenever the winning margin is less than 0.5% of votes cast. In this case that threshold is more than an order of magnitude higher than the binomial standard deviation. The actual statistical error should be smaller than the legal threshold by at least a factor of 3 (and preferably much more) simply because you want the erroneously flipped election to be a multiple-sigma outlier.
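The order-of-magnitude comparison above checks out (a sketch; the 0.5% recount threshold is the comment's figure):

```python
from math import sqrt

n = 2_422_811             # votes between the two leading candidates
threshold = 0.005 * n     # mandatory-recount margin: ~12,114 votes
sigma = sqrt(n * 0.25)    # binomial standard deviation: ~778
print(threshold / sigma)  # ~15.6, i.e. more than an order of magnitude
```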
Nice topic Matt, but your spread probably overestimates the total variability. This is because the "random voting" situation is not realistic: the majority of votes are fixed (counted correctly, with no error), with some minority subject to essentially random error. If we allow that 10% of the total votes are "random" and subject to error as you say, then the spread would be 246.
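The 246 figure follows from applying the same formula to only a tenth of the votes (a sketch; the 10% fraction is the commenter's assumption, not data):

```python
from math import sqrt

n = 2_422_811
random_fraction = 0.10                     # the commenter's assumption
spread = sqrt(random_fraction * n * 0.25)  # sd of just the "noisy" 10%
print(round(spread))  # 246
```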
I agree with you that this is still a very close race; StarTribune.com currently has the margin at 221 votes.
I read in the St. Paul paper that the error rate of the Scantron machines used to read the votes is about 1 in 1000.
For 2.4M votes, this is about 2,400 votes scanned incorrectly.
The vote separation is more like 200 today, which is an order of magnitude less than the expected noise.
Both candidates are tied within the accepted error rate of vote counting.
A coin toss would be a fair and cheap method to determine the winner.
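The arithmetic behind that claim, as a sketch (the 1-in-1000 misread rate and the ~200-vote margin are the commenter's figures):

```python
error_rate = 1 / 1000            # reported scanner misread rate
n_votes = 2_400_000              # roughly the two-candidate total
expected_errors = n_votes * error_rate
margin = 200                     # separation cited in the comment
print(expected_errors)           # 2400.0 misread ballots expected
print(margin < expected_errors)  # True: the margin is inside the counting noise
```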
Heh. I have maintained that one should be allowed to vote against a candidate as well as for one. In other words, my vote could have a value of 1, or -1. This would be very handy in those races where there isn't an appealing candidate, just the lesser of two evils.
Now we can discuss what happens to candidates who end up w/ negative tallies... :D