# Theory and Correlation

In quantum mechanics, particles like electrons can be observed in one of two spin states: up or down. The theory, however, doesn’t require the state to be completely determined before we look at it. Any given electron doesn’t have to be in one of those spin eigenstates; it can be in a superposition of spin up and spin down. It’s like Schroedinger’s cat being in a superposition of alive and dead, but less dramatic. For instance, a particular electron may be in a state which has a probability of 60% of being in an up state and 40% of being in the down state. Once measured it will definitely be one or the other, but if you repeat the process with many electrons prepared in the same state, you’ll end up with a 60:40 ratio in what you observe.

There’s a very interesting company on the internet called Intrade, which runs a marketplace for buying and selling shares in the outcomes of current events. If you think Obama will win the election, you can buy a share of “Barack Obama to win 2008 US Presidential Election”, and if he wins you’ll get \$10 a share. If he doesn’t win, the person who sold you the share keeps whatever you paid for it. Sales are done via the usual bid/ask process. Thus the share price (right now \$6.01 a share) represents the market’s estimate of the probability that the event will happen. As of this writing it’s 60.1% Obama and 36.9% McCain. It doesn’t quite add up to 100%, mostly reflecting the possibility that Obama or McCain may not make it to the election due to death, disease, scandal, or whatever other reason. Exchange fees also account for some fraction of the gap.
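The price-to-probability arithmetic can be sketched in a few lines. This is only an illustration: the \$3.69 McCain price is inferred from the quoted 36.9%, and the function name is made up.

```python
PAYOUT = 10.00  # dollars paid per winning share, as described above


def implied_probability(price, payout=PAYOUT):
    """Read a share price as a probability: price divided by payout."""
    return price / payout


obama = implied_probability(6.01)
mccain = implied_probability(3.69)  # assumed price, inferred from the quoted 36.9%
gap = 1.0 - (obama + mccain)        # share of probability the two contracts leave over

print(f"Obama {obama:.1%}, McCain {mccain:.1%}, unaccounted {gap:.1%}")
```

The leftover is what fees, dropouts, and any other contracts in the same market have to explain.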

You could say the election is in a linear superposition of the Obama and McCain states, and trading exchanges in some sense give you information about the wavefunction of the election. On November 4 the observation will be made and the function will (probably) collapse to the Obama or the McCain eigenstate. Until then we can get a sense of the time evolution of the wavefunction by tracking the share prices. If so, the prices for Obama and McCain should correlate inversely. Let me plot the share prices since July 1 for each. Each data point represents the closing price for a particular day. We expect to see Obama = 100 – McCain. Obama is on the y axis, McCain on the x.

Zoomed out:

We actually see Obama = 87.5 – (0.728)McCain, with a correlation coefficient of 0.859. Given that politics is a lot more complicated than two sharp spin eigenvalues, it’s not so bad.

What does all this mean? Nothing much. We could already have guessed that one candidate’s gain is the other candidate’s loss. But there is that wiggle room at the right and left of the graph that asks for an explanation. If McCain is certain to lose, Obama should have more than an 87% chance of winning, and vice versa. It’s these nonsensical gaps in theory that can lead to discovery – or the conclusion that the data source is simply flawed. The uncertainties in the slope and intercept are quite small (0.046 and 1.526 respectively), so the error is very likely systematic rather than fuzziness in the data.
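For anyone who wants to repeat the fit, here is a minimal sketch of an ordinary least-squares line with the usual standard errors on slope and intercept. The closing prices are not reproduced in this post, so the data below is synthetic (generated around the fitted line with invented noise), and `ols_fit` is just an illustrative name. Note that the sign of r follows the slope, so the 0.859 quoted above is presumably a magnitude.

```python
import math
import random


def ols_fit(x, y):
    """Least-squares fit y = b + m*x; returns m, b, r, sigma_m, sigma_b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # Residual variance with n - 2 degrees of freedom.
    s2 = sum((yi - (b + m * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    sigma_m = math.sqrt(s2 / sxx)
    sigma_b = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return m, b, r, sigma_m, sigma_b


# Synthetic stand-in for the daily closing prices (NOT the real Intrade data).
rng = random.Random(0)
mccain = [rng.uniform(30.0, 40.0) for _ in range(40)]
obama = [87.5 - 0.728 * xm + rng.gauss(0.0, 1.0) for xm in mccain]

m, b, r, sigma_m, sigma_b = ols_fit(mccain, obama)
print(f"Obama = {b:.1f} + ({m:.3f}) McCain, r = {r:.3f}")
print(f"sigma_m = {sigma_m:.3f}, sigma_b = {sigma_b:.3f}")
```

These standard errors only describe scatter about the fitted line. They say nothing about whether a straight line, or a fit that treats the x values as exact, is the right model.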

Any suggestions? Is there some interesting effect that causes the wiggle room, or is market trading just not equipped to describe the situation adequately near the margins?

1. #1 Bob O'H
August 13, 2008

Are you aware that you’re biasing the slope estimate downward by using linear regression? Here that would lead to an underestimate of the slope, and then also of the intercept. The problem is that both probabilities are subject to error (or at least stochastic fluctuation), but you’re assuming that the McCain data is perfect.

Terms like “error in variables” and “regression towards the mean” tend to get thrown around. There are better techniques to handle this data, such as reduced major axis regression, but I’m not an expert on these.
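The bias described here can be seen on made-up numbers. The sketch below assumes a true relation of Obama = 100 – McCain and adds the same amount of random fluctuation to both series (all values invented). It then compares the ordinary least-squares slope with the reduced major axis slope, which is the ratio of the standard deviations carrying the sign of the covariance.

```python
import math
import random


def ols_and_rma_slopes(x, y):
    """Return (OLS slope of y on x, reduced major axis slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ols = sxy / sxx
    # RMA slope: sd(y) / sd(x), with the sign taken from the covariance.
    rma = math.copysign(math.sqrt(syy / sxx), sxy)
    return ols, rma


rng = random.Random(1)
true_mccain = [rng.uniform(30.0, 40.0) for _ in range(2000)]
# Both observed series fluctuate about the assumed exact relation.
mccain_obs = [t + rng.gauss(0.0, 1.5) for t in true_mccain]
obama_obs = [(100.0 - t) + rng.gauss(0.0, 1.5) for t in true_mccain]

ols, rma = ols_and_rma_slopes(mccain_obs, obama_obs)
print(f"OLS slope {ols:.3f}, RMA slope {rma:.3f} (true slope -1)")
```

In this setup the OLS slope comes out noticeably shallower than -1 while the RMA slope stays close to it. That only works so cleanly because the two noise levels were chosen equal. With unequal fluctuations RMA is not guaranteed to recover the true slope either.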

2. #2 ScentOfViolets
August 13, 2008

I don’t think that is necessarily true, Bob. If the linear model is a better fit than any higher-order polynomial, for example. Agreed about the abuse of terminology though.

3. #3 Matt Springer
August 13, 2008

Bob, the McCain data is perfect. So is the Obama data. The data is exactly the closing prices for each day’s trading. The question that has me curious is why the correlation isn’t therefore exact, and furthermore why the slope isn’t 1 given the fact that the events are mutually exclusive and pretty much span the space of possibilities. It looks like what could be a classic arbitrage opportunity for someone (not me) who’s willing to play the markets.

4. #4 Eric Lund
August 13, 2008

It doesn’t surprise me that the intercept is less than 100. As you point out in the post, the exchange has to charge some fee (in the form of percentages adding up to less than 100%) to fund their operations, and there is also a nonzero (but we hope very small) probability that one or both candidates could be forced out due to scandal, etc. The puzzler for me is why the slope is that small. Bob is correct to point out that you may have neglected the errors in your ordinate, but it’s not obvious to me that that will make up all of the difference (although it is clear that the error in the computed slope is larger than what you think it is).

Another problem with your linear regression (I think you tried to say this but did not say it clearly) is that Obama’s probability does not drop to 0 before McCain’s reaches 100%. Again, this suggests that your slope may be too small.

One other possibility you didn’t mention: the effect of third party candidates (Nader, Barr) on the race. That would also push the intercept down compared to a two-person race if we assume a linear fit, but more likely they would introduce higher-order terms which become noticeable only near the fringes. For example, as McCain’s chances approach 100% some voters who otherwise would have gone for Obama might jump to Barr instead. Since you don’t have data near the fringes, these higher order terms are probably smaller in magnitude than the error bar you would get from attempting to fit a higher order polynomial on this data set.

5. #5 Matt Springer
August 13, 2008

> Another problem with your linear regression (I think you tried to say this but did not say it clearly) is that Obama’s probability does not drop to 0 before McCain’s reaches 100%…
>
> One other possibility you didn’t mention: the effect of third party candidates (Nader, Barr) on the race. That would also push the intercept down compared to a two-person race if we assume a linear fit, but more likely they would introduce higher-order terms which become noticeable only near the fringes…

Yes, that’s exactly the problem I was trying to point out. The zero for one candidate does not correspond to the 100 of the other. And your suggestion about third-party candidates is precisely the kind of explanation I was hoping for. It makes a lot of sense.

Either way I’m sure you and others are right that the error is higher than the simple error in the regression. I just provided the raw “sigma m” and “sigma b” values straight from the spreadsheet.

6. #6 DaleP
August 13, 2008

In extrapolating to the margins, you are asking, why doesn’t the linear estimate meet the boundary conditions. In many physics situations, a linear function is useful. Here, we don’t know, and I would guess that the function as it approaches the margins is non-linear. So, the function actually would reach 100-0 or 0-100 endpoints.

August 13, 2008

Wait, what’s the point of mentioning QM in this post? It has nothing at all to do with QM. And frankly you should know better than to say things like “the wavefunction of the election.”

Please don’t force some physics into posts that aren’t really about physics. It’s perfectly okay that they aren’t all about physics; presumably you have other interests too! The site you link to is interesting enough to post about without needing some pseudoscience babble.

8. #8 Bob O'H
August 13, 2008

> Bob, the McCain data is perfect. So is the Obama data.

But there is still stochastic fluctuation, and you are assuming that it is only in the Obama data. You’re simply fitting the wrong line, one which we know will tend to have a slope that is too low. I don’t know if fitting the correct line will correct back to 100 (probably not: plot the 1:1 line, and you see that most of the points are below it), but it will be closer.

9. #9 Uncle Al
August 13, 2008

Given a model, what is being modeled? “I’d rather elect Obama and hope some of what he’s telling is the truth than elect McCain and hope to God 100% of what he says is lies.” There is feedback. The linear regression model should break into a higher order fit, with curvature, as 04 November approaches.

10. #10 Steven N. Severinghaus
August 13, 2008

> As of this writing it’s 60.1% Obama and 36.9% McCain. It doesn’t quite add up to 100%, mostly reflecting the possibility that Obama or McCain may not make it to the election due to death, disease, scandal, or whatever other reason.

Isn’t that missing 4.1% just Clinton?

11. #11 Matt Springer
August 13, 2008

#7, it’s just a goofy metaphor. I trust my readers enough to believe they know that the election isn’t really something that you can square the modulus of to find a probability density. :)

The point of the comparison to spin is that you have two exclusive possibilities, and if you know the probability for spin up, the probability for spin down is 1 – (spin up). I’d naively expect a similar linear relation to obtain for this binary situation, but it doesn’t. Not knowing much about trading or the statistics thereof, I figured I’d ask the community.

#8, that sounds right. While both sets of data are from the exact prices, whatever fluctuations exist will exist for both. If I get some time I’ll run the analysis again with major axis regression.

12. #12 Matt Springer
August 13, 2008

#10, She’s still at 4%? Wow, I had totally overlooked that. In fact that puts the total probability at over 100%. Even more strange, her probability of merely winning the Dem nomination is only trading at about 3%.

Surely she’s not thinking about switching parties. Or what if McCain nominated her as his VP and then he had to drop out due to illness. The mind boggles…

13. #13 CCPhysicist
August 13, 2008

I haven’t looked recently, but I’m pretty sure there are more than two plays in that market. The remaining fraction is probably a bet that Hillary will win a floor fight at the convention in a week or so. The PUMAs who would rather vote for a right to life conservative than not-Hillary probably make that market. Others might be betting on the definition of “native born”, since a similar situation to that of McCain was considered problematical for George Romney in 1968 and never resolved.

As for the clustering, all markets are dependent on information and there is a lot of information available. In this case, the traders certainly know about

http://www.fivethirtyeight.com/

which uses a fairly sophisticated Monte Carlo method for modeling the election based on the imperfect data of a large number of polls done at different times with different methodology. The biggest cluster is around the current 65% win probability.

Note that neither of these is a prediction in the political science sense. Both are an attempt to get a better estimator of the true state of the electorate today.

14. #14 CCPhysicist
August 13, 2008

Oh, yes, and you might notice that he shows the probability distribution from the simulation, not just the final numbers.

However, these are all based on real probability amplitudes so you can’t get QM-like interference effects.

15. #15 Matt Springer
August 13, 2008

CCPhysicist, that site is really neat. It’s definitely getting added to the list of sites I visit at least weekly. Last election I developed a definite preference for purely numerical estimates and predictions. The punditry is too colored by wishful thinking and personal bias to be an honest estimation.

16. #16 EastwoodDC
August 13, 2008

I have to question the assumption of a linear relation between share price and probability of winning. It would be more interesting to plot y (probability Obama wins) as a function of share price(s). This may be reasonably linear in the range of the data, but a logit(y) transform – logistic regression – may be appropriate.

17. #17 CCPhysicist
August 13, 2008

I visit it daily.

The fun part is whether this will look like Dewey Beats Truman due to the cell phone problem. (Back then it was simply a telephone problem. Democrats couldn’t afford a telephone.) I’d guess that half or more of my students have no land line that can be used to reach them for polling, or anything else.

The two main differences are that Nate has no investment in a particular poll (each network and/or newspaper has linked up with a particular polling company) and that he looks at the states individually. His expertise was honed with baseball statistics, but that is a bit more stable than polling. Well, there is a third difference: those “pundits” are all paid shills trying to spin the news.

He also tells you whether you are going to see a lot of ads or not. You aren’t going to see any in Texas unless they run nationally. Others (Ohio, Virginia) will get hammered so hard by TV ads that what they are getting now will look quiet by comparison.

18. #18 Rhett Allain
August 13, 2008

Very enjoyable article. However – one key issue. The 60% for Obama is not the probability – it is the estimated probability. If Obama vs. McCain had been “run” many times and obtained a 60-40 percent split, then perhaps it could be called the probability. Otherwise, I have no complaints and enjoyed the analogy.

19. #19 travc
August 13, 2008

> collapse to the Obama or the McCain eigenstate

No no no no… We just become entangled with the Obama/McCain system by measurement. The real question is whether it is already determined and we are just stuck with negative mutual information until Nov, or if the universe will fork :)

20. #20 Paul Murray
August 14, 2008

When working with probabilities, an interesting view is to work not with p but with p/(~p), where ~p = 1-p. What makes it interesting is that it provides a reasonable meaning for the idea of “doubling” or “halving” a probability. If you plot it logarithmically, the resulting curve p’ = ln(p/~p) is pretty and has a nice inverse, p = e^p’ / (e^p’ + 1).
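The log-odds transform and its inverse can be sketched as follows. The function names are illustrative, and the inverse is written exactly as in the formula above, so it is only meant for moderate values of p’.

```python
import math


def logit(p):
    """Log-odds p' = ln(p / (1 - p)) for a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse transform p = e^z / (e^z + 1)."""
    return math.exp(z) / (math.exp(z) + 1.0)


p = 0.601                       # the Obama figure quoted in the post
z = logit(p)
doubled = inv_logit(z + math.log(2.0))  # doubling the odds adds ln 2 to p'

print(f"p = {p:.3f}, log-odds = {z:.3f}")
print(f"after doubling the odds, p = {doubled:.3f}")
```

“Doubling” in this sense never pushes p past 1: even odds (p = 0.5) doubled gives p = 2/3, not 1.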

21. #21 Branedy
August 14, 2008

Just an FYI, the data from Intrade does get ‘fuzzy’ at the extremes: at ~90% and above and ~10% and below, the numbers tend to go soft, subject to analog noise. Call it the Heisenberg uncertainty principle or quantum foam.

22. #22 sammler
August 19, 2008

The “closing price” is almost certainly the last traded price on that day. Suppose the market were perfect, and the value of the two contracts always added up to 100%, but the last McCain trade of the day came 10 minutes before the last Obama trade. Then we would see an apparent decorrelation in the closing prices.
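A toy simulation shows the effect. Everything in it is invented for illustration: a random-walk price, a market in which the two contracts always sum to exactly 100, and a last McCain trade that happens a fixed number of ticks before the last Obama trade.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_closes(days, ticks_per_day, lag_ticks, seed=0):
    """Closing prices when McCain's last trade precedes Obama's by lag_ticks."""
    rng = random.Random(seed)
    price = 60.0  # Obama price; McCain is always 100 - price at the same instant
    obama_close, mccain_close = [], []
    for _ in range(days):
        path = []
        for _ in range(ticks_per_day):
            price = min(95.0, max(5.0, price + rng.gauss(0.0, 0.3)))
            path.append(price)
        obama_close.append(path[-1])
        mccain_close.append(100.0 - path[-1 - lag_ticks])
    return obama_close, mccain_close


r_sync = pearson(*simulate_closes(120, 50, lag_ticks=0))
r_lagged = pearson(*simulate_closes(120, 50, lag_ticks=10))
print(f"synchronous closes: r = {r_sync:.4f}")
print(f"lagged closes:      r = {r_lagged:.4f}")
```

With synchronous closes the correlation is -1 up to rounding. With the lag it is weaker, even though the market itself never departed from a perfect sum.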

This synchrony problem is why high-frequency algorithmic trading requires such a big investment in infrastructure.

23. #23 Baktru
December 5, 2010

It doesn’t take into account a fundamental factor in the price of anything traded in a financial market. ALL investments are expected to yield the same return, and there is a fixed return you can measure against: the interest rate.

If half a year before the election Obama is seen as having a 60% chance of winning, then the expected return is 6\$ in 6 months’ time. 6\$ in 6 months’ time has less value than 6\$ now, hence the value of the share would be lower than 6\$.

The sum of the trading prices of the two shares should hence never be over 10\$, and if it is, you should sell both of them massively until it no longer is.

Easy enough to see why. You sell one of each share (together worth more than 10\$). After the election, you have to pay out 10\$, and you keep whatever the pair sold for above that. Arbitrage :)

Now say that you have 1000\$ ready to invest, and it could go into a 6-month bond that gives you 3% interest, so you will have 1030\$ in 6 months. In that case you would expect the sum of Obama + McCain to be at 9.7\$, 3% below the 10\$ mark, as an investment in these shares should once again have the same yield as flat interest rates. Better yet, if the current value of them is BELOW 9.7\$, and you can borrow money at 3%, then you should borrow x amount, buy an equal amount of Obama and McCain shares and you will make a profit in the end. Interest arbitrage again.
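The arithmetic of this argument can be sketched as below. It assumes exactly one of the two contracts pays out 10\$, ignores fees, and treats the 3% as the return over the whole six-month period. The function names are made up for the example.

```python
PAYOUT = 10.0        # dollars paid by whichever contract wins
PERIOD_RATE = 0.03   # assumed six-month interest rate from the comment above


def fair_sum(payout=PAYOUT, rate=PERIOD_RATE):
    """Price of the Obama + McCain pair that earns exactly the interest rate."""
    return payout / (1.0 + rate)


def sell_both_profit(pair_price, payout=PAYOUT):
    """Sell one of each now; pay out on the winner after the election."""
    return pair_price - payout


def borrow_and_buy_profit(pair_price, payout=PAYOUT, rate=PERIOD_RATE):
    """Borrow the pair price, collect the payout, repay the loan with interest."""
    return payout - pair_price * (1.0 + rate)


print(f"fair pair price: {fair_sum():.2f}")
print(f"selling a pair at 10.30 locks in {sell_both_profit(10.30):.2f}")
print(f"borrowing to buy a pair at 9.50 locks in {borrow_and_buy_profit(9.50):.2f}")
```

Exact discounting gives about 9.71\$ rather than 9.7\$. “3% below 10\$” is the usual first-order approximation.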