Built on Facts

E Pluribus Unum, Particle Style

So Seed magazine has endorsed Obama. Quelle surprise! I suppose I shouldn’t bite the hand that feeds me, but of course I’m on record as supporting the “anyone else” ticket. I am under no illusion that it will be anything but a lost cause.

One of the things that leads me to believe this is poll data. Poll data is sort of the sociological version of many-body theory.

In physics, “many” often has a particular meaning. According to a guest lecturer we had today who works with semiconductor lasers, “many” means “more than two”. He’s working with the difficult problem of theoretically describing how the interactions between the particles in the laser diode affect the lasing process. One isolated atom undergoing stimulated emission isn’t usually too hard to describe, but once they start bumping into each other and filling bandgaps and undergoing exchange forces and doing all kinds of other horribly promiscuous things with each other… well, you try computing a Hamiltonian for that. You have to be clever and start to take into account the statistical behavior of the system as a whole and not try to keep track of each individual atom. As you might expect, this leads to its own difficulties.

In more basic physics, such as the gravitational interaction between planets, we have the same problem. Treating the motion of just one object in a gravitational field is easy. Take the space shuttle and the earth, for example. Assume the earth is so big the shuttle’s gravity doesn’t affect it (a good assumption!) and one equation does the job for you. That’s a one-body problem. Now if you have two bodies of comparable size like the earth and the moon, the problem is a little harder but still exactly solvable. You use a trick called the reduced mass and the problem reduces to the one-body problem, just like for the shuttle.
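For the curious, the reduced-mass trick can be written out in a couple of lines. This is the standard textbook form, with my own choice of symbols (m_1 and m_2 for the two masses, r for the separation vector between them), so treat it as a sketch rather than anything specific to the shuttle example:

```latex
\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2, \qquad
\mu = \frac{m_1 m_2}{m_1 + m_2}, \qquad
\mu\,\ddot{\mathbf{r}} = -\frac{G m_1 m_2}{r^{2}}\,\hat{\mathbf{r}}
```

The two coupled equations of motion collapse into one equation for a single fictitious body of mass \mu. When m_2 is enormously larger than m_1, \mu is essentially m_1, which is the shuttle-and-earth case again.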

Go to three bodies interacting gravitationally and suddenly you find yourself completely and comprehensively hosed. The problem isn’t merely more difficult; it’s impossible to do exactly in the general case. There’s no single equation where you can just plug in some time t and find out where all the planets are. Now there is a very slowly converging series solution, and of course numerical methods are capable of high precision for fairly small numbers of interacting bodies over reasonable time scales (such as the solar system over a few thousand years). But generally speaking there’s no closed-form equation which solves the problem. Worse still, many interesting problems like galactic dynamics and the universe as a whole can contain billions of interacting objects. So you have to be clever with your statistics.
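To make the numerical route concrete, here is a minimal sketch of a three-body integrator in Python. Everything specific in it is an assumption chosen for illustration: the units (G = 1), the two-dimensional setup, the velocity-Verlet stepping scheme, and the made-up masses and starting conditions (one heavy body with two light ones on roughly circular orbits). Real solar-system codes are far more careful than this.

```python
import math

G = 1.0  # gravitational constant in arbitrary simulation units (assumption)


def accelerations(pos, masses):
    """Newtonian gravitational acceleration on each body from all the others (2D)."""
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r3 = (dx * dx + dy * dy) ** 1.5
            acc[i][0] += G * masses[j] * dx / r3
            acc[i][1] += G * masses[j] * dy / r3
    return acc


def step(pos, vel, masses, dt):
    """Advance the system by one velocity-Verlet step of size dt."""
    a0 = accelerations(pos, masses)
    new_pos = [[p[k] + v[k] * dt + 0.5 * a[k] * dt * dt for k in range(2)]
               for p, v, a in zip(pos, vel, a0)]
    a1 = accelerations(new_pos, masses)
    new_vel = [[v[k] + 0.5 * (a[k] + b[k]) * dt for k in range(2)]
               for v, a, b in zip(vel, a0, a1)]
    return new_pos, new_vel


def total_energy(pos, vel, masses):
    """Kinetic plus pairwise potential energy, used as a sanity check."""
    kinetic = sum(0.5 * m * (v[0] ** 2 + v[1] ** 2) for m, v in zip(masses, vel))
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1])
            potential -= G * masses[i] * masses[j] / r
    return kinetic + potential


def simulate(pos, vel, masses, dt, steps):
    """Step the system forward `steps` times and return the final state."""
    for _ in range(steps):
        pos, vel = step(pos, vel, masses, dt)
    return pos, vel


if __name__ == "__main__":
    # Hypothetical setup: a heavy "sun" and two light "planets".
    masses = [1.0, 1e-3, 1e-3]
    pos = [[0.0, 0.0], [1.0, 0.0], [-2.0, 0.0]]
    vel = [[0.0, 0.0], [0.0, 1.0], [0.0, -math.sqrt(0.5)]]
    e0 = total_energy(pos, vel, masses)
    pos, vel = simulate(pos, vel, masses, dt=0.001, steps=2000)
    e1 = total_energy(pos, vel, masses)
    print("relative energy drift:", abs((e1 - e0) / e0))
```

There is no formula for position as a function of t anywhere in there. The code just takes small steps and re-evaluates the forces each time, and the energy check is one way to see how far the approximation has drifted. The double loop over pairs is also why the cost grows so quickly as you add bodies.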

For example, in a gas containing trillions of trillions of molecules, PV = nRT can model the aggregate behavior of the system quite well. Some thermodynamics is all you need to derive the equation. Galaxy dynamics are harder, but they can be done on a good computer using some approximations and a few hours (well, lots of hours) of CPU time. Still, it’s easier than manually simulating every single one of the billions of stars.

The same logic is why polls sample a few hundred or a few thousand people when millions are voting. It’s not perfect, but the many-body problem is too hard to do more than once, on election day.
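As a rough illustration of why a few thousand respondents can stand in for millions of voters, here is a small Python sketch of the textbook margin-of-error formula for a simple random sample. The function names, the 52% vote share, and the 130-million electorate are all hypothetical numbers I picked for the example, and real polls involve weighting and likely-voter models that this ignores.

```python
import math
import random


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


def margin_with_fpc(p, n, population, z=1.96):
    """Same margin, with the finite population correction applied."""
    fpc = math.sqrt((population - n) / (population - 1))
    return margin_of_error(p, n, z) * fpc


def simulate_poll(true_share, n, rng):
    """Draw n simulated respondents and return the observed share for one candidate."""
    votes = sum(1 for _ in range(n) if rng.random() < true_share)
    return votes / n


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so the run is repeatable
    electorate = 130_000_000   # hypothetical electorate size (assumption)
    for n in (400, 1000, 4000):
        moe = margin_of_error(0.5, n)
        moe_fpc = margin_with_fpc(0.5, n, electorate)
        observed = simulate_poll(0.52, n, rng)
        print(n, round(moe, 4), round(moe_fpc, 4), round(observed, 3))
```

The correction factor is so close to 1 for any realistic poll that the size of the electorate barely matters. The uncertainty is set almost entirely by the sample size n, and it shrinks only as the square root of n, which is why pollsters stop at a few thousand people.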


  1. #1 razib
    October 30, 2008

    good call. reality is what it is. and we need ways to discern it. it’s pathetic how some conservatives have been cherry picking the tightest polls and yanking things out of context. reminds me of the sad crap that you saw re: polls in the last month of the bush-kerry race (check out ruy teixeira’s blog archives for oct 2004 for plausible if ultimately wrong arguments). the key point is that even if the polls are wrong some of the time, you need to keep trusting them if they are right most of the time. confidence intervals are there to tell us that nothing is assured, but things may be probable.

  2. #2 Thrasymachus
    October 30, 2008

    Razib is correct about the danger of cherry-picking polls.

    One thing that is interesting to me is that your error bars aren’t affected by the fraction of the total population you sample; they are affected only by the sample size.

    To save money, there is a certain amount of black magic that goes on in polling. Polls are re-normed to estimate numbers of likely voters, Republicans versus Democrats, etc., etc.

    Due to Obama’s non-traditional base of support and fund raising, I expect that the error bars are bigger this year than they have been in years past. However, they aren’t 5-6 percentage points big, which is what McCain would need to win.

  3. #3 razib
    October 30, 2008

    re: weighting. i’ve seen right bloggers be skeptical about the “expanded” likely voter model. i’ve then had to point out that even the most conservative pollsters who make 2004 turnout assumptions consistently show obama ahead, even if only by 1-3 points! what’s the point of cherry-picking & expressing skepticism if you can’t even come out on top? it doesn’t matter how much you win by, it matters if you win. if you assume normal levels of black turnout, and that republicans and democrats come out in even numbers, and mccain still loses, you need to shift your expectations. of course mccain could still win. but it will probably be due to an exogenous shock which the polls obviously won’t account for naturally.

  4. #4 Uncle Al
    October 30, 2008

    As Election Day nears only three classes of sufferaging people would vote for McCain/Palin: prospective contractors, folks with eyeholes in their percale sheets, and snake-handling glossolaliacs. That can’t sum to more than 45% of the participatory electorate.

  5. #5 CCPhysicist
    October 30, 2008

    What, no one commenting on the restricted 3 body problem?

    After all, one potential flaw in some of the polls is that Bob Barr might alter the vote distribution in a state like Montana, drawing conservatives who are put off by associations with secessionists in Alaska, for example.

    Anyway, the physics of the many-body problem is an interesting challenge. It does start right after 2 (apart from the restricted 3-body problem), but has proven amenable to direct solution in nuclear physics if you have the computers needed to push N up to double digits.

    But it is the really-many body problem that is of relevance when looking at polls and sampling – and sampling polls. The key analytic tool used by Nate at 538 is to run a Monte Carlo simulation of many elections, using the poll uncertainties to generate many different independent voting distributions. It is still no better than the input data (GIGO always applies), but he weights polls by their past performance like he does in his baseball analyses.
