# The laws of conditional probability are false

This is all standard physics. Consider the two-slit experiment–a light beam, two slits, and a screen–with y being the place on the screen that lights up. For simplicity, think of the screen as one-dimensional. So y is a continuous random variable.

Consider four experiments:

1. Slit 1 is open, slit 2 is closed. Shine light through the slit and observe where the screen lights up. Or shoot photons through one at a time; it doesn’t matter. Either way you get a distribution, which we can call p1(y).

2. Slit 1 is closed, slit 2 is open. Same thing. Now we get p2(y).

3. Both slits are open. Now we get p3(y).

4. Now run experiment 3 with detectors at the slits. You’ll find out which slit each photon goes through. Call the slit x. So x is a discrete random variable taking on two possible values, 1 or 2. Assuming the experiment has been set up symmetrically, you’ll find that Pr(x=1) = Pr(x=2) = 1/2.

You can also record y, thus you can get p4(y), and you can also observe the conditional distributions, p4(y|x=1) and p4(y|x=2). You’ll find that p4(y|x=1) = p1(y) and p4(y|x=2) = p2(y). You’ll also find that p4(y) = (1/2) p1(y) + (1/2) p2(y). So far, so good.

The problem is that p4 is not the same as p3. Heisenberg’s uncertainty principle: putting detectors at the slits changes the distribution of the hits on the screen.
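To make the mismatch concrete, here is a minimal numerical sketch of the four experiments. It is a toy model, not a calibrated optics calculation: the Gaussian envelope, the wavenumber, the slit separation, the screen distance, and the function names `amplitude`, `normalize`, and `four_experiments` are all assumptions chosen for illustration. Each slit contributes a complex amplitude; p1 and p2 are the squared magnitudes of the individual amplitudes, p3 is the squared magnitude of their sum, and p4 is the 50/50 mixture of p1 and p2.

```python
import cmath
import math


def amplitude(y, slit_pos, wavenumber=20.0, distance=10.0, width=2.0):
    """Toy complex amplitude at screen position y for a slit at slit_pos.

    All parameter values are illustrative assumptions, not physical constants.
    """
    path = math.hypot(distance, y - slit_pos)  # slit-to-screen path length
    envelope = math.exp(-((y - slit_pos) ** 2) / (4.0 * width ** 2))
    return envelope * cmath.exp(1j * wavenumber * path)


def normalize(values, dy):
    """Scale nonnegative values so their Riemann sum over the grid is 1."""
    total = sum(values) * dy
    return [v / total for v in values]


def four_experiments(n=2001, half_width=10.0, separation=1.0):
    """Return the grid spacing and p1, p2, p3, p4 on a grid of screen positions."""
    dy = 2.0 * half_width / (n - 1)
    ys = [-half_width + i * dy for i in range(n)]
    a1 = [amplitude(y, -separation / 2.0) for y in ys]
    a2 = [amplitude(y, +separation / 2.0) for y in ys]
    p1 = normalize([abs(a) ** 2 for a in a1], dy)  # experiment 1: slit 1 only
    p2 = normalize([abs(a) ** 2 for a in a2], dy)  # experiment 2: slit 2 only
    # Experiment 3: both slits open, no detectors, so the amplitudes add.
    p3 = normalize([abs(u + v) ** 2 for u, v in zip(a1, a2)], dy)
    # Experiment 4: detectors at the slits, so the probabilities add.
    p4 = [0.5 * u + 0.5 * v for u, v in zip(p1, p2)]
    return dy, p1, p2, p3, p4


dy, p1, p2, p3, p4 = four_experiments()
gap = max(abs(u - v) for u, v in zip(p3, p4))
print(f"integral of p3: {sum(p3) * dy:.6f}")
print(f"integral of p4: {sum(p4) * dy:.6f}")
print(f"largest pointwise gap between p3 and p4: {gap:.4f}")
```

Both p3 and p4 are proper densities on the grid, yet they differ pointwise: p3 carries interference fringes that the mixture p4 cannot reproduce.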

This violates the laws of conditional probability, in which you have random variables x and y, and in which p(x|y) is the distribution of x if you observe y, p(y|x) is the distribution of y if you observe x, and so forth.
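Spelled out, the conflict is with the law of total probability. The block below is a sketch in standard textbook notation; the wavefunctions \psi_1 and \psi_2 and the normalizing constant Z are symbols introduced here for illustration (they do not appear in the setup above), with each \psi_i assumed normalized so that p_i(y) = |\psi_i(y)|^2.

```latex
% Classical conditional probability: the marginal of y is the same
% whether or not x is observed.
p(y) = \sum_{x \in \{1,2\}} \Pr(x)\, p(y \mid x)
     = \tfrac{1}{2}\, p_1(y) + \tfrac{1}{2}\, p_2(y) = p_4(y)

% Both slits open, no detectors: amplitudes add, not probabilities.
p_3(y) = \frac{1}{Z}\, \bigl| \psi_1(y) + \psi_2(y) \bigr|^2
       = \frac{1}{Z} \Bigl[ p_1(y) + p_2(y)
         + 2\, \mathrm{Re}\bigl( \overline{\psi_1(y)}\, \psi_2(y) \bigr) \Bigr],
\qquad
Z = 2 + 2\, \mathrm{Re} \int \overline{\psi_1(y)}\, \psi_2(y)\, dy
```

The cross term is the interference term. If a single joint distribution of x and y existed, the first line would force p3 to equal p4; the cross term is exactly what prevents that.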

## A dissenting argument (that doesn’t convince me)

To complicate matters, Bill Jefferys writes:

As to the two slit experiment, it all depends on how you look at it. Leslie Ballentine wrote an article a number of years ago in The American Journal of Physics, in which he showed that conditional probability can indeed be used to analyze the two slit experiment. You just have to do it the right way.

I looked at the Ballentine article and I’m not convinced. Basically he’s saying that the reasoning above isn’t a correct application of probability theory because you should really be conditioning on all information, which in this case includes the fact that you measured or did not measure a slit. I don’t buy this argument. If the probability distribution changes when you condition on a measurement, this doesn’t really seem to be classical “Boltzmannian” probability to me.

In standard probability theory, the whole idea of conditioning is that you have a single joint distribution sitting out there–possibly there are parts that are unobserved or even unobservable (as in much of psychometrics)–but you can treat it as a fixed object that you can observe through conditioning (the six blind men and the elephant). Once you abandon the idea of a single joint distribution, I think you’ve moved beyond conditional probability as we usually know it.

And so I think I’m justified in pointing out that the laws of conditional probability are false. This is not a new point with me–I learned it in college, and obviously the ideas go back to the founders of quantum mechanics. But not everyone in statistics knows about this example, so I thought it would be useful to lay it out.

What I don’t know is whether there are any practical uses for this idea in statistics, outside of quantum physics. For example, would it make sense to use “two-slit-type” models in psychometrics, to capture the idea that asking one question affects the response to others? I just don’t know.

See the comments here for much more discussion.

1. #1 Larry Wasserman
November 26, 2009

Hi Andrew

I am going to repeat what I said on your other blog.
There is no problem here.
See:

“Consistent Quantum Theory” by Robert Griffiths
or

http://en.wikipedia.org/wiki/Consistent_histories

Best wishes
Larry

2. #2 Andrew Gelman
November 26, 2009

Larry:

As I wrote here, when you say I “just have to reason more carefully,” I think you mean that if I include more information in the joint distribution, I can set this up as a coherent probability model. That’s fine, but then you’re going beyond the usual way we model things probabilistically. If you have to add a new random variable every time you condition on a measurement, I don’t think of this as being the same sort of probability theory that we usually use. I’d call this a generalization of classical probability theory, which is my point: classical probability theory (which we use all the time in poli sci, econ, psychometrics, astronomy, etc) needs to be generalized to apply to quantum mechanics. Which makes me wonder if it should be generalized for other applications too.

And this is definitely related to Heisenberg’s uncertainty principle–my point was not that it was exactly the same but that it was the same concept (as discussed in Malcom’s comment). The two-slit experiment is just a simple tabletop-experiment way to focus on the key issues here.

3. #3 Andrew Gelman
November 26, 2009

P.S. To put it another way, there’s certainly no problem here, as you say, if you model things directly (using the appropriate quantum distribution). But the natural way to model the problem–via a joint distribution of x and y–will not work. Instead, you need to work with a more complicated formulation using probability amplitudes or whatever. Which makes me wonder about other problems where the very natural-seeming model doesn’t do what it is supposed to, because of interaction between measurement and observation.

4. #4 Daniel Lakeland
November 26, 2009

The important thing to remember is that measurement CHANGES the system. So you can’t use your model of conditional probability because you really *don’t* have the same likelihood in the two situations.

“If the probability distribution changes when you condition on a measurement, this doesn’t really seem to be classical ‘Boltzmannian’ probability to me.”

It’s not so much that it changes when you *condition* on a measurement, but it changes when you *make* the measurement.

Set up the following experiment. Two slits, with a detector at the slits. You measure which slit the electron goes through but you don’t condition on it (throw away the info). This gives you p(x), the probability that an electron hits the screen at location x. Now you’ll see that the probabilities are consistent: p(x) is a mixture distribution of p1(x) and p2(x), the distributions for each slit.

The thing is, to make the measurement you need to hit the electron with a photon, and doing that means you’re running a different experiment from the one in which you don’t hit the electron with a photon.

5. #5 Andrew Gelman
November 26, 2009

Exactly. Quantum probability is more complicated than classical probability (the sorts of models we use routinely) because in quantum probability there’s no such thing as “conditioning” in the sense of classical conditional probability. Taking a measurement is an action of its own. We don’t usually have that in our model classically.

6. #6 Larry Wasserman
November 27, 2009

But as Griffiths shows, taking a measurement is not an action.
You don’t need to add any extra random variables.
I suggest reading his book (or his papers).
His papers are very clear.
Some people refer to his work as
“Copenhagen done right.”
His approach clears up many misunderstandings.
–Larry