This is all standard physics. Consider the two-slit experiment (a light beam, two slits, and a screen), with y being the place on the screen that lights up. For simplicity, think of the screen as one-dimensional, so y is a continuous random variable.
Consider four experiments:
1. Slit 1 is open, slit 2 is closed. Shine light through the slit and observe where the screen lights up. Or shoot photons through one at a time; it doesn’t matter. Either way you get a distribution, which we can call p1(y).
2. Slit 1 is closed, slit 2 is open. Same thing. Now we get p2(y).
3. Both slits are open. Now we get p3(y).
4. Now run experiment 3 with detectors at the slits. You’ll find out which slit each photon goes through. Call the slit x. So x is a discrete random variable taking on two possible values, 1 or 2. Assuming the experiment has been set up symmetrically, you’ll find that Pr(x=1) = Pr(x=2) = 1/2.
You can also record y, which gives you p4(y), and you can observe the conditional distributions, p4(y|x=1) and p4(y|x=2). You’ll find that p4(y|x=1) = p1(y) and p4(y|x=2) = p2(y). You’ll also find that p4(y) = (1/2) p1(y) + (1/2) p2(y). So far, so good.
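The problem is that p4 is not the same as p3: with both slits open and no detectors, p3 shows interference fringes, and p4 does not. This is Heisenberg’s uncertainty principle at work: putting detectors at the slits changes the distribution of the hits on the screen.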
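This violates the laws of conditional probability, in which you have random variables x and y with a single joint distribution, and in which p(x|y) is the distribution of x if you observe y, p(y|x) is the distribution of y if you observe x, and so forth. Under those laws, the marginal distribution of y is the same whether or not you observe x: p(y) = Pr(x=1) p(y|x=1) + Pr(x=2) p(y|x=2). So p3 would have to equal p4, and it doesn’t.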
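To make the mismatch concrete, here is a minimal numerical sketch in Python. It is a toy model, not a careful optics calculation: I'm assuming each slit contributes a complex amplitude with a Gaussian envelope and a phase that varies linearly across the screen. The function names and the parameter values (separation, wavenumber, envelope width) are all made up for illustration. The one-slit distributions come from squaring each amplitude separately; the no-detector distribution p3 comes from adding the amplitudes first and then squaring; p4 is the 50/50 mixture that conditional probability would predict.

```python
import cmath
import math


def normalize(weights):
    """Scale nonnegative weights so they sum to 1 over the grid."""
    total = sum(weights)
    return [w / total for w in weights]


def slit_amplitude(y, center, wavenumber):
    """Toy complex amplitude at screen position y from a slit at `center`.

    Assumption: Gaussian envelope and linear phase (hypothetical choices).
    """
    envelope = math.exp(-((y - center) ** 2) / 8.0)
    phase = wavenumber * center * y
    return envelope * cmath.exp(1j * phase)


def two_slit_distributions(n_points=401, half_width=10.0,
                           separation=1.0, wavenumber=3.0):
    """Return the screen grid and p1, p2, p3, p4 as discrete distributions."""
    ys = [-half_width + 2.0 * half_width * i / (n_points - 1)
          for i in range(n_points)]
    a1 = [slit_amplitude(y, -separation, wavenumber) for y in ys]
    a2 = [slit_amplitude(y, +separation, wavenumber) for y in ys]
    p1 = normalize([abs(a) ** 2 for a in a1])       # only slit 1 open
    p2 = normalize([abs(a) ** 2 for a in a2])       # only slit 2 open
    # Both slits open, no detectors: add amplitudes, then square.
    p3 = normalize([abs(u + v) ** 2 for u, v in zip(a1, a2)])
    # Detectors at the slits: the mixture implied by conditioning on x.
    p4 = [0.5 * u + 0.5 * v for u, v in zip(p1, p2)]
    return ys, p1, p2, p3, p4


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(u - v) for u, v in zip(p, q))


if __name__ == "__main__":
    ys, p1, p2, p3, p4 = two_slit_distributions()
    print("TV(p3, p4) =", round(total_variation(p3, p4), 3))
```

In this sketch p3 and p4 are both proper distributions over the same grid, but they differ by the cross term between the two amplitudes, which is the part that the mixture (1/2) p1(y) + (1/2) p2(y) has no way to produce.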
A dissenting argument (that doesn’t convince me)
To complicate matters, Bill Jefferys writes:
As to the two slit experiment, it all depends on how you look at it. Leslie Ballentine wrote an article a number of years ago in The American Journal of Physics, in which he showed that conditional probability can indeed be used to analyze the two slit experiment. You just have to do it the right way.
I looked at the Ballentine article and I’m not convinced. Basically he’s saying that the reasoning above isn’t a correct application of probability theory because you should really be conditioning on all information, which in this case includes the fact that you measured or did not measure a slit. I don’t buy this argument. If the probability distribution changes when you condition on a measurement, this doesn’t really seem to be classical “Boltzmannian” probability to me.
In standard probability theory, the whole idea of conditioning is that you have a single joint distribution sitting out there. Possibly there are parts that are unobserved or even unobservable (as in much of psychometrics), but you can treat it as a fixed object that you can observe through conditioning (the six blind men and the elephant). Once you abandon the idea of a single joint distribution, I think you’ve moved beyond conditional probability as we usually know it.
And so I think I’m justified in pointing out that the laws of conditional probability are false. This is not a new point with me: I learned it in college, and obviously the ideas go back to the founders of quantum mechanics. But not everyone in statistics knows about this example, so I thought it would be useful to lay it out.
What I don’t know is whether there are any practical uses for this idea in statistics, outside of quantum physics. For example, would it make sense to use “two-slit-type” models in psychometrics, to capture the idea that asking one question affects the response to others? I just don’t know.
See the comments here for much more discussion.