If I say that X has probability p, what does that mean? What sort of thing is X, and what does the number p represent?
Philosophers have spilled a lot of ink on this question, with no clear answer emerging. Instead there are a handful of major schools of thought on the issue. Each school captures an important aspect of what we mean when we talk about probability, but none seems to provide a comprehensive account.
One possibility is the so-called classical interpretation. It is classical because it shows up in the earliest formal treatments of probability, for example in the work of Pascal. Laplace was probably the most famous adherent of this view. We imagine an exhaustive, finite list of all possible outcomes from a particular experimental set-up. In the absence of specific evidence to the contrary, we assign each of these elementary outcomes the same probability. The probability of an event is then defined as the number of “favorable outcomes” divided by the number of possible outcomes.
This interpretation is well-suited to simple examples drawn from gambling. The probability of rolling a perfect square with one roll of a fair die is 2/6 since there are six, equally likely, possible outcomes and only two of them are perfect squares. The probability of drawing a heart out of a well-shuffled deck of cards is 13/52, because each of the 52 cards is as likely as any other and 13 of them are hearts. This is the interpretation usually presented in elementary textbooks on probability.
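The two examples above can be checked with a few lines of Python. This is only a sketch of the counting recipe: the helper name `classical_probability` and the representation of the deck as (rank, suit) pairs are my own choices for illustration, and the equal likelihood of the outcomes is assumed rather than derived.

```python
from fractions import Fraction

def classical_probability(outcomes, is_favorable):
    """Favorable outcomes divided by possible outcomes, all assumed equally likely."""
    outcomes = list(outcomes)
    favorable = [o for o in outcomes if is_favorable(o)]
    return Fraction(len(favorable), len(outcomes))

# One roll of a fair die: the perfect squares among 1..6 are 1 and 4.
die = range(1, 7)
p_square = classical_probability(die, lambda n: n in {1, 4})

# A standard 52-card deck, modeled as (rank, suit) pairs.
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]
p_heart = classical_probability(deck, lambda card: card[1] == "hearts")

print(p_square)  # 1/3, i.e. 2/6 in lowest terms
print(p_heart)   # 1/4, i.e. 13/52 in lowest terms
```

Notice that nothing in the code justifies treating the outcomes as equally likely. That assumption is built into the act of counting.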
Thus, in this view X is an event in a well-defined sample space, and p records an objectively measured ratio.
For these simple examples the classical interpretation works very well. Sadly, it is far too limited to serve as a general account of probability. As it stands it is evidently inappropriate for infinite sample spaces, for example. It also raises the question of how we know equally probable events when we see them. More complex situations are not so readily modelled in terms of finite lists of equally probable events.
An alternative is the frequentist interpretation. The basic insight here is that probabilities are things you measure from long series of trials of repeatable experiments. As with the classical interpretation, this is an objective view of probability. That is, probabilities exist out there, in the real world, and not just in the human mind.
The actual definition of probability in this interpretation will depend on the specific sort of frequentism you are discussing. In finite frequentism we assume that we have the results of a sequence of experimental trials in front of us. Then the probability of any particular outcome is the measured frequency of that outcome among all of the trials. A somewhat more complex version argues that probabilities should be seen as limiting frequencies obtained in an imagined infinite sequence of trials. Regardless, the law of large numbers is crucial here. Without trying to state it as a formal theorem, we can say that in essence it gives us reason to believe that measured frequencies will fluctuate less and less as the number of trials gets larger and larger.
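To see the sort of stabilization the law of large numbers describes, here is a rough simulation of coin tosses that tracks the relative frequency of heads as the trials accumulate. The function name, the fixed seed, and the use of a pseudorandom generator as a stand-in for a real coin are all assumptions made for the sake of illustration.

```python
import random

def running_frequencies(num_trials, p_heads=0.5, seed=0):
    """Simulate coin tosses; return the relative frequency of heads after each toss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    freqs = []
    for n in range(1, num_trials + 1):
        if rng.random() < p_heads:
            heads += 1
        freqs.append(heads / n)
    return freqs

freqs = running_frequencies(100_000)

# Print the measured frequency at a few checkpoints.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, freqs[n - 1])
```

A finite frequentist would take the last printed number to be the probability. A limiting frequentist would take it to be an estimate of the value the sequence would approach if it went on forever.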
In this view X is an outcome of a repeatable experiment, and p is a quantity you measure based on actual data.
The benefits of this interpretation are its concreteness and its objectivity. It is similar in certain ways to the classical interpretation, in that it assigns an equal weight to all of the events in a certain set. The main difference between them is that the classical view begins with an enumeration of hypothetically possible outcomes, while the frequentist view only examines actual outcomes obtained in an actual data set.
That there is a relationship between measured frequencies and probabilities is obvious to any gambler. That notwithstanding, the frequentist view of things suffers from certain difficulties that rule it out as a comprehensive view of probability. One obvious problem is that it commits us to the idea that probabilities are rational numbers (in its finite form, at any rate), which seems inadequate for certain problems in modern physics. There is also the problem of knowing how many trials you need to carry out before you can be confident that your measured frequency is telling you something about the world, and is not telling you simply that something really improbable has happened. There are also conceptual problems inherent in the very idea of talking about an infinite sequence of trials.
A third major school of thought is the Bayesian interpretation. This is really a family of interpretations united by the idea that probabilities are subjective. A probability is the degree of belief held by a person with regard to a specific proposition. I might assess the probability of getting heads in one toss of a fair coin to be one half based on my general knowledge of coins and my lack of knowledge concerning the possibility that the coin in front of me is loaded in some way. If a sequence of coin tosses is now carried out, I would view the data produced as new information with which I would update my prior beliefs regarding the fairness of the coin.
As the name suggests, Bayes’ Theorem plays a major role here. The main idea is that there are only subjective assessments of probability that get updated as new information comes in. Bayes’ Theorem is then the primary tool for updating things rationally.
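Here is a small sketch of what such updating might look like for the coin example. All of the numbers are hypothetical: I am assuming just two rival hypotheses, a fair coin and a coin loaded to land heads 80% of the time, along with a prior belief of 0.9 that the coin is fair and a made-up sequence of tosses. The function name `bayes_update` is likewise my own.

```python
def bayes_update(prior, likelihoods):
    """Return the posterior P(H | data) from a prior P(H) and likelihoods P(data | H)."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())  # this is P(data)
    return {h: weight / total for h, weight in unnormalized.items()}

# Hypothetical chance of heads under each hypothesis (assumed for illustration).
p_heads = {"fair": 0.5, "loaded": 0.8}

# Hypothetical prior degree of belief in each hypothesis.
belief = {"fair": 0.9, "loaded": 0.1}

# A made-up run of tosses; each one is treated as a new piece of evidence.
for toss in "HHTHHHHH":
    likelihoods = {h: (p if toss == "H" else 1 - p) for h, p in p_heads.items()}
    belief = bayes_update(belief, likelihoods)
    print(toss, round(belief["fair"], 3))
```

Each pass through the loop uses the previous posterior as the new prior, which is the sense in which beliefs get updated as information comes in. A different person starting from a different prior would end up somewhere else, and that is precisely the subjectivity at issue.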
Bayesianism is especially well-suited to problems in decision theory. A person is confronted with a decision that must be made in the absence of certain relevant information. Part of thinking rationally about such situations is assigning probabilities to outcomes based on the information you have. A simple example is a juror deciding whether to convict a defendant. He might say, I am ninety percent certain the suspect is guilty. Plainly he is talking about his degree of belief based on the evidence presented, and not on the percentage of times the suspect is guilty in a long run of something or other.
Probability as degree of belief is certainly an important notion, but Bayesianism has problems as well. First, how ought we assign numbers to degrees of belief? In this interpretation, X represents a proposition. But what is p? The usual idea is that you assign probabilities based on your willingness to bet a certain amount of money on the outcome. You would assign a probability of one half to an event if you would be willing to pay fifty cents for a bet that pays you one dollar if the event occurs, and nothing if it doesn’t. But this makes probability dependent not just on the information a person has, but also on his wants and desires. That doesn’t seem right.
Furthermore, it just seems wrong to propose, as a universal truth, that all probabilities are subjective levels of confidence in propositions. The tendency of various randomizing devices to produce stable long-term frequencies is as much a fact about physical reality as anything else scientists study. Probability language seems like the natural device for discussing such realities. An interpretation that leaves no room for such ideas seems a bit impoverished, to put it kindly.
What is going on here? Chance and randomness are ubiquitous in everyday life. Probability theory has been applied with great success in virtually every major branch of science. Can its foundations really be as shaky as my very brief overview suggests? For that matter, what does it even mean to “interpret” probability?
From the viewpoint of pure mathematics there is nothing to interpret. Things like probability spaces and probability measures are just abstract constructs, defined entirely by their axioms. Kolmogorov, writing in the 1930s, provided the axiomatization of probability that is nearly universally accepted today. As long as the things you are studying satisfy his axioms, you can plausibly claim to be doing probability. Things like Bayes’ Theorem or the Central Limit Theorem are just statements that follow as a matter of logic from the axioms.
The price you pay for viewing things in this manner is that your objects of study are entirely divorced from everyday reality. By an interpretation of probability we therefore mean some way of assigning real-world counterparts to the undefined terms in the axioms. For the interpretation to be successful we want our assignments of real-world meaning to non-defined terms to be done in such a way that the resulting theorems of the mathematical theory are turned into true statements about physical reality. Ideally, we would come up with one interpretation that covers all of our intuitive notions of probability, and that is applicable to various scientific problems of interest.
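For reference, the axioms are usually stated roughly as follows. The notation is my own choice for this sketch: a sample space $\Omega$, a collection $\mathcal{F}$ of subsets of $\Omega$ called events, and a function $P$ assigning a real number to each event.

```latex
\begin{align*}
  &\text{(1) Non-negativity:} && P(A) \ge 0 \quad \text{for every } A \in \mathcal{F}, \\
  &\text{(2) Normalization:}  && P(\Omega) = 1, \\
  &\text{(3) Countable additivity:} && P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i)
     \quad \text{for pairwise disjoint } A_1, A_2, \ldots \in \mathcal{F}.
\end{align*}
```

Nothing here says what an event is, or what the number $P(A)$ measures. Those are exactly the undefined terms.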
The trouble is that probability talk gets used in a large variety of seemingly different contexts. Indeed, I was led to think about this issue in the course of writing about the Monty Hall problem. I noticed that in the space of seventy pages or so I had casually used three different notions of probability, without even realizing immediately that I had done so.
The solution to the basic Monty Hall problem typically proceeds by enumerating the cases and finding the ratio of wins by switching to all plays of the game. This is essentially the classical view of things at work. (It’s not precisely the classical view, since in solving the Monty Hall problem you need to realize that not all scenarios have an equal probability, but it is close enough for this discussion.)
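The enumeration can be sketched in code as follows. I am assuming the standard version of the game: the car is placed uniformly at random, the player's initial pick is uniform, and the host always opens a goat door other than the player's pick, choosing at random when two such doors are available. The function name and the door labels 0, 1, 2 are illustrative.

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def monty_hall_exact():
    """Enumerate weighted scenarios; return (P(win by staying), P(win by switching))."""
    stay = Fraction(0)
    switch = Fraction(0)
    for car in DOORS:
        for pick in DOORS:
            # The host may open any door that is neither the pick nor the car.
            openable = [d for d in DOORS if d != pick and d != car]
            for opened in openable:
                # Car and pick are uniform; the host is uniform among openable doors.
                weight = Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(openable))
                # The door the player would end up with after switching.
                remaining = next(d for d in DOORS if d != pick and d != opened)
                if pick == car:
                    stay += weight
                if remaining == car:
                    switch += weight
    return stay, switch

stay, switch = monty_hall_exact()
print(stay, switch)  # 1/3 2/3
```

The `weight` line is where the departure from the strictly classical view shows up. Scenarios in which the host has a choice of two doors each carry weight 1/18, while those in which his hand is forced carry weight 1/9.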
I then described the results of Monte Carlo simulations of the Monty Hall problem. This is the method that convinces pretty much everyone that switching is the way to go. The famous mathematician Paul Erdős refused to accept that there was an advantage to be gained from switching. He changed his mind immediately upon seeing the results of a simulation (though he still felt dissatisfied with the result). But the idea that a Monte Carlo simulation is telling you something important about the probability of winning by switching is based on a frequentist view of the situation.
Moving on, in other places it was natural to present the Monty Hall problem as a question in decision theory. This way of proceeding is especially natural in variations with many doors or exotic host behaviors. Suddenly I found myself writing about how the player ought to assess his chances of winning under various scenarios given the information at his disposal.
Just to be clear, I am not saying that you are logically compelled to adopt one school of thought over another in pondering different aspects of the Monty Hall problem. Only that different interpretations seem more natural depending on the aspect of the problem you are discussing.
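A simulation of this sort takes only a few lines. What follows is a minimal sketch, not the code behind any particular published result. It assumes the standard rules (the host always opens a goat door other than the player's pick, at random if he has a choice), and the function name and fixed seed are my own.

```python
import random

def simulate_monty_hall(num_games, seed=0):
    """Play the standard game repeatedly; return the fraction of games won by switching."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    switch_wins = 0
    for _ in range(num_games):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Host opens a goat door other than the player's pick, at random if two qualify.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        # The one door left over is what the player gets by switching.
        switched_to = next(d for d in range(3) if d != pick and d != opened)
        if switched_to == car:
            switch_wins += 1
    return switch_wins / num_games

print(simulate_monty_hall(100_000))
```

With a large number of games the printed frequency should land close to two thirds. Reading that number as the probability of winning by switching is the frequentist step.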
Three different interpretations all rolled up into one problem. What hope, then, of finding a comprehensive interpretation of probability?
So which side am I on? Well, as a pure mathematician my attitude is that probability was doing just fine as an abstract theory, and if you find yourself running into conceptual difficulties when trying to understand it in real-world terms that just serves you right for trying to apply it to anything.
Kidding aside, I tend to follow the example of most mathematicians and take an ecumenical view of the whole thing. If different interpretations are useful in different contexts, then you go right ahead and use whichever interpretation you find most useful. Indeed, why is it even reasonable to expect that a single interpretation of probability will cover all of our intuitions about the subject?
As with so many of the things about which philosophers write, this debate strikes me as much ado about nothing. Very little, anyway. Having now read some of the literature on the subject, I can say that the arguments adduced for and against the various schools of thought are subtle, ingenious, and ultimately just not very important. Or so it seems to me.
The subject of statistical hypothesis testing does offer some instances where frequentists and Bayesians prescribe different ways of proceeding, so this cannot be entirely dismissed as just an academic dispute. We’ll save that for a different post, however.