Here’s Timothy Gowers, a Fields Medalist, from his book Mathematics: A Very Short Introduction:
However, there certainly are philosophers who take seriously the question of whether numbers exist, and this distinguishes them from mathematicians, who either find it obvious that numbers exist or do not understand what is being asked.
Everyone knows there is friction between scientists and philosophers of science. Richard Feynman spoke for many scientists when he quipped that, “Philosophy of science is as useful to scientists as ornithology is to birds.” From the other side, it is not uncommon for philosophers to lament the philosophical naivete of scientists (for example, in this recent book review).
I am not aware of any similar tension between mathematicians and philosophers of mathematics, for the simple reason that I do not know any mathematicians who take any interest at all in the philosophy of their discipline. Perhaps this reflects badly on us as a community, but it is what it is. In my own case, every once in a while I get motivated to dip my toe into the philosophical literature, but it’s rare that I find myself enriched by the experience.
There have been exceptions, however. While writing the BMHB (that’s The Big Monty Hall Book) I found myself moved to read some of the literature about Interpretations of Probability. The reason was that in writing the book’s early chapters I found myself very casually making use of three different approaches to probability. In discussing the most elementary methods for solving the problem I used the classical interpretation, in which probabilities record the ratio of favorable outcomes to possible outcomes, assuming the possibilities are equiprobable. Later I discussed the use of Monte Carlo simulations to determine the correctness of our abstract reasoning, and this suggested a frequentist approach to probability. In this view a probability is something you measure from the data produced by multiple trials of some experiment. Later still I discussed matters from the perspective of a contestant actually playing the game. In this context it was convenient to take a Bayesian view of probability, in which a probability statement just records a person’s subjective degree of belief in some proposition.
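To make the contrast between the first two approaches concrete, here is a minimal sketch in Python. It is my own illustration, not code from the book, and it assumes the standard rules: Monty always opens a door hiding a goat, and he chooses at random when he has two goat doors to pick from. The function names and the seed are hypothetical. The first function counts equiprobable cases in the classical manner; the second measures a relative frequency over many simulated games.

```python
import random
from fractions import Fraction


def classical_switch_probability():
    """Classical count: the car is equally likely to be behind each of 3 doors.

    Fix the contestant's first pick at door 0. Under the assumed standard
    rules, switching wins exactly when the car is not behind door 0.
    """
    doors = range(3)
    favorable = sum(1 for car in doors if car != 0)
    return Fraction(favorable, len(doors))


def simulate_switch_frequency(trials, seed=0):
    """Frequentist estimate: fraction of simulated games won by switching."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Monty opens a door that is neither the contestant's pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        # The contestant switches to the one remaining closed door.
        switched = next(d for d in range(3) if d != pick and d != opened)
        if switched == car:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    print("classical:", classical_switch_probability())
    print("simulated:", simulate_switch_frequency(100_000))
```

The classical count gives exactly 2/3, and the simulated frequency should settle close to that value as the number of trials grows. The two numbers answer slightly different questions, which is the point of the distinction.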
The literature I found about interpreting probability was fascinating, and I certainly found plenty of food for thought. But for all of that I’m still not really sure what people are doing when they speak of interpreting probability. Probability theory is an abstract construction no different from anything else mathematicians study. No one talks about interpreting a perfect circle; instead we ask whether the idea of a perfect circle is useful in a given context. Frankly, as a pure mathematician I say that if you run into philosophical difficulties when applying the theory to a real-world situation, that just serves you right for trying to apply it to anything.
More seriously, the most important criterion for assessing any particular model of probability must surely be usefulness. That my Monty Hall experience led so naturally to three different interpretations suggests that no one interpretation can capture everything we have in mind when we use probability language. For that reason I tend to favor an ecumenical approach to probability: If your interpretation is helpful and leads to correct conclusions, then you just go right ahead and stick with it. The existence of other situations where your interpretation does not work so well is neither here nor there. Why should we even expect one interpretation to cover every facet of probability?
In perusing some of the literature on interpretations of probability, I noticed a bit of a cultural difference between defenders of rival schools of thought. In particular, Bayesians, to a greater degree than their rivals, really, really care about this. They also tend to be a bit contemptuous of other approaches, especially the poor frequentists, whom they regard with great pity. A case in point is this post by Ian Pollock, over at Rationally Speaking. He writes:
Stop me if you’ve heard this before: suppose I flip a coin, right now. I am not giving you any other information. What odds (or probability, if you prefer) do you assign that it will come up heads?
If you would happily say “Even” or “1 to 1” or “Fifty-fifty” or “probability 50%” — and you’re clear on WHY you would say this — then this post is not aimed at you, although it may pleasantly confirm your preexisting opinions as a Bayesian on probability. Bayesians, broadly, consider probability to be a measure of their state of knowledge about some proposition, so that different people with different knowledge may correctly quote different probabilities for the same proposition.
If you would say something along the lines of “The question is meaningless; probability only has meaning as the many-trials limit of frequency in a random experiment,” or perhaps “50%, but only given that a fair coin and fair flipping procedure is being used,” this post is aimed at you. I intend to try to talk you out of your Frequentist view; the view that probability exists out there and is an objective property of certain physical systems, which we humans, merely fallibly, measure.
My broader aim is therefore to argue that “chance” is always and everywhere subjective — a result of the limitations of minds — rather than objective in the sense of actually existing in the outside world.
It’s hard to see how this could be true. It is simply a fact that a great many physical systems produce outcomes with broadly predictable relative frequencies. A fair coin flipped in a fair way really does land heads about half the time and tails about half the time. The ball in an honest roulette wheel finds each number roughly one thirty-eighth of the time. Those are objective properties of those systems, and it seems perfectly reasonable to use probability language to discuss those objective properties.
So let’s see what Pollock has in mind:
The canonical example from every textbook is a coin flip that uses a fair coin and has a fair flipping procedure. “Fair coin” means, in effect, that the coin is not weighted or tampered with in such a way as to make it tend to land, say, tails. In this particular case, we can say a coin is fair if it is approximately cylindrical and has approximately uniform density. How about a fair flipping procedure? Well, suppose that I were to flip a coin such that it made only one rotation, then landed in my hand again. That would be an unfair flipping procedure. A fair flipping procedure is not like that, in the sense that it’s … unpredictable? Sure, let’s go with that. (Feel free to try to formalize that idea in a non question-begging way, if you wish.)
I don’t know what level of description Pollock wants here. If he would care to come to my office, I will simply show him what I mean by a fair flipping procedure. But he knows what I would show him, since it’s the same procedure everyone uses when they are not deliberately trying to cheat someone. The case of a roulette wheel is perhaps even clearer. By a fair procedure I mean, “The way it’s done in your classier casinos, you know, with the ball going in one direction and the wheel going in the other.”
Let’s move on:
Given these conditions, frequentists are usually comfortable talking about the probability of heads as being synonymous with the long-run frequency of heads, or sometimes the limit, as the number of trials approaches infinity, of the ratio of trials that come up heads to all trials. They are definitely not comfortable with talking about the probability of a single event — for example, the probability that Eugene will be late for work today. Will Feller said: “There is no place in our system for speculations concerning the probability that the sun will rise tomorrow. Before speaking of it we should have to agree on an (idealized) model which would presumably run along the lines ‘out of infinitely many worlds one is selected at random…’ Little imagination is required to construct such a model, but it appears both uninteresting and meaningless.”
The first, rather practical problem with this is that it excludes altogether many interesting questions to which the word “probability” would seem prima facie to apply. For example, I might wish to know the likelihood of a certain accident’s occurrence in an industrial process — an accident that has not occurred before. It seems that we are asking a real question when we ask how likely this is, and it seems we can reason about this likelihood mathematically. Why refuse to countenance that as a question of probability?
As it happens, I am among those who are uncomfortable with applying probability language to one-off situations. It’s fine to speak informally about the likelihood (or odds, or probability) of a one-off event, but if the idea is to assign actual numbers to events and then apply the formal theory of probability to them, then I no longer understand what you are doing. It’s unclear to me what it means to say, “Given the information I have I believe the probability of this one-off event is one-third,” unless we can view the event as one among a long sequence of trials.
Let’s consider Pollock’s examples. Informally I might say that, given what I know about Eugene, it’s highly likely that he will be late to work today. But it’s hard to imagine what it would mean to assign an actual number to the probability that Eugene will be late, unless we have long experience with Eugene’s habits on days that are comparable to this one. Likewise, I could make an informal assessment of how likely it is that an industrial accident will occur, but I don’t know how to assign an actual number to the probability of it occurring. Of course, we might look at a specific mechanical part used in the industrial process and say something like, “This part has been used in tens of thousands of industrial processes and empirically it fails roughly one time in five thousand…” Now I know what we’re talking about! But if we’re truly talking about a one-off event that is completely divorced from any possible long sequence of trials, then I just don’t know what it means to assign a probability to its occurrence.
The second, much deeper problem is as follows (going back to coin flipping as an example): the fairness (i.e., unpredictability) of the flipping procedure is subjective — it depends on the state of knowledge of the person assigning probabilities. Some magicians, for example, are able to exert pretty good control over the outcome of a coin toss with a fairly large number of rotations, if they so choose. Let us suppose, for the sake of argument, that the substance of their trick has something to do with whether the coin starts out heads or tails before the flip. If so, then somebody who knows the magicians’ trick may be able to predict the outcome of a coin flip I am performing with decent accuracy — perhaps not 100%, but maybe 55 or 60%. Suppose that a person versed in such tricks is watching me perform what I think is a fair flipping procedure. That person actually knows, with better than chance accuracy, the outcome of each flip. Is it still a “fair flipping procedure?”
I’m afraid I don’t see the problem. I certainly agree that a skillful magician can fool me into thinking he is using a fair procedure when he really isn’t. The fact remains that there are flipping procedures that produce stable relative frequencies of heads and tails. If I know you are using one of those, then I can make an objective statement about what will happen in a long run of trials.
You might retort that I can never really know what procedure you’re using, and that is where the subjectivity comes in. But that same argument could be used against any claim to objective knowledge. It’s hardly a weakness unique to probability. Any fact you assert is inevitably based on a pile of assumptions about how the world is, and a determined skeptic could challenge you on any of those assumptions. But if we’re ever comfortable talking about objective knowledge, then I don’t see why, “A fair coin flipped in a fair way will land heads roughly half the time in a long sequence of trials,” should not be considered objective.
So it breaks down like this: It is an objective fact that certain physical systems produce outcomes with broadly stable relative frequencies. Probability theory is very useful for understanding such situations. Plainly, then, there is an objective aspect to probability. In practice I can be mistaken about certain facts that are relevant to making correct probability assignments. Thus, there is also a subjective aspect to probability. That is why, depending on the situation, it might be useful to think of probability in terms of the objective properties of physical systems, or in terms of our subjective knowledge of what is taking place.
This problem is made even clearer by indulging in a little bit of thought experimentation. In principle, no matter how complicated I make the flipping procedure, a godlike Laplacian Calculator who sees every particle in the universe and can compute their past, present and future trajectories will always be able to predict the outcome of every coin flip with probability ~1. To such an entity, a “fair flipping procedure” is ridiculous — just compute the trajectories and you know the outcome!
Generalizing away from the coin flipping example, we can see that so-called “random experiments” are always less random for some agents than for others (and at a bare minimum, they are not random at all for the Laplacian Calculator), which undermines the supposedly objective basis of frequentism.
I disagree. That a godlike Laplacian Calculator can perfectly predict the outcome of any coin toss has no relevance at all to the objective basis of frequentism. The thing that’s objective is the stable long-run frequency, not the outcome of any one toss. Our godlike Calculator will presumably predict that heads will occur half the time in a long sequence of trials.
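A toy illustration may help here. In the sketch below, a seeded pseudo-random generator stands in for a fully deterministic flipping procedure, and a "calculator" that knows the seed predicts every flip without error. This is only an analogy, not a model of real coin physics, and the function names and the seed are hypothetical choices of mine. The thing to notice is that perfect prediction of each toss leaves the long-run frequency of heads untouched.

```python
import random


def flip_sequence(seed, n):
    """Deterministic 'coin flips': the same seed always yields the same list.

    1 stands for heads and 0 for tails.
    """
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(n)]


def calculator_accuracy(seed, n):
    """Fraction of flips predicted correctly by an agent who knows the seed."""
    actual = flip_sequence(seed, n)
    predicted = flip_sequence(seed, n)  # the calculator replays the full state
    return sum(a == p for a, p in zip(actual, predicted)) / n


def heads_frequency(seed, n):
    """Relative frequency of heads in the same deterministic sequence."""
    return sum(flip_sequence(seed, n)) / n


if __name__ == "__main__":
    print("prediction accuracy:", calculator_accuracy(12345, 100_000))
    print("frequency of heads: ", heads_frequency(12345, 100_000))
```

The accuracy is 1.0 by construction, while the frequency of heads stays near one half. Both are facts about the same sequence, and neither undermines the other.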
Pollock goes on to discuss quantum mechanics and chaos theory, but I won’t discuss that part of his post.
The three interpretations of probability I have mentioned are clearly related to one another. The classical interpretation defines probability without any reference to long runs of trials, but the ratio you compute is understood to represent a prediction about what will happen in the long run. And Bayesians don’t think that long-run data is irrelevant to probabilistic reasoning. They just treat that data as new information they use to update a prior probability distribution. And no one would deny that our judgements about how likely things are to happen in the future depend on the information we have in the present.
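For readers who want to see what such an update looks like, here is a small sketch using the textbook Beta-binomial model for a coin. That model is my choice for illustration, as are the function names and the sample data; nothing above commits a Bayesian to this particular prior. A Beta(alpha, beta) prior on the chance of heads becomes a Beta(alpha + heads, beta + tails) posterior after observing the flips.

```python
from fractions import Fraction


def update_beta(alpha, beta, heads, tails):
    """Conjugate update: add observed counts to the Beta parameters."""
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


if __name__ == "__main__":
    # Assumed starting point: a uniform prior, Beta(1, 1), with mean 1/2.
    prior = (1, 1)
    # Hypothetical long-run data: 60 heads and 40 tails.
    posterior = update_beta(*prior, heads=60, tails=40)
    print("prior mean:    ", beta_mean(*prior))
    print("posterior mean:", beta_mean(*posterior))
```

With these made-up counts the posterior is Beta(61, 41), whose mean of 61/102 sits close to the observed frequency of 0.6. The more data arrives, the less the starting prior matters, which is one way of seeing how the Bayesian and frequentist pictures are related.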
Given that different interpretations are plainly useful in different contexts, I don’t understand the mania for trying to squeeze everything about probability into just one interpretation. You have lost something important by declaring that any probability assignment is purely subjective. Let’s not forget that probability was invented in the context of games of chance, and in that context it developed models that permit fairly detailed predictions about long-run frequencies. That I can never be absolutely certain, in a given situation, that my model applies does not imply that all probability statements are purely subjective.