Interpretations of Probability

Here’s Timothy Gowers, a Fields Medalist, from his book Mathematics: A Very Short Introduction:

However, there certainly are philosophers who take seriously the question of whether numbers exist, and this distinguishes them from mathematicians, who either find it obvious that numbers exist or do not understand what is being asked.

Everyone knows there is friction between scientists and philosophers of science. Richard Feynman spoke for many scientists when he quipped that “Philosophy of science is as useful to scientists as ornithology is to birds.” From the other side, it is not uncommon for philosophers to lament the philosophical naiveté of scientists (for example, in this recent book review).

I am not aware of any similar tension between mathematicians and philosophers of mathematics, for the simple reason that I do not know any mathematicians who take any interest at all in the philosophy of their discipline. Perhaps this reflects badly on us as a community, but it is what it is. In my own case, every once in a while I get motivated to dip my toe into the philosophical literature, but it’s rare that I find myself enriched by the experience.

There have been exceptions, however. While writing the BMHB (that’s The Big Monty Hall Book) I found myself moved to read some of the literature about Interpretations of Probability. The reason was that in writing the book’s early chapters I found myself very casually making use of three different approaches to probability. In discussing the most elementary methods for solving the problem I used the classical interpretation, in which probabilities record the ratio of favorable outcomes to possible outcomes, assuming the possibilities are equiprobable. Later I discussed the use of Monte Carlo simulations to determine the correctness of our abstract reasoning, and this suggested a frequentist approach to probability. In this view a probability is something you measure from the data produced by multiple trials of some experiment. Later still I discussed matters from the perspective of a contestant actually playing the game. In this context it was convenient to take a Bayesian view of probability, in which a probability statement just records a person’s subjective degree of belief in some proposition.
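A Monte Carlo check of the Monty Hall problem takes only a few lines. What follows is a minimal sketch of my own in Python, not the code used for the book; it assumes the standard rules, in which Monty always opens a losing door other than the one you picked:

```python
import random

def play_monty_hall(switch, trials=100_000):
    """Simulate repeated plays of the Monty Hall game and
    return the observed fraction of wins."""
    wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # Monty opens a door that hides a goat and was not picked.
        monty = random.choice([d for d in doors if d != pick and d != car])
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in doors if d != pick and d != monty)
        wins += (pick == car)
    return wins / trials

# Sticking wins about 1/3 of the time; switching wins about 2/3.
print(play_monty_hall(switch=False))  # ≈ 0.333
print(play_monty_hall(switch=True))   # ≈ 0.667
```

Running a simulation like this is exactly the frequentist move described above: the probability is read off as a relative frequency over many trials.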

The literature I found about interpreting probability was fascinating, and I certainly found plenty of food for thought. But for all of that I’m still not really sure what people are doing when they speak of interpreting probability. Probability theory is an abstract construction no different from anything else mathematicians study. No one talks about interpreting a perfect circle; instead we ask whether the idea of a perfect circle is useful in a given context. Frankly, as a pure mathematician I say that if you run into philosophical difficulties when applying the theory to a real-world situation, that just serves you right for trying to apply it to anything.

More seriously, the most important criterion for assessing any particular model of probability must surely be usefulness. That my Monty Hall experience led so naturally to three different interpretations suggests that no one interpretation can capture everything we have in mind when we use probability language. For that reason I tend to favor an ecumenical approach to probability: If your interpretation is helpful and leads to correct conclusions, then you just go right ahead and stick with it. The existence of other situations where your interpretation does not work so well is neither here nor there. Why should we even expect one interpretation to cover every facet of probability?

In perusing some of the literature on interpretations of probability, I noticed a bit of a cultural difference between defenders of rival schools of thought. In particular, Bayesians, to a greater degree than their rivals, really really care about this. They also tend to be a bit contemptuous of other approaches, especially the poor frequentists, whom they regard with great pity. A case in point is this post by Ian Pollock, over at Rationally Speaking. He writes:

Stop me if you’ve heard this before: suppose I flip a coin, right now. I am not giving you any other information. What odds (or probability, if you prefer) do you assign that it will come up heads?

If you would happily say “Even” or “1 to 1” or “Fifty-fifty” or “probability 50%” — and you’re clear on WHY you would say this — then this post is not aimed at you, although it may pleasantly confirm your preexisting opinions as a Bayesian on probability. Bayesians, broadly, consider probability to be a measure of their state of knowledge about some proposition, so that different people with different knowledge may correctly quote different probabilities for the same proposition.

If you would say something along the lines of “The question is meaningless; probability only has meaning as the many-trials limit of frequency in a random experiment,” or perhaps “50%, but only given that a fair coin and fair flipping procedure is being used,” this post is aimed at you. I intend to try to talk you out of your Frequentist view; the view that probability exists out there and is an objective property of certain physical systems, which we humans, merely fallibly, measure.

My broader aim is therefore to argue that “chance” is always and everywhere subjective — a result of the limitations of minds — rather than objective in the sense of actually existing in the outside world.

It’s hard to see how this could be true. It is simply a fact that a great many physical systems produce outcomes with broadly predictable relative frequencies. A fair coin flipped in a fair way really does land heads about half the time and tails about half the time. The ball in an honest roulette wheel finds each number roughly one thirty-eighth of the time. Those are objective properties of those systems, and it seems perfectly reasonable to use probability language to discuss those objective properties.
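That stability is easy to exhibit numerically. Here is a minimal sketch of my own, with a pseudorandom number generator standing in for a fair coin flipped in a fair way:

```python
import random

def heads_frequency(flips):
    """Flip a simulated fair coin `flips` times and return
    the fraction that land heads."""
    heads = sum(random.random() < 0.5 for _ in range(flips))
    return heads / flips

# The relative frequency wanders for small samples but settles
# down near 1/2 as the number of trials grows.
for n in [10, 1_000, 100_000]:
    print(n, heads_frequency(n))
```

The point is not the exact numbers on any one run, but that the long-run frequency is a stable, reproducible feature of the system being simulated.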

So let’s see what Pollock has in mind:

The canonical example from every textbook is a coin flip that uses a fair coin and has a fair flipping procedure. “Fair coin” means, in effect, that the coin is not weighted or tampered with in such a way as to make it tend to land, say, tails. In this particular case, we can say a coin is fair if it is approximately cylindrical and has approximately uniform density. 

How about a fair flipping procedure? Well, suppose that I were to flip a coin such that it made only one rotation, then landed in my hand again. That would be an unfair flipping procedure. A fair flipping procedure is not like that, in the sense that it’s … unpredictable? Sure, let’s go with that. (Feel free to try to formalize that idea in a non question-begging way, if you wish.)

I don’t know what level of description Pollock wants here. If he would care to come to my office, I will simply show him what I mean by a fair flipping procedure. But he knows what I would show him, since it’s the same procedure everyone uses when they are not deliberately trying to cheat someone. The case of a roulette wheel is perhaps even clearer. By a fair procedure I mean, “The way it’s done in your classier casinos, you know, with the ball going in one direction and the wheel going in the other.”

Let’s move on:

Given these conditions, frequentists are usually comfortable talking about the probability of heads as being synonymous with the long-run frequency of heads, or sometimes the limit, as the number of trials approaches infinity, of the ratio of trials that come up heads to all trials. They are definitely not comfortable with talking about the probability of a single event — for example, the probability that Eugene will be late for work today. William Feller said: “There is no place in our system for speculations concerning the probability that the sun will rise tomorrow. Before speaking of it we should have to agree on an (idealized) model which would presumably run along the lines ‘out of infinitely many worlds one is selected at random…’ Little imagination is required to construct such a model, but it appears both uninteresting and meaningless.”

The first, rather practical problem with this is that it excludes altogether many interesting questions to which the word “probability” would seem prima facie to apply. For example, I might wish to know the likelihood of a certain accident’s occurrence in an industrial process — an accident that has not occurred before. It seems that we are asking a real question when we ask how likely this is, and it seems we can reason about this likelihood mathematically. Why refuse to countenance that as a question of probability?

As it happens, I am among those who are uncomfortable with applying probability language to one-off situations. It’s fine to speak informally about the likelihood (or odds, or probability) of a one-off event, but if the idea is to assign actual numbers to events and then apply the formal theory of probability to them, then I no longer understand what you are doing. It’s unclear to me what it means to say, “Given the information I have I believe the probability of this one-off event is one-third,” unless we can view the event as one among a long sequence of trials.

Let’s consider Pollock’s examples. Informally I might say that, given what I know about Eugene, it’s highly likely that he will be late to work today. But it’s hard to imagine what it would mean to assign an actual number to the probability that Eugene will be late, unless we have long experience with Eugene’s habits on days that are comparable to this one. Likewise, I could make an informal assessment of how likely it is that an industrial accident will occur, but I don’t know how to assign an actual number to the probability of it occurring. Of course, we might look at a specific mechanical part used in the industrial process and say something like, “This part has been used in tens of thousands of industrial processes and empirically it fails roughly one time in five thousand…” Now I know what we’re talking about! But if we’re truly talking about a one-off event that is completely divorced from any possible long sequence of trials, then I just don’t know what it means to assign a probability to its occurrence.

Moving on:

The second, much deeper problem is as follows (going back to coin flipping as an example): the fairness (i.e., unpredictability) of the flipping procedure is subjective — it depends on the state of knowledge of the person assigning probabilities. Some magicians, for example, are able to exert pretty good control over the outcome of a coin toss with a fairly large number of rotations, if they so choose. Let us suppose, for the sake of argument, that the substance of their trick has something to do with whether the coin starts out heads or tails before the flip. If so, then somebody who knows the magicians’ trick may be able to predict the outcome of a coin flip I am performing with decent accuracy — perhaps not 100%, but maybe 55 or 60%. Suppose that a person versed in such tricks is watching me perform what I think is a fair flipping procedure. That person actually knows, with better than chance accuracy, the outcome of each flip. Is it still a “fair flipping procedure?”

I’m afraid I don’t see the problem. I certainly agree that a skillful magician can fool me into thinking he is using a fair procedure when he really isn’t. The fact remains that there are flipping procedures that produce stable relative frequencies of heads and tails. If I know you are using one of those, then I can make an objective statement about what will happen in a long run of trials.

You might retort that I can never really know what procedure you’re using, and that is where the subjectivity comes in. But that same argument could be used against any claim to objective knowledge. It’s hardly a weakness unique to probability. Any fact you assert is inevitably based on a pile of assumptions about how the world is, and a determined skeptic could challenge you on any of those assumptions. But if we’re ever comfortable talking about objective knowledge, then I don’t see why, “A fair coin flipped in a fair way will land heads roughly half the time in a long sequence of trials,” should not be considered objective.

So it breaks down like this: It is an objective fact that certain physical systems produce outcomes with broadly stable relative frequencies. Probability theory is very useful for understanding such situations. Plainly, then, there is an objective aspect to probability. In practice I can be mistaken about certain facts that are relevant to making correct probability assignments. Thus, there is also a subjective aspect to probability. That is why, depending on the situation, it might be useful to think of probability in terms of the objective properties of physical systems, or in terms of our subjective knowledge of what is taking place.

This problem is made even clearer by indulging in a little bit of thought experimentation. In principle, no matter how complicated I make the flipping procedure, a godlike Laplacian Calculator who sees every particle in the universe and can compute their past, present and future trajectories will always be able to predict the outcome of every coin flip with probability ~1. To such an entity, a “fair flipping procedure” is ridiculous — just compute the trajectories and you know the outcome!

Generalizing away from the coin flipping example, we can see that so-called “random experiments” are always less random for some agents than for others (and at a bare minimum, they are not random at all for the Laplacian Calculator), which undermines the supposedly objective basis of frequentism.

I disagree. That a godlike Laplacian Calculator can perfectly predict the outcome of any coin toss has no relevance at all to the objective basis of frequentism. The thing that’s objective is the stable long-run frequency, not the outcome of any one toss. Our godlike Calculator will presumably predict that heads will occur half the time in a long sequence of trials.

Pollock goes on to discuss quantum mechanics and chaos theory, but I won’t discuss that part of his post.

The three interpretations of probability I have mentioned are clearly related to one another. The classical interpretation defines probability without any reference to long runs of trials, but the ratio you compute is understood to represent a prediction about what will happen in the long run. And Bayesians don’t think that long run data is irrelevant to probabilistic reasoning. They just treat that data as new information they use to update a prior probability distribution. And no one would deny that our judgments about how likely things are to happen in the future depend on the information we have in the present.
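That point about Bayesians updating on long-run data can be made concrete with the standard Beta-Binomial conjugate update. This is a textbook sketch of my own, not anything from Pollock’s post: start from a uniform prior on the coin’s bias, and each batch of observed flips pulls the posterior mean toward the observed relative frequency.

```python
def update_beta(alpha, beta, heads, tails):
    """Conjugate update: a Beta(alpha, beta) prior on the coin's
    bias, after observing `heads` heads and `tails` tails,
    becomes Beta(alpha + heads, beta + tails)."""
    return alpha + heads, beta + tails

def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1): the posterior mean starts at 0.5.
a, b = 1, 1
print(posterior_mean(a, b))  # 0.5

# After observing 70 heads in 100 flips, belief shifts toward
# the data: the posterior mean is 71/102, close to the observed
# frequency of 0.7.
a, b = update_beta(a, b, heads=70, tails=30)
print(posterior_mean(a, b))
```

In this sense the frequentist’s long-run data and the Bayesian’s degrees of belief are not rivals: the more trials you feed the update, the more the subjective posterior is dominated by the objective frequency.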

Given that different interpretations are plainly useful in different contexts, I don’t understand the mania for trying to squeeze everything about probability into just one interpretation. You have lost something important by declaring that any probability assignment is purely subjective. Let’s not forget that probability was invented in the context of games of chance, and in that context it developed models that permit fairly detailed predictions about long-run frequencies. That I can never be absolutely certain, in a given situation, that my model applies does not imply that all probability statements are purely subjective.

Comments

  1. #1 lylebot
    October 11, 2011

    It’s hard to see how this could be true. It is simply a fact that a great many physical systems produce outcomes with broadly predictable relative frequencies. A fair coin flipped in a fair way really does land heads about half the time and tails about half the time

    But in principle you could measure all forces acting on the coin and predict with near 100% certainty whether it will come up heads or tails, right? It’s not random in the same way that measuring the spin of an electron is.

    In practice we abstract those forces away and model the coin as a random variable with 50% probability of coming up heads. But that’s just a model, and in some sense it is subjective: we have made the decision to abstract certain forces that we could have measured had we wanted to, and that decision is a subjective one that we could have made differently. No?

    Anyway, Andrew Gelman is a political scientist/statistician/Bayesian/blogger that you might enjoy reading on this topic. He’s a Bayesian but he doesn’t commit to the notion of “subjectivity” that so many Bayesians do, and he sees value in frequentist interpretations as well. Here’s a link to a recent essay:
    http://www.rmm-journal.de/downloads/Article_Gelman.pdf

  2. #2 lylebot
    October 11, 2011

    Just to add to my previous post… the great statistician C. R. Rao could supposedly flip a coin so that it always came up heads. If that’s true, it’s a pretty great demonstration of Box’s famous quote “all models are wrong, but some are useful”.

  3. #3 Sean Santos
    October 11, 2011

    Hmm. The one thing I will say is that in order to meaningfully assign probability to some event, you have to stipulate what knowledge or information about the event you are or are not taking into account. From a Bayesian perspective, this is part of the “subjective” viewpoint that you have to take. From a frequentist perspective, this is done by labeling the event as a member of a specific class. From a “classical” perspective, this might be done by listing possible outcomes (and then stating that all outcomes are equally likely, for all you know).

    This all potentially leads to the same result. A random person, viewing a coin toss as a “typical” or “fair” coin toss, from the class of fair coin tosses, will give the probability of getting heads as 50%. Laplace’s demon, viewing a coin toss as a deterministic event from a class consisting only of tosses physically identical to that one flip, might give the probability of getting heads as 100%. A skilled predictor, viewing a coin toss as a moderately biased toss from a class of tosses that are lopsided to about the same degree, might give the probability of getting heads as 75%.

    In each case the outcome might be “ultimately” or ontologically determined from the start. But there are legitimate yet differing estimates due to the amount of information available or being taken into account. None of the three actors listed is necessarily “wrong” in their deductions, even though they have different conclusions, which is where the claims about subjectivity come in.

    Each probability may be objectively correct given certain information or a certain reference class, but “the probability that this next coin flip will come up heads” is not an objective quantity with a unique value. I would agree, however, that you can objectively assign a value of 50% to “the probability that this coin will come up heads given that it is flipped fairly and that we take into account no other information about the way in which it is flipped”. You have to exclude any further relevant information from your consideration, even if only implicitly.

  4. #4 Stephen Lucas
    October 11, 2011

    When discussing philosophy of mathematics, I am reminded of the quote: “All a mathematician needs to work is a pen, paper, and a waste basket. A philosopher doesn’t need the waste basket.” I once had a reason to try and understand the philosophy of mathematics (the invented versus discovered issue), and after plowing through various texts over a period of years, I knew enough to find holes in the arguments for invented, and decided it wasn’t worth wasting my time on any further.

    I’m not going to disagree with your broader thesis, which I thoroughly agree with. However, a minor point. Be careful about claims of applying pure mathematics to the real world being worthless. Pretty much all of pure mathematics has turned out to be useful eventually, including your own areas of number theory and graph theory. Virtually all of number theory (the purest of the pure) turns out to be enormously practical and important in cryptography. Graph theory started out pure, but now is useful in huge numbers of practical areas from operations research to network theory. And I’d say your credentials as a pure mathematician are severely battered by your excellent work on the Monty Hall Problem. Understanding human frailties with respect to conditional probabilities is why casinos and insurance companies do so well. That is positively applied!

  5. #5 Jason Rosenhouse
    October 11, 2011

    Oh, lighten up Steve! I certainly do not think that applied mathematics is worthless. I’m actually a big fan of applied math, just so long as it’s someone else who’s doing it.

    And since you were kind enough to compliment my work on the Monty Hall problem, let me return the favor by informing everyone that some of the Monte Carlo simulations I referred to in the post were actually programmed and run by you. Thanks!

  6. #6 D. C. Sessions
    October 11, 2011

    However, there certainly are philosophers who take seriously the question of whether numbers exist

    Which statement leads me to wonder whether there are also philosophers who take seriously the question of whether words exist — and if not, why not?

  7. #7 Michael Fisher
    October 11, 2011

    Hi Jason ~ yet another entertaining & interesting post ~ thank you

    A small detail that doesn’t affect your general argument, but perhaps of interest:

    …The ball in an honest roulette wheel finds each number roughly one thirty-eighth of the time. Those are objective properties of those systems, and it seems perfectly reasonable to use probability language to discuss those objective properties

    The European roulette wheel lacks the “00” slot

    The house edge on a single number bet on the American wheel with 38 different numbers is 2/38 – or 5.26% rake
    The house edge on the European wheel is only 1/37 – or 2.70% rake
    When the odds shift so much, you should always be looking for the single zero European roulette, or even better play a skilled game against less skilled players with only house fees as a rake e.g. poker

  8. #8 Dr. I. Needtob Athe
    October 11, 2011

    You’re sitting at a card table with Andy, Bill, and Carl, all of whom are good at math. You take the four aces out of a deck of cards and explain that the Ace of Hearts and the Ace of Diamonds are defined as red cards, while the Ace of Spades and Ace of Clubs are defined as black cards.

    You set the rest of the deck aside and ask, “What is the probability that a card drawn at random from these four cards will be red?”

    Everyone agrees that the probability is 1/2.

    You shuffle the cards in such a thorough way that a card selected from them will truly be drawn at random, then you draw a card without looking and place it face down in the center of the table.

    You hand out sheets of paper and ask each person to secretly write his name and the probability that the card in the center of the table is red, then fold his paper and give it to you.

    Unsurprisingly, all three papers say 1/2.

    From the three remaining aces, you give one card each to Andy, Bill, and Carl, keeping the cards face down and instructing them to each look at their own card but hide it from everyone else.

    Again, you hand out sheets of paper and ask each person to secretly write his name and the probability that the card in the center of the table is red, then fold his paper and give it to you.

    This time the results are different. Andy’s paper says 1/3, but Bill’s and Carl’s papers both say 2/3.

    My question is, who is right and who is wrong? Also, if anyone is right, does that mean that his first answer of 1/2 was wrong?

    My point is that when you speak of the probability that a certain card has a certain property, you’re speaking about a property of the card, and if two people give two different answers about that property, they can’t both be correct.

    This seems like a paradox to me. How can it be resolved?

  9. #9 Deepak Shetty
    October 11, 2011

    I don’t know what level of description Pollock wants here.

    I think he means that if you flip the coin exactly the same way it will always land the same side. And if you change something then it isn’t exactly “fair,” and if you knew enough you could predict the outcome.

  10. #10 Dr. I. Needtob Athe
    October 11, 2011

    I have another question about mathematics:

    Most of you are probably familiar with Bruce Schneier, the prominent cryptographer. I’ve been reading his Crypto-Gram and his blog for many years and I’ve noticed that he occasionally refers to himself as a scientist.

    I’m not trying to deny him that status, but does cryptography fit the definition of a science? Does it fit the definition of a natural science? Do cryptographers study some aspect of nature? If so, what aspect of nature do they study?

  11. #11 eric
    October 11, 2011

    This time the results are different. Andy’s paper says 1/3, but Bill’s and Carl’s papers both say 2/3.

    My question is, who is right and who is wrong?

    All of them are wrong; the probability of it being red is 100%.

    (Assuming none of them lied and all of them know how to calculate probability).

  12. #12 Pierce R. Butler
    October 11, 2011

    “Philosophy of science is as useful to scientists as ornithology is to birds.”

    Apparently Dr. Feynman never asked Audubon Society members about their contributions to habitat preservation.

  13. #13 jt512
    October 12, 2011

    Jason, When Ian first made this blog post, I tried posting a similar critique as a comment on his blog, but couldn’t, apparently due to a software glitch of some sort. I agree with you about everything, except possibly that it is inappropriate to assign a probability to a “one-off” event if we employ the Bayesian interpretation that probability represents one’s degree of belief in a hypothesis. For example, what’s the probability that ESP is real? IMO, on the order of 10^(-12).

    Jay

  14. #14 Patrick
    October 12, 2011

    This may be a stupid comment, since my math education, while fairly significant, did end in undergraduate.

    But surely there’s a difference between saying that a process yields a stable relative frequency, and saying that a process is random in some objective sense? A table of random integers yields a stable relative frequency, I think, but if I’ve got the table in front of me I can know with 100% certainty that the next integer is a 4.

    Doesn’t the Bayesian point become a lot clearer if you change analogies about coin flips to something that’s clearly and obviously predetermined, but also “random” in certain senses of the term? That would at least eliminate intuitive assumptions people make about the nature of games of chance.

  15. #15 Sascha Vongehr
    October 12, 2011

    “a great many physical systems produce outcomes with broadly predictable relative frequencies.”
    Bayesians may be contemptuous because once you do QM physics, the frequentist approach is as circular as the classical interpretation, or in other words, there is an infinite regress. I tried to explain one aspect of this in the article “Empirical Probability versus Classical Fair Meta-Randomness”:
    http://www.science20.com/alpha_meme/empirical_probability_versus_classical_fair_metarandomness-81195

  16. #16 Peter
    October 12, 2011

    Just because a model doesn’t work on a quantum level doesn’t mean that the model is wrong elsewhere, or inappropriate, or broken, it just means the model doesn’t work on a quantum level.

  17. #17 bobh
    October 12, 2011

    So Jason you say “every once in a while I get motivated to dip my toe into the philosophical literature, but it’s rare that I find myself enriched by the experience.” Then you launch in to a post on philosophical interpretations of probabilities demonstrating the truth of your statement.

  18. #18 darkgently
    October 12, 2011

    - “I’m actually a big fan of applied math, just so long as it’s someone else who’s doing it.”

    As a pure mathematician sharing an office with applied mathematicians, I am totally on board with this! :)

  19. #19 Collin
    October 12, 2011

    @8 & @11: The probability of a card already placed can only be meaningful in the frequentist sense. The problem is that it isn’t established beforehand what details of the experiment are necessary to place it among the set of trials. Andy, Bill, and Carl each assume the right to keep their card while the others must redraw. It is this inconsistency that causes the paradox. If it were made plain that they all had to return their card after reporting the probability, they would’ve said 1/2.

  20. #20 J. Quinton
    October 12, 2011

    This post looks like it would fit in well with the community at Less Wrong.

  21. #21 Dr. I. Needtob Athe
    October 12, 2011

    Collin (Post #19): “Andy, Bill, and Carl each assume the right to keep their card while the others must redraw.”

    Nobody will redraw. All the drawing is finished and we’re talking about the color of the card lying face-down in the center of the table.

    I don’t think you understood what I wrote. Please read it again.

  22. #22 phayes
    October 12, 2011

    “That a godlike Laplacian Calculator can perfectly predict the outcome of any coin toss has no relevance at all to the objective basis of frequentism. The thing that’s objective is the stable long-run frequency, not the outcome of any one toss. Our godlike Calculator will presumably predict that heads will occur half the time in a long sequence of trials.”

    Yes, well… Pollock’s post looks very much like a sketchy attempt at an interpretation of bits of Jaynes and if that is where it’s coming from it has made an easily dismissable mess of the argument(s) pointing out what’s wrong with the idea of an ‘objective basis of frequentism’ there. Best read Jaynes’s book. :)

  23. #23 Marshall
    October 12, 2011

    You don’t understand the mania for squeezing everything into just one interpretation, and I agree totally. Understanding benefits from looking at things from various sides, from angles as distinct as possible. Logical parallax. Reification. Substitution of variables. And so on. No “one size fits all”.

    The ‘fairness’ of the single-trial flipping procedure lies in making conditions uncontrollable or unobservable. One point of a good experiment is to have initial conditions well controlled, so one imagines that if you thoroughly controlled your flipping procedure, the outcome would indeed be repeatably determinate. Is it possible to define a selection procedure that is truly random (P in the normal sense) rather than merely unknown (Bayesian)? I suppose the Modernist answer is “no”.

    Not that that makes any practical difference. Flipping a coin once to see who pays for dinner is “fair”, in the sense that I would have no qualms about submitting to such a procedure. Other things being equal. I think that can be mashed into a good philosophical argument. Was that your point?

    (On the side, I think it’s interesting that it’s so much easier to describe a fair coin, than a fair procedure for using it.)

    … I don’t think it’s surprising that scientists don’t necessarily get much out of philosophy of science. Doers of X would not ordinarily be the best observers of X, or the most interested. (I once teased my son, then 4 or 5, by reading to him from our child-rearing handbook, explaining to him how his behavior reflected what it said, and what it predicted for his future behavior. He was indeed quite annoyed.) Perhaps the best consumers of P of S would be those of us, not practicing scientists, who want to understand what scientists are doing and how best to incorporate that into our lives. Where are the values, where are the blind spots.

  24. #24 One Brow
    October 12, 2011

    My question is, who is right and who is wrong? Also, if anyone is right, does that mean that his first answer of 1/2 was wrong?

    They are all correct. They were also all correct when they said 1/2.

    My point is that when you speak of the probability that a certain card has a certain property, you’re speaking about a property of the card, and if two people give two different answers about that property, they can’t both be correct.

    The probability is not a property of the card. I’m a frequentist (to the degree a choice must be made at all), and see the probability as a feature of a long-term trend of identical situations, not of a single event.

    This seems like a paradox to me. How can it be resolved?

    By recognizing you are asking four different questions, which among them have three different, correct, answers.

    Before the three cards are dealt:
    1) Based on what you know of the card distribution, what is the probability that this card is red?

    After the cards are dealt:
    2a) Andy, based on what you know of the card distribution, what is the probability that this card is red?
    2b) Bill, based on what you know of the card distribution, what is the probability that this card is red?
    2c) Carl, based on what you know of the card distribution, what is the probability that this card is red?

    The different initial knowledge means different, correct answers.

  25. #25 G.D.
    October 12, 2011

    “As it happens, I am among those who are uncomfortable with applying probability language to one-off situations. It’s fine to speak informally about the likelihood (or odds, or probability) of a one-off event, but if the idea is to assign actual numbers to events and then apply the formal theory of probability to them, then I no longer understand what you are doing.”

    This actually makes little or no sense, and you don’t think it is true when you think about it. I mean, you wrote a whole book on Monty Hall, didn’t you? Certainly the point about knowing how to assess probabilities in a Monty Hall case is to be able to assess your chances of winning by changing doors or not? So suppose I am at that stage in the program where I choose whether to switch doors or not. I assume you would want to say that I should (rationally) switch. But how could you say that? After all, me standing in front of the doors faced with this decision is a one-off event if anything is. Still, your recommendation only makes sense if you think it makes sense to apply probabilities to the situation. And of course it makes sense. If I stick with my door, I have a 1/3 chance of winning; if I switch I have a 2/3 chance. These are precise probabilities, applied to a specific, one-off event (and the problem with the accident case is simply that it is much more complex, and there is a far greater number of unknowns; it is not a difference in kind).

    But here’s the thing that raises the question of subjective vs. objective probabilities (unfortunate terms, I know). In the Monty Hall scenario the door where the prize is hidden is already determined. Either it is behind door A or door C (say). That means that the chance that it is behind door A is either 1 or 0 (not 1/3) and the chance that it is behind C is either 1 or 0 (not 2/3). That’s what it is supposed to mean when one says that the probabilities are not objective. There are no objective probabilities around in this case at all. And citing frequency is irrelevant, for the frequency of winning doesn’t change the fact that the chance that the prize is behind door A is either 1 or 0 (though I don’t know which). It doesn’t become 1/3.

    Rather, what the probabilities apply to are not the objective chances (there are none; the chance of the prize being behind door A is either 1 or 0), but my lack of information. The point is that given what I know, I should assign a 1/3 credence to “the prize is behind A” and a 2/3 credence to “the prize is behind C”. The credence doesn’t reflect any objective chance, since the objective chance is 0 (or 1 – I don’t know which).

    You are right that frequencies provide a standard. It’s the frequencies that allow me to determine which credence to assign to which hypothesis. That doesn’t alter the fact that the probability assignment in the one-off situation at hand reflects my information, not any objective fact about the situation at hand (since this is a one-off event where the probability for each option is either 0 or 1).

    What is unfortunate is the use of “objective” and “subjective”. The point is really the question of whether probabilities are “epistemic” (reflecting our information; the world may, for all we know, be completely deterministic) or “metaphysical” (the world is genuinely indeterminate) – but in the Monty Hall case the situation isn’t genuinely indeterminate; it is already determined which door hides the prize, so it means that the epistemic approach is the only one that makes sense in this case. But of course (well, presumably) it is an objective fact (in the more ordinary sense of “objective”) which credence a rational agent ought to assign – if you’re rational you wouldn’t just choose any random number – and frequencies can provide you with a very much objective reason why you should assign a 1/3 credence to “the prize is behind A” and a 2/3 credence to “the prize is behind C”.

    The more general point is that “the probability of X” cannot just mean “the frequency of X”, since a lot of events to which it makes sense to assign probabilities don’t really have any (relevant) frequencies behind them. Frequencies certainly help determining which credence you should assign (which is why frequencies are important also for Bayesians), but since each particular event making up a frequency may be completely deterministic, we have an argument that probability assignments really reflect our information states rather than any sort of “real indeterminacy” in the world.

  26. #26 Jr
    October 12, 2011

    I agree that we should not assume that there is one correct interpretation of probability, any more than there is one correct interpretation of linear algebra.

    I however find your discussion of philosophy somewhat lacking.

    I think it is very difficult to define probability using relative frequency without assuming we already know what probability is.

    Taking the limit of infinitely many trials is impossible in the real world so we can hardly use that as a definition of probability. (Or at least, if we do probability becomes completely unmeasurable.) But if we perform a large number of finite trials we cannot say what the relative frequency will be. All we can say is that with high probability it will be close to the true probabilities. But of course now we have a circular definition of probability.

  27. #27 eric
    October 12, 2011

    Jr: Taking the limit of infinitely many trials is impossible in the real world so we can hardly use that as a definition of probability

    Why not? Mathematicians take the limit of infinitely many elements all the time, even though counting them would be impossible in the real world. I don’t think the criteria “can do it physically with real objects” is any more a legitimate criteria for probabilities than it is for, say, calculus. When was the last time you cut out the area under a (e.g., paper) curve into infinite, infinitely thin sections, and measured their individual areas?

    Jason’s point is very well taken and parallel to the natural science approach: it’s perfectly fine to keep multiple models around to attack a problem, since each of them will have different limitations / assumptions / boundary conditions. There is no need to philosophically dedicate oneself to any one of them. You can, if you like, but it isn’t required to be a good scientist (or mathematician). If one of them eventually turns out to be far more useful or “better” than the others, so be it, and the people who put their eggs in that basket early can crow about it (sorry about the mixed metaphor). But there’s no need to try to force the working community to make that decision prematurely.

  28. #28 G.D.
    October 12, 2011

    Eric:

    “Why not? Mathematicians take the limit of infinitely many elements all the time, even though counting them would be impossible in the real world. I don’t think the criteria “can do it physically with real objects” is any more a legitimate criteria for probabilities than it is for, say, calculus.”

    I think you misunderstand the point. When we say that X has a 50% chance of happening we don’t mean that in earlier cases X has occurred 50% of the time and not occurred the other 50% of the time. There generally isn’t any good correspondence between *actual* frequencies and the probabilities we assign (i.e. when we say that a coin has a 50% chance of landing heads, that doesn’t mean “coins have landed heads 50% of the times they have been flipped in the past”). Many of the events to which we assign probabilities haven’t occurred often enough for us to do it this way. So, one solution would be to say that “this coin has a 50% chance of landing heads if flipped” should be understood in terms of “if this coin was flipped an infinite number of times it would land heads half the time”. The problem Jr points out, as I read it, is that this gets the cart before the horse. It cannot be what we mean by “50% probability”, since in order to make sense of “if this coin was flipped an infinite number of times it would land heads half the time” we *already* have to assume that the coin has a 50% chance of landing heads. Frequencies are evidence for probabilities, and probabilities explain frequencies, not vice versa, as frequentists would claim (that’s what it means to be a frequentist). The Bayesian account is one (the best) way of making sense of what probabilities are when they cannot be frequencies.

    Indeed, I totally agree that the criterion “can do it physically with real objects” is a legitimate one for probabilities. So much the worse for the frequentist approach.

    (Dunno what happened to my last, long post explaining what is really at stake here and why the Bayesian, epistemic approach is obviously correct, and why “subjective” in “subjective probabilities” is a very unfortunate and misleading choice of words, since there doesn’t have to be anything about them that is not objectively true.)

  29. #29 eric
    October 12, 2011

    G.D.: The problem Jr points out, as I read it, is that this gets the cart before the horse. It cannot be what we mean by “50% probability”, since in order to make sense of “if this coin was flipped an infinite number of times it would land heads half the time” we *already* have to assume that the coin has a 50% chance of landing heads.

    I don’t see that as circular reasoning so much as stating the same premise in two different ways. “P = kT” and “If we were to raise the temperature of a gas in a sealed container, the pressure would go up” are not circular; they are two different ways of saying the same thing.

  30. #30 Jason Rosenhouse
    October 12, 2011

    G. D. —

    Your comment got sent to moderation. Sorry about that. I have now posted it. I don’t think we’re on the same page, though. For example, you said:

    This actually makes little or no sense, and you don’t think it is true when you think about it. I mean, you wrote a whole book on Monty Hall, didn’t you? Certainly the point about knowing how to assess probabilities in a Monty Hall case is to be able to assess your chances of winning by changing doors or not? So suppose I am at that stage in the program where I choose whether to switch doors or not. I assume you would want to say that I should (rationally) switch. But how could you say that? After all, me standing in front of the doors faced with this decision is a one-off event if anything is. (Emphasis Added)

    No, that’s not at all a one-off event, at least not in the relevant sense. It makes sense to say that you will win with probability 2/3 by switching precisely because you can imagine playing the game multiple times, and the information you have justifies some conclusions about what will happen in the long run.

    I simply don’t understand what it means “to assign a 1/3 credence” to something. That sounds like gibberish to me. If your credence is simply based on what you know about long-run frequencies, then why talk about credences at all? Why not just talk about frequencies and be done with it?
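
    In fact, the long-run claim is easy to check numerically. Here is a minimal Monte Carlo sketch, assuming the standard rules (the host knows where the prize is and always opens a goat door):

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same estimates

def play_monty_hall(switch, trials=100_000):
    """Simulate the standard Monty Hall game; return the observed win frequency."""
    wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        prize = random.choice(doors)
        pick = random.choice(doors)
        # Host opens a door that is neither the contestant's pick nor the prize.
        opened = random.choice([d for d in doors if d != pick and d != prize])
        if switch:
            # Switch to the single remaining unopened door.
            pick = next(d for d in doors if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print(play_monty_hall(switch=False))  # settles near 1/3
print(play_monty_hall(switch=True))   # settles near 2/3
```

    The stable long-run frequencies are exactly what justify calling the switching probability 2/3.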

    Jr —

    I think it is very difficult to define probability using relative frequency without assuming we already know what probability is.

    But I didn’t define probability using relative frequencies. I didn’t define probability at all. As far as I know, the only definition of probability that makes any sense is the one given by the abstract, axiomatic treatment you would find in a high-level mathematics textbook.

    The issue here is interpreting what probability means in different real-world contexts. And it just seems pretty obvious that different interpretations are helpful in different contexts. And since it’s an objective fact that certain physical systems produce stable long-run frequencies, I don’t see how one can argue, as Pollack does, that chance is always and everywhere subjective. For what it’s worth, here’s the conclusion of the article on this subject in the Stanford Encyclopedia of Philosophy (the article was written by Alan Hajek):

    It should be clear from the foregoing that there is still much work to be done regarding the interpretation of probability. Each interpretation that we have canvassed seems to capture some crucial insight into it, yet falls short of doing complete justice to it. Perhaps the full story about probability is something of a patchwork, with partially overlapping pieces. In that sense, the above interpretations might be regarded as complementary, although to be sure each may need some further refinement. My bet, for what it is worth, is that we will retain at least three distinct notions of probability: one quasi-logical, one objective, and one subjective.

    That’s precisely my view.

  31. #31 Thanny
    October 12, 2011

    Since the topic is probability, something that’s always bothered me is the “gambler’s fallacy”. It seems to me that the fallacy is itself a fallacy.

    The probability that a coin will end up heads 50 times in a row is much smaller than the probability that it will end up heads just once. That’s all the “gambler’s fallacy” ever claims – if you are a gambler, and you’ve witnessed the equivalent of 50 heads turning up, is it really a fallacy to expect a higher probability of tails on the next flip? Isn’t it the same as saying that in all the times the gambler has seen 50 heads turn up, more often than not tails turned up next? Isn’t that actually true from any probability perspective?

    What’s wrong with my interpretation?

  32. #32 Marshall
    October 12, 2011

    @thanny:
    Assuming independent trials and a fair flip, it isn’t true that when 50 head-flips are observed, usually a tail-flip will come next. There are only 2 relevant cases: 50 heads followed by a tails, and 51 heads. Expectation of tails is 1 out of 2 = .5. The rarity of a run of 50 flips is irrelevant. Try a Monte Carlo simulation if you like: a run of 50 heads to qualify is going to be tedious, but try a run of 2 or 3.
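
    That Monte Carlo might look like the following sketch, using runs of 3 heads instead of 50 purely so that qualifying runs show up quickly:

```python
import random

def next_flip_after_run(k, trials=200_000):
    """Estimate P(tails) on the flip immediately after k heads in a row."""
    random.seed(0)  # fixed seed for a reproducible estimate
    tails_after = runs_seen = 0
    streak = 0
    for _ in range(trials):
        heads = random.random() < 0.5
        if streak >= k:            # the previous k flips were all heads
            runs_seen += 1
            tails_after += (not heads)
        streak = streak + 1 if heads else 0
    return tails_after / runs_seen

print(next_flip_after_run(3))  # hovers around 0.5, not above it
```

    However long the preceding streak, the estimated chance of tails stays at about one half.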

  33. #33 Collin
    October 13, 2011

    @21. Interesting that you should ask me to read it again. Andy, Bill, and Carl are not real people. They are just characters in your narrative. By suggesting that the narrative should be re-read, you’re confirming my assumption that it is a repeated experiment. So if Andy, Bill, and Carl are self-aware, they know this and can plan for the next time someone reads it. :)

    But seriously… Although a person reading a story obviously can’t create characters as I suggested in that doggerel, it’s not that far-fetched that a particle detector might be creating some of the states it’s assumed to be observing. Arguments like these, combined with a willingness to accept tachyonic ghosts, might be the start of showing that probability isn’t really that different at the quantum level.

  34. #34 Tom
    October 13, 2011

    @6

    DC – I don’t think you’ll find any philosophers who don’t think numbers exist. The argument is whether they are like words, which we invented, or whether they are a special kind of thing, independent of us, that we discovered.

  35. #35 Wow
    October 13, 2011

    “DC – I don’t think you’ll find any philosophers who don’t think numbers exist.”

    I do believe you’re wrong.

    3 apples exist. And that is different from the existence of 2 apples. But 3 and 2 don’t exist as real things, just as counts.

    “independent of us, that we discovered”

    Chimpanzees can count to three, and IIRC humans have to count for five or more: we don’t know the difference between five things and six things without counting “one, two, … oh, that’d be five”.

    So numbers are things nonhumans have too.

    They just don’t exist as things in themselves.

  36. #36 phayes
    October 13, 2011

    “If your credence is simply based on what you know about long-run frequencies, then why talk about credences at all? Why not just talk about frequencies and be done with it?”

    Because your state of knowledge never includes ‘knowledge of long-run frequencies’. How could it?

    “And since it’s an objective fact that certain physical systems produce stable long-run frequencies”

    It isn’t. This is from chapter 10 of Jaynes:

    “This point needs to be stressed: those who assert the existence of physical probabilities do so in the belief that this establishes for their position an `objectivity’ that those who speak only of a `state of knowledge’ lack. Yet to assert as fact something which cannot be either proved or disproved by observation of facts, is the opposite of objectivity; it is to assert something that one could not possibly know to be true.”

    That Stanford Encyclopedia of Philosophy article’s omission of the ‘Jaynesian interpretation’ is unfortunate.

  37. #37 Jeff
    October 13, 2011

    Via Jordan Ellenberg’s blog (http://quomodocumque.wordpress.com/), I read this paper by Adam Elga entitled “Subjective Probabilities Should be Sharp”, which can be found at http://www.princeton.edu/~adame/papers/sharp/elga-subjective-probabilities-should-be-sharp.pdf. It’s a very short and easy read, and I found his arguments compelling, even though I am inclined to agree with your hesitancy to assign probabilities to one-off events, Jason.

  38. #38 Lenoxus
    October 13, 2011

    When I was twelve or so, I asked my mother what it really meant when the newspaper said there was a 40% chance of rain. I could be mistaken, but I think she answered that out of all the days the meteorologists had seen with conditions similar to this one, about 40% had rain. It might not have been my mother, it might even have been me thinking to myself, and whoever it was might have said something more like “If today were repeated a thousand times, about 400 of those hypothetical days would have rain”. (Alternative-universes frequentism?)

    Either way, I more or less became a frequentist from then until my encounters with the Internet and various discussions of probability therein, particularly the ones on Less Wrong. These days I lean slightly towards the Bayesian “degree of knowledge” model, but I see all models as deficient to some degree. In the case of Bayesianism, one could arrive at some peculiar conclusions as a result of the apparent subjectiveness.

    Most people, even very smart people, who encounter even the fully-described version of Monty Hall (host-always-shows-goat-etc) will respond that the probability of their first door being right is 1/2. Could you not argue that their “degree of knowledge” or “credence” really is 1/2, because in a sense it takes some extra knowledge — by explaining the problem in terms of “Monty is giving you two doors” or “Imagine if there were a million doors” — before a person’s answer (or “credence”) changes to 1/3? In other words, the Bayesian model needs some odd fleshing-out along the lines of “the degree of knowledge of a maximally intelligent being who nonetheless doesn’t know where exactly the car is”. In any case, frequentism is the one model that necessarily convinces anyone, with the exception of those who would argue your simulations are inaccurate (but they would have to give good reasons why).

    Suddenly, I’m curious whether 2/3 of a car is more economically valuable than a whole goat…

    Thanny @ 31:

    Isn’t it the same as saying that in all the times the gambler has seen 50 heads turn up, more often than not tails turned up next? Isn’t that actually true from any probability perspective?

    It’s false in a frequentist sense (and every other sense as well). In fact, out of all the times in history that 51 coins have been flipped, an equal number have the sequence “50 heads followed by 1 tails” as have “51 heads”.

    One thing to help grasp this is to recognize that yes, 50 heads in a row is rare, but it’s no rarer than any other sequence of 50 coin flips — just easier to describe. A monkey who types 6 random letters has a very low chance of typing a word and a very high chance of typing gibberish. But that doesn’t mean “JPBFUU” is any more likely than “MONKEY”, even though the first is gibberish and the second isn’t; they are both equally unlikely. (If the monkey is truly random, unlike real-life monkeys which tend to prefer hitting the same key repeatedly, like these guys.)
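
    The equal-rarity point can be checked exhaustively for a small run. The sketch below uses 11 flips in place of the 51 in the example, since enumerating 2^51 sequences isn’t feasible:

```python
from fractions import Fraction
from itertools import product

n = 11  # stand-in for the 51 flips discussed above
all_seqs = list(product("HT", repeat=n))

ten_heads_then_tail = tuple("H" * (n - 1) + "T")
all_heads = tuple("H" * n)

# Each specific sequence appears exactly once among the 2**n equally
# likely outcomes, so the two have identical probability (1/2)**n.
print(all_seqs.count(ten_heads_then_tail), all_seqs.count(all_heads))  # 1 1
print(Fraction(1, 2 ** n))  # 1/2048
```

    Any particular sequence, however easy or hard to describe, occupies exactly one slot among the equally likely outcomes.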

  39. #39 Blaine
    October 13, 2011

    MOST modern philosophers think that numbers DO NOT exist just as words do not exist. Numbers and words exist in the trivial sense that they are products of the human mind. Before humans evolved sentience, where were numbers and words? Math itself is just a language. Unfortunately, when mathematicians/scientists like Roger Penrose say philosophical things, they sound silly as when he says that mathematicians are discovering and exploring a really existing transcendent platonic world and not inventing mathematics. I find this rather shocking…on the order of saying he believed that a dead jewish terrorist died for his sins and was resurrected..or that the two attributes of god that manifest themselves in the world are Thought and Extension ( Spinoza ).

    You can’t rightly call yourself an atheist if you think numbers and words exist…you’re just another theist, only your god is different.

  40. #40 G.D.
    October 13, 2011

    JR:
    “simply don’t understand what it means “to assign a 1/3 credence” to something. That sounds like gibberish to me. If your credence is simply based on what you know about long-run frequencies, then why talk about credences at all? Why not just talk about frequencies and be done with it?”

    I agree (with the last point). My point was what the debate of the original post is about. “Objective probabilities” generally don’t make any sense (in the sense of “objective” that is the topic of the original post). The Monty Hall case illustrates that, since the objective chance that you chose the right door is *never* 1/3 or 2/3. It is always 1 or 0. Either the door hides the prize or it doesn’t, and this is already determined even before you went on the show. The only sense one can make of “probability of 1/3” in this case is “given what you know, your state of information, you should place a confidence level of 1/3 on the belief that the prize is behind this door”. The point is: probabilities are not “things out there in the world” (since the world may, for all we know, be fully deterministic, and even if it isn’t at the quantum level, at least it is in the case of Monty Hall); rather, probabilities reflect how much confidence a rational agent should have in a given outcome. And that’s what probabilities *are*.

    In the more ordinary sense of “objective” such probability assignments are no less objective – it is not up to you what confidence you rationally should have in a hypothesis. Frequencies, for instance, provide good evidence for what the rational level should be.

    DC#6: Your comment is based on a common confusion. No one doubts that words exist, and no one doubts that *numerals* exist. Numbers are not the same as numerals. After all, “2” and the Roman numeral “II” denote the same number, but they wouldn’t if numbers were just numerals.

    There’s a long philosophical history here, and despite what some mathematicians may think, the philosophical discussion has had profound influence on the discipline. After all Frege invented the quantifiers to answer this question, and Russell’s paradox and his type theory were developed to provide an acceptable epistemic and metaphysical foundation for mathematics. And no one can deny the profound effects of those efforts on (at least branches of) mathematics. Indeed, the claim that mathematicians don’t have use for philosophy would have sounded more compelling if there had been a sharp distinction between mathematics and philosophy of mathematics, but in areas such as mathematical logic there isn’t, and many of the crucial discoveries have been made at least in part as responses to “philosophical” concerns (the Löwenheim-Skolem theorem, Gödel’s completeness and incompleteness theorems, etc.)

  41. #41 Blaine
    October 13, 2011

    In a trillion, trillion, trillion years the expansion of the universe will have reached such a point that the universe will blink out in a catastrophic hail of subatomic particles and atoms themselves will no longer exist. There will be nothing left to embody anything. Of course, our solar system and us with it, will be long gone by then. Just a burnt cinder floating aimlessly in space. This is a scientific fact. In logical time, it has already happened.

    If numbers and words exist, where will they be then? In god’s mind…come on folks, get with the program.

  42. #42 Knightly
    October 13, 2011

    So what are the chances they got this right?

    Vegas is giving 300:1.

  43. #43 Blaine
    October 13, 2011

    They may be off by a couple hundred trillion years…

  44. #44 Thony Christie
    October 13, 2011

    For those who disparage or make snide comments about the philosophy of maths I would point out that you’re displaying an incredible ignorance of the history of maths. A large number of major advances in maths have come about because mathematicians addressed and tried to solve philosophical questions. I’m not going to use a comments column to list all of the cases, but every mathematician should be aware that, for example, non-Euclidean geometry came about because people asked philosophical questions about Euclid’s fifth postulate.

  45. #45 Lenoxus
    October 13, 2011

    When Jason wrote that “it is not uncommon for philosophers to lament the philosophical naivete of scientists (for example, in this recent book review)”, I was worried the linked review would involve some sort of Courtier’s Reply to a science book whose author callously unweaved a rainbow or two. In actuality, the article is a decent overview of contemporary neurology, and its criticisms of the book in question were mostly good from a scientific (not just philosophical) perspective.

    For instance, writer V.S. Ramachandran overreaches when he attempts to explain specific features of art in terms of our knowledge of neurology. I have no doubts that every facet of humanity is ultimately explicable in terms of neurology, etc, but right now it’s too early in the game to draw more than trivial conclusions about which “rule” in art creation/appreciation derives from which property of the mind (we don’t even know what most of the rules are!).

    In another paraphrased part of the book, Ramachandran suggests that the “come-hither gesture” may derive from the shape the tongue makes in producing the word “hither”. I’m not unfamiliar with phonetics, and it seems to me that we would have a few more “phonetic gestures” if that were the case. Furthermore, a much simpler explanation is that the gesture indicates the direction you want the person to travel. Partially confirming this is that frequent variants involve the hand moving pointed sideways instead of up, something the tongue doesn’t really ever do in spoken language. (It doesn’t go up much in the word “hither” either.) Also, many other gestures have a straightforward “visual symbolism” origin, such as the police officer’s “stop” gesture strongly resembling the actual physical motion necessary to stop someone coming at you (albeit often simplified, if the cop is holding something, to one arm instead of two).

    Apart from such speculations, the book actually looks pretty good, although it treads territory many SB readers will be pretty familiar with, such as blindsight, split-brain patients, etc. I can’t say with confidence that I learned anything about the human brain from the review, but that’s just me.

  46. #46 TylerD
    October 14, 2011

    The specific idea of “randomness” becomes less problematic when you look at it from a Kolmogorov-Chaitin perspective: something (data) is random if it permits no description in a formal model that is shorter than the literal representation. Statistics then amounts to finding the non-random or compressible part of data and separating it from the incompressible part, or the noise.
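
    Kolmogorov complexity itself is uncomputable, but a general-purpose compressor gives a crude, computable stand-in for the idea. A sketch using zlib (the choice of compressor, and treating compressed size as the “description length”, are illustrative assumptions):

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size over original size: a rough proxy for how much
    non-random, describable structure the data contains."""
    return len(zlib.compress(data, 9)) / len(data)

structured = b"HT" * 5_000   # highly patterned "coin flips"
noise = os.urandom(10_000)   # bytes with no exploitable pattern

print(compression_ratio(structured))  # far below 1: a short description exists
print(compression_ratio(noise))       # about 1 or more: no shorter description found
```

    The compressible part plays the role of the model; what the compressor cannot shrink plays the role of the noise.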

  47. #47 Dan L.
    October 26, 2011

    I think Bayesians just get excited because Bayes’ Theorem is so freakin’ cool (I mean, it includes Popperian falsificationism as a special case! c’mon!). You’re absolutely right that Bayes’ theorem just becomes a frequentist view when applied to a series of observations (assuming the experimenter doesn’t start with any special information).

    And the claim about Bayes’ theorem handling “one-off” events is just weird. Sure, it CAN, but it’s not very good at it because of the limited information. Bayes’ theorem isn’t very useful until you’ve actually done the experiment a few times; the initial probability can’t be anything other than a guess.
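
    That convergence is easy to see with a conjugate Beta-Bernoulli update. In the sketch below, the flat Beta(1, 1) prior and the coin’s 0.6 bias are assumptions chosen purely for illustration:

```python
import random

random.seed(1)  # fixed seed for a reproducible run

# Flat Beta(1, 1) prior over the coin's heads-probability; each flip is a
# conjugate Bayesian update: heads -> a += 1, tails -> b += 1.
a, b = 1, 1
heads = 0
n = 10_000
for _ in range(n):
    flip = random.random() < 0.6  # biased coin; the bias is unknown to the agent
    heads += flip
    a, b = a + flip, b + (not flip)

posterior_mean = a / (a + b)   # Bayesian point estimate
frequency = heads / n          # frequentist estimate
print(posterior_mean, frequency)  # the two estimates nearly coincide
```

    After many observations the prior washes out and the posterior mean simply tracks the observed frequency.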

  48. #48 Dan L.
    October 26, 2011

    Thony Christie@44:

    Great point. I would simply add that almost all of Cantor’s career and his contribution to mathematics were inspired by philosophical problems about infinity.

  49. #49 Daryl McCullough
    October 29, 2011

    I think this is a fascinating topic, and my feeling is that, somehow, Bayesians are more right about probability than frequentists, but that they’re not right, either. I don’t have a definitive argument against Bayesian probability, but it appears to me that quantum mechanics gives a notion of probability that is NOT in any way subjective. Subjective probability gives the impression that if you only knew more about the details of the situation, you would come up with a different probability, but apparently that is not the case with quantum mechanics. A free neutron has a 50% chance of decaying into a proton, electron and anti-neutrino in any given 14-minute period. There are no additional details about the state of the neutron that would allow me to adjust that probability. There is something NON-subjective about quantum probabilities, it seems to me.

    But I don’t think that the frequentist approach makes a whole lot of sense, either. For one thing, it doesn’t actually give an answer to the question: What is the probability of a coin toss resulting in “heads”? If you define it as a limiting frequency–the limit as N goes to infinity of the number of heads in N tosses divided by N–you have to face the issue of: why should that number have a limit in the first place? Not every infinite sequence of numbers has a frequency. Why should coin flips?

    You could say that, empirically, when you flip a coin 100 or 1000 times, it seems to settle down to a limiting frequency of heads, but why should you expect 1000 flips to give a good estimate about what would happen in infinitely many flips?
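
    A quick simulation shows the empirical settling, though, as the objection goes, no finite run can prove that a limit exists:

```python
import random

random.seed(0)  # fixed seed for a reproducible run
heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    heads += random.random() < 0.5
    if n in checkpoints:
        print(f"{n:>7} flips: running frequency of heads = {heads / n:.4f}")
# The running frequency drifts toward 0.5 as n grows, but this illustrates
# the tendency; it is not a proof that the limit exists.
```

    The finite data are consistent with a limiting frequency of 1/2, but nothing in the data alone guarantees one.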

  50. #50 Sean
    November 2, 2011

    Fantastic post, I very much enjoyed it. Keep up the good work.
