In 1866, John Venn (of Venn diagram fame) wrote:
To many persons the mention of Probability suggests little else than the notion of a set of rules, very ingenious and profound rules no doubt, with which mathematicians amuse themselves by setting and solving puzzles.
A classic example is the Tuesday Birthday Problem (TBP), which a reader has asked me to comment on. Let’s dive right in by stating the problem:
You meet a man on the street and he says, “I have two children and one is a son born on a Tuesday.” What is the probability that the other child is also a son?
Seems innocent enough. But if your first inclination is to say that the information about the Tuesday birthday is irrelevant and the answer is just 1/2, then read on.
Let’s try a simpler problem. Suppose we know that a certain man has two children and we also know that the older one is a boy. In this case we would say that the probability that the other child is a boy is 1/2. After all, the sex of one child is independent of the sex of the other child. That the older child is a boy has no bearing on the sex of the younger child.
Now suppose we know simply that a man has two children and that one of them is a son. This time we would reason that there is no possibility that the person has two girls. It follows that the sexes of his two children, ordered from oldest to youngest, are either BB, BG or GB. Since these cases are equally likely, and since only one of them involves having two boys, we would say the probability that the man has two boys is 1/3.
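For readers who like to check such counts mechanically, here is a minimal Python sketch of the two cases so far. It assumes, as the text does, that the four ordered families BB, BG, GB and GG are equally likely; the helper name `prob_two_boys` is just an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# All equally likely two-child families, written oldest first.
families = list(product("BG", repeat=2))

def prob_two_boys(condition):
    """P(both children are boys | condition), by counting equally likely cases."""
    matching = [f for f in families if condition(f)]
    both_boys = [f for f in matching if f == ("B", "B")]
    return Fraction(len(both_boys), len(matching))

older_is_boy = prob_two_boys(lambda f: f[0] == "B")   # we know the older child is a boy
at_least_one_boy = prob_two_boys(lambda f: "B" in f)  # we know only that one is a boy

print(older_is_boy)      # 1/2
print(at_least_one_boy)  # 1/3
```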
This is all very cogent, but there are complications we haven’t yet considered. We shall return to this in a moment.
Now let’s consider the TBP. Everyone’s first instinct is to think the information about the Tuesday birthday is simply irrelevant. I mean, seriously, you knew each of the kids was born on some day of the week, so what possible difference could it make that you now know one was born on a Tuesday?
Well, here’s the difference. Let us first assume that it is the older child who was a son born on a Tuesday. In this case the second child could be either of two sexes, and could have been born on any of seven days of the week, for a total of 14 possibilities.
Now let’s suppose it is the younger child who was a son born on a Tuesday. Then the older child could, again, be either of two sexes and could have been born on any of seven days of the week, again providing 14 possibilities. Added to our original 14 that would seem to give 28 possibilities.
But be careful! One possibility got counted twice. Specifically, the one where both children are boys born on Tuesdays. So really there are only 27 possibilities. And since 13 of them involve the other child being a boy, the probability would be 13/27.
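The count is easy to reproduce by brute force. The sketch below assumes what the argument assumes: each child is equally likely to be either sex and to be born on any of the seven days, independently of the sibling. Which index stands for Tuesday is arbitrary.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # any fixed day index 0..6 would do

# A child is a (sex, day) pair; a family is (older, younger): 14 * 14 = 196 cases.
children = list(product("BG", range(7)))
families = list(product(children, repeat=2))

tuesday_boy = ("B", TUESDAY)
has_tuesday_boy = [f for f in families if tuesday_boy in f]
two_boys = [f for f in has_tuesday_boy if f[0][0] == "B" and f[1][0] == "B"]

print(len(has_tuesday_boy))                            # 27
print(len(two_boys))                                   # 13
print(Fraction(len(two_boys), len(has_tuesday_boy)))   # 13/27
```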
Were this the end of the story it would still be a fascinating problem. It is genuinely surprising that the information about the Tuesday birthday could be relevant, but our simple count shows that it is.
Now, we should mention at this point that we are discussing mathematical children who have no existence outside the world of probabilistic brainteasers. So we’re not worrying about the fact that a disproportionate number of children are born on Mondays and Tuesdays, since C-sections aren’t usually scheduled for the weekends. We’re also not worrying about twins or triplets. Or any other sort of “real world” consideration that might occur to you.
But there is a complication we should ponder. When you are thinking about problems in conditional probability, it is not only the new piece of information that is relevant. You also must consider how the information was obtained.
The classic example of this is the Monty Hall problem. (I shall assume you are familiar with this problem. If you are not, I know a good book you should read.) The common fallacy is to ignore what we know about how Monty makes decisions. Thus, when he opens an empty door we tend to think, mistakenly, that we have only learned that that door is empty. In reality we have learned that Monty, who makes his decisions in a rigidly controlled way, chose to take a certain action.
That has relevance to all of these problems about the two children. How we assess the probability that the second child is a boy will depend in part on how we learned that one of the children was a boy in the first place.
Peter Winkler explains the significance of this point:
This puzzle confounds people *legitimately*, however, because most of the ways in which you are likely to find out that X has at least one boy contain an implicit bias which changes the answer. For example, if you happen to meet one of X’s children and it’s a boy, the answer changes to 1/2.
Suppose the puzzle is phrased this way: X says “I have two children and at least one is a boy.” What is the probability that the other is a boy?
Put this way, the puzzle is highly ambiguous. Computer scientists, cryptologists and others who must deal carefully with message-passing know that what counts is not what a person says (even if she is known never to lie) but *under what circumstances would she have said it.*
Here, there is no context and thus no way to know what prompted X to make this statement. Could he instead have said “At least one is a girl”? Could he have said “Both are boys”? Could he have said nothing? If you, the one faced with solving the puzzle, are desperate to disambiguate it, you’d probably have to assume that what really happened was: X (for some reason unconnected with X’s identity) was asked whether it was the case that he had at least one son, and, after being warned–by a judge?–that he had to give a yes-or-no answer, said “yes.” An unlikely scenario, to say the least, but necessary if you want to claim that the solution to the puzzle is 1/3.
This article from Science News helps explain the relevant distinction in these sorts of problems:
Suppose that you already knew that Mr. Smith had two children, and then you meet him on the street with a boy he introduces as his son. In that case, the probability the other child is a son would be 1/2, just as intuition suggests. On the other hand, suppose that you are looking for a male beagle puppy. You want a puppy that has been raised with a sibling for good socialization but you are afraid it will be hard to select just a single puppy from a large litter. So you find a breeder who has exactly two pups and call to confirm that at least one is male. Then the probability that the other is male is 1/3.
In the scenario of Mr. Smith, you’re randomly selecting a child from his two children and then noticing his sex. In the puppy scenario, you’re randomly selecting a two-puppy family with at least one male. (Emphasis added.)
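The difference between the two procedures shows up clearly in a quick simulation. This is only a sketch under idealized assumptions (each child or puppy is independently male or female with probability 1/2, and the child you happen to meet is equally likely to be either one); the function name and trial count are arbitrary choices.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate P(other is male) under the two ways of learning 'one is male'."""
    rng = random.Random(seed)
    met_boy = met_boy_and_other_boy = 0
    one_male = both_male = 0
    for _ in range(trials):
        pair = [rng.choice("BG"), rng.choice("BG")]

        # Mr. Smith: one of the two children, chosen at random, is the one you meet.
        i = rng.randrange(2)
        if pair[i] == "B":
            met_boy += 1
            met_boy_and_other_boy += pair[1 - i] == "B"

        # Breeder: the pair is kept only if at least one of the two is male.
        if "B" in pair:
            one_male += 1
            both_male += pair == ["B", "B"]

    return met_boy_and_other_boy / met_boy, both_male / one_male

smith, breeder = simulate()
print(f"met a son: {smith:.3f}   at least one male: {breeder:.3f}")
```

The first estimate should land near 1/2 and the second near 1/3, even though both are computed from the same simulated families.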
This all takes some getting used to. If you’re like me, then it just seems obvious to you that knowing the sex of the older child tells you nothing about the sex of the younger. But the scenario in which you meet one of the children on the street just sounds different. It sounds more like our initial scenario, in which we know simply that a father has two children and one is a boy.
We should be thinking like this: Are you being asked to infer the sex of one person based solely on information about someone else? Or are you being asked to comment on the distribution of sexes among families with multiple children, given information about one of the children?
As the Science News article goes on to note, this has a curious consequence:
The remarkable thing that Foshee’s variation points out is that any piece of information that affects the selection will also affect the probability. If, for example, you selected a family at random among those with two kids, one of whom is a boy who plays the ukulele and wants to become a dancer, the ukulele-playing and dancing ambitions would affect the probabilities about the sex of his sibling.
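One way to see how much a selecting trait matters is to give it a frequency. The sketch below assumes a trait that each boy has independently with probability p (an idealization; nothing in the quoted article specifies this), and that a family is picked at random from the two-child families with at least one such boy. Under those assumptions the answer works out to (2 - p)/(4 - p).

```python
from fractions import Fraction

def prob_two_boys(p):
    """P(two boys | at least one child is a boy with a trait of frequency p among boys)."""
    p = Fraction(p)
    # P(two boys, and at least one of them has the trait)
    both_boys = Fraction(1, 4) * (1 - (1 - p) ** 2)
    # P(at least one child is a boy with the trait); each child is one with probability p/2
    selected = 1 - (1 - p / 2) ** 2
    return both_boys / selected

print(prob_two_boys(1))                  # 1/3   (trait shared by every boy)
print(prob_two_boys(Fraction(1, 7)))     # 13/27 (born on a Tuesday)
print(prob_two_boys(Fraction(1, 1000)))  # 1999/3999 (a rare trait)
```

The rarer the trait, the closer the answer gets to 1/2; a trait every boy has carries no extra information and leaves the answer at 1/3.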
That still seems weird to me, even though it is precisely the main principle in the Monty Hall problem. Anyway, Tanya Khovanova provides some variations on the TBP to illustrate what is going on:
Now let us consider the first scenario. A father of two children is picked at random. He is instructed to choose a child by flipping a coin. Then he has to provide information about the chosen child in the following format: “I have a son/daughter born on Mon/Tues/Wed/Thurs/Fri/Sat/Sun.” If his statement is, “I have a son born on Tuesday,” what is the probability that the second child is also a son?
The probability that a father of two daughters will make such a statement is zero. The probability that a father of differently-gendered children will produce such a statement is 1/14. Indeed, with a probability of 1/2 the son is chosen over the daughter and with a probability of 1/7 Tuesday is the birthday.
The probability that a father of two sons will make this statement is 1/7. Among the fathers with two children, there are twice as many who have a son and a daughter than fathers who have two sons. Plugging these numbers into the formula for calculating the conditional probability will give us a probability of 1/2 for the second child to also be a son.
This fits well with our analysis. In this scenario a child is selected at random, and then information is provided about his sex and birth day. We are thus being asked to infer the sex of one child based solely on information about the sex (and birth day) of the other. Thus, we expect the answer to be 1/2. In other words, knowing the sex of one child tells us nothing about the sex of the other.
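The "formula for calculating the conditional probability" here is Bayes' rule, and the arithmetic for this first scenario can be spelled out in a few lines. The priors (1/4, 1/2, 1/4) are the usual assumption that each child is independently a son or daughter with probability 1/2; the likelihoods are the ones quoted above.

```python
from fractions import Fraction as F

# Prior probability of each family type among fathers of two children.
prior = {"two sons": F(1, 4), "son and daughter": F(1, 2), "two daughters": F(1, 4)}

# P(father says "I have a son born on Tuesday" | family type) when he
# reports on a child chosen by coin flip.
likelihood = {"two sons": F(1, 7), "son and daughter": F(1, 14), "two daughters": F(0)}

evidence = sum(prior[k] * likelihood[k] for k in prior)
posterior_two_sons = prior["two sons"] * likelihood["two sons"] / evidence

print(evidence)            # 1/14
print(posterior_two_sons)  # 1/2
```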
Now let us consider the second scenario. A father of two children is picked at random. If he has two daughters he is sent home and another one picked at random until a father is found who has at least one son. If he has one son, he is instructed to provide information on his son’s day of birth. If he has two sons, he has to choose one at random. His statement will be, “I have a son born on Mon/Tues/Wed/Thurs/Fri/Sat/Sun.” If his statement is, “I have a son born on Tuesday,” what is the probability that the second child is also a son?
The probability that a father of differently-gendered children will produce such a statement is 1/7. If he has two sons, the probability will likewise be 1/7. Among the fathers with two children, twice as many have a son and a daughter as have two sons. Plugging these numbers into the formula for calculating the conditional probability gives us a probability of 1/3 for the second child to also be a son.
In this scenario our focus has shifted to the distribution of sexes in a family with two children. Thus, the knowledge that there is at least one boy is relevant. But given the procedure we employed in deciding who to ask, the day of the week on which the boy is born is not relevant to the distribution of sexes. Thus, we get precisely the 1/3 answer we discussed earlier.
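To check that the day really drops out under this procedure, one can enumerate every family and weight it by the chance that the father ends up saying "Tuesday." This is a sketch under the same idealized assumptions as before (14 equally likely sex-and-day combinations per child, and a father of two sons picking between them uniformly).

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary day index 0..6
children = list(product("BG", range(7)))

says_tuesday = Fraction(0)
says_tuesday_two_sons = Fraction(0)
for family in product(children, repeat=2):
    sons = [child for child in family if child[0] == "B"]
    if not sons:
        continue  # fathers of two daughters are sent home
    # He picks one of his sons uniformly and reports that son's birth day.
    p_statement = Fraction(sum(day == TUESDAY for _, day in sons), len(sons))
    says_tuesday += p_statement
    if len(sons) == 2:
        says_tuesday_two_sons += p_statement

print(says_tuesday_two_sons / says_tuesday)  # 1/3
```

All 196 families are equally likely, so the unnormalized weights are enough; the common factor cancels in the final ratio.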
Now let us consider the third scenario. A father of two children is picked at random. If he doesn’t have a son who is born on Tuesday, he is sent home and another is picked at random until one who has a son that was born on Tuesday is found. He is instructed to tell you, “I have a son born on Tuesday.” What is the probability that the second child is also a son?
The probability that a father of two daughters will have a son born on Tuesday is zero. The probability that a father of differently-gendered children will have a son who is born on Tuesday is 1/7. The probability that a father of two sons will have a son born on Tuesday is 13/49. Among the fathers with two children, twice as many have a son and a daughter than two sons. Plugging these numbers into the formula for calculating the conditional probability will give us a probability of 13/27 for the second child to also be a son.
And here it was built into our selection procedure that we are only surveying families with two children, with the added assumption that at least one of them is a boy born on Tuesday. If you surveyed all such families, you would find that roughly 13/27 of them have two boys.
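Such a survey is easy to imitate with a simulation: generate random two-child families, send home every father without a Tuesday-born son, and count how many of the rest have two sons. This is a rough sketch with idealized children (sex and birth day uniform and independent); the function name, seed and trial count are arbitrary.

```python
import random

TUESDAY = 1  # arbitrary day index 0..6

def survey(trials=400_000, seed=1):
    """Fraction of two-son families among those with at least one Tuesday-born son."""
    rng = random.Random(seed)
    kept = two_sons = 0
    for _ in range(trials):
        family = [(rng.choice("BG"), rng.randrange(7)) for _ in range(2)]
        if ("B", TUESDAY) not in family:
            continue  # no son born on Tuesday: sent home
        kept += 1
        two_sons += all(sex == "B" for sex, _ in family)
    return two_sons / kept

estimate = survey()
print(f"{estimate:.3f} vs 13/27 = {13 / 27:.3f}")
```

The estimate should hover around 0.48, noticeably above 1/3 and a little below 1/2.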
If you read the Science News article I linked to, you will find that this problem was raised at the Ninth Gathering for Gardner, a conference held biennially in honor of Martin Gardner, in Atlanta. As it happens, I was at that conference, as was Tanya. We discussed this problem at length. My initial reaction was that it should be irrelevant that one child was born on a Tuesday. But then, thinking that the problem would not have been raised if the obvious answer were correct, I managed to enumerate the sample space and came up with 13/27. I was pleased with myself, but Tanya gave me an earful about unwarranted assumptions. She was right, of course.
On the other hand, as I noted in the BMHB (that’s the big Monty Hall book), and as I suggested to Tanya at the conference, there really does come a time when the hair-splitting must stop, unless we are specifically trying to drive people insane with these problems!