...you've got to ask yourself one question: 'Do I feel lucky?' Well, do ya, punk?
- Dirty Harry
The laws of probability, like most of the mathematical rules that govern the world, are a relatively recent discovery. Ancient people like the Romans loved to gamble as much as we do, and they had at least some idea of how certain kinds of odds worked, but they'd probably have been flummoxed by many of the mathematical tools we use today to study chance. But then again, how many people at your average casino understand how to calculate the probabilities that govern the flow of their money? Well, probably more than in the general population. But I bet it's still not that many.
Because there's so much money involved, problems involving gambling are very popular in introductory probability classes. Take, for instance, the Texas Lottery. The odds of winning are supposedly 1 in 25,827,165. Millions of people buy tickets for each drawing, so what is the probability of having n winners given that N tickets are purchased?
The exact answer is given by the binomial distribution. However, the binomial distribution involves taking factorials of its parameters, and when the parameters are numbers like 25,827,165 that becomes pretty much impossible. But there's a very good approximation we can use that makes our life a lot easier in those cases where you have a low-probability event repeated many times. It's the Poisson distribution, named after the early 19th century French mathematician Siméon-Denis Poisson. It's our Sunday Function:
f(n) = λ^n e^(-λ) / n!
Here n is the number of successes and lambda is the expected number of successes. The expected number of successes is just the number of trials N times the probability of any given trial being successful. Here "successes" is a term of art meaning just "the low-probability event happens". Its connotation works well when we're talking about winning the lottery, but not necessarily in other cases - you could just as well model something like train derailments as a Poisson process, after all.
The population of Texas is about 24,782,302 according to Wikipedia. So let's pretend every one of them buys a single ticket and they don't collude in picking their numbers. Multiplying that by the probability of winning gives an expected number of wins: λ = 0.959544. Use this as our parameter and plot:
This plots the probability of having n winners, for n on the x axis.* As expected, it falls off rapidly. If more people played - say 40 million - the bump would be shifted over to the right as it became more likely there'd be more winners. As it is, under these conditions the probabilities of n winners are:
0 - 38.3%
1 - 36.8%
2 - 17.6%
3 - 5.6%
4 - 1.4%
5 - 0.3%
And so on, rapidly heading toward zero. In practice I'm sure a lot less than the entire population is buying tickets, so most days the prize ought to roll over or otherwise go unclaimed. As of now the jackpot is about $35 million, but I think I'll be saving my money.
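For anyone who wants to check those numbers, here is a minimal Python sketch. It assumes the standard Poisson form, P(n) = λ^n e^(-λ) / n!, and the figures quoted above. The names `poisson_pmf`, `ODDS`, and `TICKETS` are made up for illustration.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """Probability of exactly n successes when lam successes are expected."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

ODDS = 25_827_165      # one winning combination in this many (figure quoted above)
TICKETS = 24_782_302   # pretend scenario: one ticket per Texan
lam = TICKETS / ODDS   # expected number of winners, roughly 0.959544

# Print the probability of n winners for n = 0..5, as percentages.
for n in range(6):
    print(f"{n} - {poisson_pmf(n, lam):.1%}")
```

Changing `TICKETS` to 40 million is a quick way to see the bump move to the right.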
*As a commenter points out, while the function is defined for all real n, the Poisson distribution itself is only valid for non-negative integer n. Plugging in n = 3.14159... wouldn't tell you anything meaningful since it's not possible to have anything other than a whole number of winners. The reason I've plotted the function continuously instead of just at integral values is so that the overall behavior of the function itself - particularly the location of the maximum - is most clear. From there it's no difficult thing to understand that in reality it's usually just the integers we're interested in.
Matt, your plot has a problem: the Poisson density is defined only for integer n's, while your plot implies otherwise.
The function is defined for all real n, even if the distribution isn't. The main point is to get a feel for how the function behaves, which is more difficult if we only look at non-negative integer n. That said, your point about the Poisson distribution itself is correct and I'll clarify in the entry.
It's interesting that this is a continuous function and yet its values at the integers >= 0 sum to 1, and that this holds for any positive real value of lambda.
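The reason is short enough to write out. Assuming the function is the usual Poisson form, λ^n e^(-λ) / n!, the sum over the non-negative integers is just the Taylor series of e^λ multiplied by e^(-λ):

```latex
\sum_{n=0}^{\infty} \frac{\lambda^{n} e^{-\lambda}}{n!}
  = e^{-\lambda} \sum_{n=0}^{\infty} \frac{\lambda^{n}}{n!}
  = e^{-\lambda} \, e^{\lambda}
  = 1
```

Nothing in that argument depends on the particular value of λ, which is why it works for every positive lambda.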
Personally, I'm going to find a lottery where that maximum falls exactly at 1 success. Because when I win the lottery, I don't want to have to share it with anyone else.
Clark at #4 raises an interesting point: how do you minimize the chances of sharing the winnings?
I don't think it occurs when the maximum is at 1. I think it occurs in the limit that lambda ->0, which gives a maximum at 0.
This is kinda what you do in quantum optics if you have a coherent source of light, and you want a single photon in your pulse but no multiple-photon pulses. You attenuate the crap out of it (lambda->0). Unfortunately, this means most of the time you get zero photons.
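One way to put numbers on this, as a rough sketch: take "not sharing" to mean that a drawing with at least one winner has exactly one. For a Poisson count that is P(1) / (1 - P(0)) = λ / (e^λ - 1). This reading of the question and the function names below are assumptions for illustration, not anything from the post.

```python
import math

def prob_no_winner(lam: float) -> float:
    """P(n = 0): nobody wins the drawing."""
    return math.exp(-lam)

def prob_sole_winner(lam: float) -> float:
    """P(exactly one winner | at least one winner) for a Poisson count.

    lam * exp(-lam) / (1 - exp(-lam)) simplifies to lam / (exp(lam) - 1);
    expm1 keeps the denominator accurate when lam is tiny.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam / math.expm1(lam)

# Smaller lam: an unshared win becomes more likely, but so does no win at all.
for lam in (2.0, 1.0, 0.959544, 0.1, 0.001):
    print(f"lambda={lam:<9} P(no winner)={prob_no_winner(lam):.3f}  "
          f"P(sole winner | someone wins)={prob_sole_winner(lam):.3f}")
```

The conditional probability climbs toward 1 as lambda shrinks, while the chance of no winner at all climbs with it. That is the same trade-off as the attenuated light pulse.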
There's an elegant way to program your computer to use extreme cases of the binomial (or a hypergeometric, or Poisson) without causing an overflow error. You simply keep track of the logarithm of the probability while looping. This involves adding instead of multiplying, subtracting instead of dividing, and multiplying instead of raising to a power. You can take the inverse-logarithm of the result when you are finished looping.
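Here is a minimal Python sketch of that idea for the exact binomial, using the lottery figures from the post. The function name and constants are hypothetical. The binomial coefficient is built up in a loop as a running sum of logarithms, so no enormous factorial is ever formed.

```python
import math

def log_binomial_pmf(n: int, trials: int, p: float) -> float:
    """Natural log of the probability of n successes in `trials` tries."""
    if not (0 <= n <= trials) or not (0.0 < p < 1.0):
        raise ValueError("need 0 <= n <= trials and 0 < p < 1")
    log_prob = 0.0
    # C(trials, n) = product over k = 1..n of (trials - n + k) / k
    for k in range(1, n + 1):
        log_prob += math.log(trials - n + k)   # multiplying becomes adding
        log_prob -= math.log(k)                # dividing becomes subtracting
    log_prob += n * math.log(p)                # p**n becomes n * log(p)
    log_prob += (trials - n) * math.log1p(-p)  # (1 - p)**(trials - n)
    return log_prob

ODDS = 25_827_165     # figure quoted in the post
TICKETS = 24_782_302  # one ticket per Texan, as in the post's scenario
lam = TICKETS / ODDS

for n in range(6):
    exact = math.exp(log_binomial_pmf(n, TICKETS, 1 / ODDS))  # inverse log at the end
    approx = lam ** n * math.exp(-lam) / math.factorial(n)    # Poisson approximation
    print(f"{n}: binomial={exact:.6f}  poisson={approx:.6f}")
```

With these inputs the exact binomial and the Poisson approximation should agree to several decimal places.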
"But then again, how many people at your average casino understand how to calculate the probabilities that govern the flow of their money? Well, probably more than in the general population. But I bet it's still not that many."
Are people that understand statistics more likely to gamble or less? You seem to think more, but I would say less.
Once you know that all games are statistically weighted so that you'll lose money, why play? Unless you have developed some "system", I think knowing the odds makes it less entertaining or appealing. Combine that with the knowledge that gambling disproportionately affects the poor, and that the poor are more likely to be undereducated, and it makes me think fewer people than average at a casino understand probability. There are probably a lot of people who have memorized odds for games such as roulette, craps and blackjack (or maybe not even odds but just the proper actions to take in specific situations), but I doubt all of them understand how to determine what the probability of a result is.
As an interesting complication, it's known that lottery number selections are far from uniform: numbers encoding birthdates, "lucky numbers", etc. are selected far above chance rates.
So in actual practice, the distribution of the number of winners will be very similar to the Poisson distribution, but skewed a bit to the right. On occasion, as I recall, there have been cases where a truly exceptional number of people win simultaneously, because the number drawn had some sort of significance.