If you like sports (specifically hockey) and you like statistics, two posts from Tom Benjamin's NHL Blog are must-reads (available here and here). With help from Dave Savit, a math professor at the University of Arizona, Tom describes how hockey can be modeled using a Poisson distribution. There are also Poisson Standings for the NHL season. Some have called this Moneyball for hockey. More stuff below the fold.
The idea of the model is that goals can be considered Poisson random variables. You can calculate the expected number of goals scored by a team in a single game using the number of goals scored in the entire season. The same can be done for individual players. We can also calculate the expected number of goals in any length of time (single periods, over two periods, over a five minute stretch). The expected number of goals scored by a team in a single game can then be used to predict the winners of individual contests.
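To make the arithmetic concrete, here is a minimal Python sketch of the idea. It is not the method from Tom's posts, just one way to work it through. It assumes each team's goals are independent Poisson variables, that the rate is simply season goals divided by games played, and that the opponent's defense can be ignored. The season totals (260 and 210 goals over 82 games) and all function names are made up for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def goals_per_game(season_goals, games_played):
    """Expected goals in one game, estimated from season totals."""
    return season_goals / games_played

def expected_goals(rate_per_game, minutes, game_minutes=60):
    """Scale the per-game rate to any stretch of playing time."""
    return rate_per_game * minutes / game_minutes

def outcome_probabilities(lam_a, lam_b, max_goals=25):
    """Return (P(A outscores B), P(tie), P(B outscores A)).

    Assumes the two goal counts are independent Poisson variables and
    truncates each at max_goals, which is far out in the tail for
    hockey-sized rates.
    """
    win_a = tie = win_b = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, lam_a)
        for b in range(max_goals + 1):
            p = p_a * poisson_pmf(b, lam_b)
            if a > b:
                win_a += p
            elif a == b:
                tie += p
            else:
                win_b += p
    return win_a, tie, win_b

# Hypothetical season totals, for illustration only.
lam_a = goals_per_game(260, 82)
lam_b = goals_per_game(210, 82)

print("Team A expected goals in one period:", expected_goals(lam_a, 20))
print("Team A expected goals in five minutes:", expected_goals(lam_a, 5))
print("P(A ahead), P(tied), P(B ahead) after 60 minutes:",
      outcome_probabilities(lam_a, lam_b))
```

Even with a clear scoring edge, the lower-scoring team still comes out ahead a meaningful share of the time, and ties after 60 minutes would need a separate rule (overtime, shootout) that this sketch does not model.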
Saying that the outcome of a game is random is a bit misleading. Each team has its own probability distribution for goals scored per game. The model predicts who will win a matchup between two teams. Sometimes the 'worse' team will win because of the variance around the expected number of goals scored. You can think of each game as a coin flip with a weighted coin -- the weighting of the coin depends on how many goals per game each team is expected to score. This is not very remarkable (the team that tends to score more goals per game tends to win more games), but it indicates that a team's performance is consistent from game to game. As Tom points out:
"This assumption flies in the face of many hockey myths around things like clutch play and the apparent ability some players have to rise to the occasion. The fact is that somebody has to come through in the clutch and that somebody is randomly selected by the hockey gods. This idea makes us feel uncomfortable because it is disturbing to realize that so many things in life are beyond our control. It means hockey - indeed life - in the short run is about luck and probabilities. Skill only outs in the long run and even a season is a relatively short time period."
Other sports could probably be treated in a similar manner. I could definitely see soccer and possibly baseball fitting the Poisson model. Football (American style) and basketball would be trickier because the offensive output of a team depends so much on the defensive capabilities of its opponent (this may be the case in baseball as well).
What we must remember is that a single poor (or excellent) performance could be due to deviations from the mean, and it largely balances out over a complete season. In sports like hockey, baseball and basketball where teams play many games in a single season (82 in the NHL and NBA, 162 in Major League Baseball), the best teams will tend to finish with the best records. This is because the more trials you run (each game is a trial), the lower the variance of a team's winning percentage around its true level. In the NFL (16 game seasons) there is a greater chance that the best teams will not have the best records (assuming points are determined by a Poisson process). In single game playoffs (the NCAA basketball tournament for example) the better team is even more likely to be upset. The benefit of playing best of seven series in the major American sports is that we increase the probability that the better team will win.
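A rough sketch of that argument in Python, under simplifying assumptions: every game is an independent coin flip with the same weight, and the better team's per-game win probability of 0.55 is a made-up number. The function names are my own.

```python
import math

def series_win_probability(p, n_games=7):
    """Chance that a team with per-game win probability p wins a best-of-n series.

    Assumes independent games with a constant p. Winning the series is
    equivalent to winning a majority of the games if all n were played.
    """
    needed = n_games // 2 + 1
    return sum(
        math.comb(n_games, k) * p ** k * (1 - p) ** (n_games - k)
        for k in range(needed, n_games + 1)
    )

def win_pct_std(p, n_games):
    """Standard deviation of the observed winning percentage over n games."""
    return math.sqrt(p * (1 - p) / n_games)

p = 0.55  # hypothetical edge for the better team

for n in (1, 3, 5, 7):
    print(f"best of {n}: better team wins with probability "
          f"{series_win_probability(p, n):.3f}")

for games in (16, 82, 162):
    print(f"{games}-game season: std of winning percentage = "
          f"{win_pct_std(p, games):.3f}")
```

The spread in winning percentage shrinks with the square root of the number of games, which is why a 16-game season leaves much more room for luck than a 162-game one.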
Keep this in mind as you watch the Winter Olympics the next couple of weeks or the World Cup this summer. It makes me wonder whether the Miracle was a great athletic achievement or an odd draw from a Poisson distribution.
"Other sports could probably be treated in a similar manner. I could definitely see soccer and possibly baseball fitting the Poisson model. Football (American style) and basketball would be trickier."
Exactly. The Poisson is good at modeling rare events, which is how one could consider a goal in hockey or soccer (the probability of scoring each time down the field or ice is pretty low).
For sports where scoring is more common (basketball, football), you could use a binomial, but the fact that there are different ways to score (field goal vs. touchdown) would make it a bit more difficult. But pretty fun to think about in any case.
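Here is a small Python sketch of why the rare-event part matters. It treats each trip down the ice (or court) as an independent attempt with the same chance of scoring, which is a big simplification, and compares the resulting binomial with a Poisson that has the same mean. The attempt counts and probabilities (30 attempts at 10% for a hockey-like game, 80 possessions at 50% for a basketball-like one) are invented for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k scores in n independent attempts."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k scores when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_pmf_gap(n, p):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(n + 1)
    )

# Hypothetical scoring profiles.
hockey_like = max_pmf_gap(30, 0.10)      # few scores, each attempt unlikely
basketball_like = max_pmf_gap(80, 0.50)  # many scores, each attempt a toss-up

print("hockey-like gap:", hockey_like)
print("basketball-like gap:", basketball_like)
```

The binomial has variance n * p * (1 - p) while the Poisson has variance n * p, so the two agree well only when p is small. When scoring is common, the Poisson overstates the spread.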
Has anybody extended this model so that it doesn't just model goals made, but also goals allowed (and, by implication, net goals)? If goals made are Poisson distributed, so should goals allowed, right?
It's already established that the Pythagorean Theorem of baseball is pretty accurate over a season. For any team, expected winning percentage = runs scored^2 / (runs scored^2 + runs allowed^2). Over a season, it does a pretty good job of coming within 4 games of the actual record.
See this Baseball Prospectus article at http://tinyurl.com/dletq (unfortunately missing the graphs) for much more info about the Pythagorean Theorem of baseball, finding the correct exponents, and the applicability of the Poisson distribution to baseball.
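For anyone who wants to play with the Pythagorean formula above, here is a minimal Python sketch. The exponent of 2 is the classic version; other exponents can be passed in. The run totals (800 scored, 700 allowed) are hypothetical, and the function names are just for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)

def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins over a season of the given length."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)

# Hypothetical team: 800 runs scored, 700 allowed over 162 games.
print("expected winning percentage:", pythagorean_win_pct(800, 700))
print("expected wins:", expected_wins(800, 700))
```

The same function could be pointed at goals for and goals against in hockey, though the exponent that fits best there would have to be estimated from data.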
Ummm... why do people keep making this error? Yes, a Poisson random variable will follow a Poisson distribution, but that does not mean that something whose counts follow a Poisson distribution is a purely random variable. I used to see this all the time in statistical quality control, where people would throw up their hands at a problem, then when some underlying cause was found, bingo, the distribution changed radically. The quote above is actually funny:
"This assumption flies in the face of many hockey myths around things like clutch play and the apparent ability some players have to rise to the occasion. The fact is that somebody has to come through in the clutch and that somebody is randomly selected by the hockey gods."
Maybe they are better players? Some do score better than others - even in clutch situations. If a team got better, its goals per game would go up. Is that just the hockey gods also? The fact that individual players generally score goals at roughly the same rate over different time periods only shows that their talent and coaching are roughly the same over those periods. Look at coaching changes where the winning percentage went up: what do you know, the scoring went up also. The goals would still follow a Poisson, just a different one.
The fact that you want your best scorer to have the puck in a clutch situation doesn't mean they are better in the clutch than otherwise; it means they are better in the clutch than anyone else you've got. So a coach would be right to say they are better in the clutch...
All this is really saying is that in any given game you can't predict who will do well except with probabilities, and that evidently talent and coaching do matter because different teams score and defend differently. The randomness does not seem to be "real", because personnel and coaching changes do make a difference; it is just the effect of complexity.
Sure, the wording is sloppy, but the point was that the rate of goal scoring does not differ from game to game, period to period, or shift to shift. The probability that a player scores a goal in the last few minutes of a game is, essentially, the same as at any other time during the game -- they're working to disprove the notion that there is such a thing as "clutch" performance.