DiMaggio's Streak

It's been a hotly debated scientific question for decades: was Joe DiMaggio's 56-game hitting streak a genuine statistical outlier, or was it an expected statistical aberration, given the long history of major league baseball? I'd optimistically assumed, based on the work of Harvard physicist Ed Purcell (as cited by Stephen Jay Gould), that DiMaggio was the real deal. Here's Gould:

Purcell calculated that to make it likely (probability greater than 50 percent) that a run of even fifty games will occur once in the history of baseball up to now (and fifty-six is a lot more than fifty in this kind of league), baseball's rosters would have to include either four lifetime .400 batters or fifty-two lifetime .350 batters over careers of one thousand games. In actuality, only three men have lifetime batting averages in excess of .350, and no one is anywhere near .400 (Ty Cobb at .367, Rogers Hornsby at .358, and Shoeless Joe Jackson at .356). DiMaggio's streak is the most extraordinary thing that ever happened in American sports.

But science, with its relentless pursuit of fact and abhorrence of anomalies, has apparently concluded that DiMaggio wasn't so special after all. In their latest excellent podcast, Radiolab interviews Steve Strogatz, a mathematician at Cornell, who worked with his student Sam Arbesman to simulate the history of MLB only to demonstrate that there was nothing statistically freakish about DiMaggio's hitting streak. Others, however, aren't quite so sure. The controversy continues.
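For readers curious what "simulating the history of MLB" might look like in miniature, here is a rough Python sketch. It is not Arbesman and Strogatz's actual model. It assumes every at-bat is an independent coin flip at the player's lifetime average, and that every player gets exactly four at-bats per game. The function names and both of those assumptions are mine, and the toy league is just Purcell's hypothetical roster from the Gould passage.

```python
import random


def game_hit_probability(batting_average, at_bats_per_game=4):
    """Chance of at least one hit in a game.

    Assumption (not from the post): at-bats are independent and every
    game has the same number of them.
    """
    return 1.0 - (1.0 - batting_average) ** at_bats_per_game


def longest_streak(game_results):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for got_hit in game_results:
        current = current + 1 if got_hit else 0
        best = max(best, current)
    return best


def simulate_longest_streak(batting_averages, games_per_player, rng):
    """Play out one alternate 'history' and return its longest streak."""
    best = 0
    for average in batting_averages:
        p = game_hit_probability(average)
        results = (rng.random() < p for _ in range(games_per_player))
        best = max(best, longest_streak(results))
    return best


def fraction_with_streak(batting_averages, games_per_player, target,
                         trials, seed=0):
    """Fraction of simulated histories with a streak of at least `target`."""
    rng = random.Random(seed)
    successes = sum(
        simulate_longest_streak(batting_averages, games_per_player, rng)
        >= target
        for _ in range(trials)
    )
    return successes / trials


if __name__ == "__main__":
    # Purcell's hypothetical, as quoted by Gould: four lifetime .400
    # batters, each with a career of one thousand games.
    league = [0.400] * 4
    share = fraction_with_streak(league, games_per_player=1000,
                                 target=50, trials=2000)
    print(f"Share of histories with a 50-game streak: {share:.3f}")
```

The number that comes out depends heavily on the assumptions going in, in particular how many at-bats a player gets per game and whether his average is held fixed from year to year, which may be part of why the answer is still disputed.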


I don't understand what you mean - the streak is a remarkable achievement. It's strange that it is 12 games longer than the 2nd longest streak. Can't it also be something that one would reasonably expect to happen to someone in the long history of baseball?

As a comparison, you might take any number of scientific discoveries and say that it was remarkable that scientist X discovered it (say evolution). If Darwin and Wallace had never lived - someone else would have eventually come up with the idea of natural selection because that's where the evidence pointed. They are still remarkable for recognizing it first.

I think there's a difference between saying that DiMaggio's streak was not remarkable and saying that it is not earth-shattering that someone managed to hit in 56 straight games at some point.

Also - in the passage you quote, it appears that Purcell calculated based on careers of 1000 games. This is a pretty large flaw as I see it, given that even with the old season length of 154 games, that's 6.5 seasons. Lots of players play more or less every day for 15 years.

Did you know that during the two months of DiMaggio's streak, Ted Williams actually hit for a higher average than DiMaggio? What was amazing about DiMaggio's streak wasn't his great hitting, but the distribution of it.