After Thursday’s post about sports and statistics, a friend from my Williams days, Dave Ryan, raised an objection on Facebook:

There’s an unstated assumption (I think) in your analysis: that there is some intrinsic and UNALTERABLE statistical probability of getting a hit inherent in every hitter. If that is the case, then yes — a hitter whose mean is .274 that hits .330 in the first half should, reasonably, be expected to regress towards the “true” outcome of .274 in the second half; and if the hitter winds up batting .330, it’s a statistical outlier. But that ONLY applies if the underlying probability is unalterable. Which, in baseball, isn’t something you can just blithely assume. Hitters can and do get better (or worse) through coaching, experience, PEDs, or what have you. And when the mean itself is malleable, you can’t dismiss Kornheiser as just ignorant of statistical regression.

That’s a good point, and as someone who has played a lot of sports (though not much baseball– I suck at pretty much any game involving hitting a ball with a stick), I would certainly like to believe that there are real changes in the underlying distributions. I mean, it certainly *feels* like there are days when I’m shooting the basketball better than others, and thus there’s a higher probability of a given shot going in.

The problem, of course, is showing that scientifically, which turns out to be damnably difficult. This is related to the famous “hot hand” problem, and nobody has really managed to find any evidence that this kind of effect exists.

But the question at hand dealt with baseball, and changes in the underlying “real” batting average for a player. So let’s see if we can learn anything about this, armed with some really rudimentary statistical knowledge, and some historical statistics for players chosen not quite at random from the career batting average list.

I’m sure a bunch of different sabremetric geeks have done vastly more comprehensive studies of this, but for the purposes of this blog, we’re going to settle for plural anecdotes. I went down that list and picked a half-dozen players out to look at: guys who have had fairly long careers, are generally good hitters, if not at the very top of the career average list, and are names that I recognize despite not really being a baseball fan. The six I picked were: Barry Bonds and Alex Rodriguez (because Dave mentioned drugs as a driver of changes in hitting percentages), Chipper Jones and Derek Jeter (because to the limited extent that I follow baseball, I root for the Yankees, and I have a cousin in Atlanta who’s a big Braves fan), Cal Ripken, Jr (because I was living in Maryland during the year when he single-handedly saved baseball), and Ichiro Suzuki (because he’s famously a somewhat different sort of hitter, getting lots of singles off his great speed, but not hitting much for power). That seems like enough guys to make a point of some sort, but not so many that wrestling them all through Excel makes me want to claw my eyes out.

So, the question is whether we can see changes in the underlying probability of random processes from these data, in a way that is clearly distinct from random statistical fluctuations about a single unchanging average. So here’s a graph of the batting average for each season (the blue points) compared to each player’s career average (click for a version that’s large enough to read clearly):

Those are the data we have to work with, so how would we go about determining whether there’s a real change in the underlying probability of one of these guys getting a hit from season to season? The problem, of course, is that even if there’s an absolutely fixed probability that does not change over the course of a player’s career, there will be year-to-year fluctuations just from what you would call “shot noise” in electrical engineering or quantum optics: because a season consists of several hundred attempts to hit the ball with some probability, you will inevitably get a one-season average that bounces around a bit– if each at-bat is random, you’ll get occasional “hot” and “cold” streaks naturaly.

The bouncing around can be characterized by a standard deviation, and as I mentioned in the last post, this is roughly equal to the square root of the number of hits divided by the number of at-bats. For these guys, those values come out to be around 0.02 for any given season. The horizontal grid lines on the graph are spaced by 0.02 units, so roughly one standard deviation each.

Now, for each player, you can see a bunch of points that are more than one grid line away from the horizontal line indicating the career average, which you might be tempted to cite as evidence that there are real changes. Except that’s not how it works– really, you expect 68% of normally distributed measurements to fall within one standard deviation of the mean, so roughly one-third of the measurements ought to be outside that range just due to natural fluctuations (assuming the underlying random process fits a normal-ish distribution, which is generally a good place to start).

(This is forever fixed in my mind, by the way, because I got nailed by Bill Phillips on this point when giving a presentation on some of the data for my thesis– I put up a graph of collision rate measurements with error bars, and he objected that the theory line went through too many of the error bars. He was right– I had calculated the errors in a way that was a little too conservative– and he was also one of a very small set of people who ever would’ve raised that objection…)

So, the question isn’t whether there are *any* points outside the nominal uncertainty range, but whether there are *too many* points outside that range. You can count points on those graphs if you want, but I calculated it a little more accurately in Excel, and the fraction of single-season batting averages outside the one-standard-deviation range of the career batting average are:

- Bonds: 0.476
- Rodriguez: 0.235
- Jones: 0.412
- Jeter: 0.294
- Ripken: 0.444
- Suzuki: 0.538

If this were Mythbusters, they’d probably say that the fact that none of these are the 0.32 you would expect from a perfect normal distribution indicates that they’re way off. But, really, with the exception of Bonds and Suzuki, those are not that far off. And if you excluded Bonds’s first four years, he’d be at 0.412, in line with the others.

If you want to talk about real changes in underlying distributions from these, there are four things you might point to: Bonds’s first four years (where he seems to trend sharply upward before reaching a plateau), Bonds’s peak steroid era (the cluster of high points toward the end of his run), and the last three years for both Rodriguez and Suzuki, who have been well below their prior averages for those years. The only one of those that’s really dramatic, in a statistical sense, is Suzuki’s, as he’s been at least two standard deviations off for three years running. Bonds’s steroid years are close to that, but not out of line with previous fluctuations, and Rodriguez’s recent slump is a little more marginal.

Meanwhile, Jeter, Jones, and Ripken are indistinguishable from robots.

Another way you might try to look at it is to ask what the actual standard deviation in the average of averages is. I’ll give two numbers for each of these: the first is the standard deviation of the actual set of annual batting averages, the second is an average of the standard deviations you would expect based on shot noise in the number of hits for each of those seasons.

- Bonds: 0.036 – 0.026
- Rodriguez: 0.022 – 0.023
- Jones: 0.030 – 0.025
- Jeter: 0.021 – 0.023
- Ripken: 0.025 – 0.021
- Suzuki: 0.033 – 0.022

Again, those are pretty close to what you would expect from pure randomness. If you excluded Suzuki’s last three years, he’d drop to 0.023, and excluding Bond’s first four years would bring him down to 0.030, while knocking out the steroid years would bring him to 0.028. If there’s anything real happening, those would be the spots to look.

But it’s important to note that these are very, very small datasets as statistical samples go. Throwing out four years from Bonds’s career is better than 15% of the total data, and he’s got the longest career of any of these guys. So it’s very dicey to start talking about breaking out still smaller subsets of these. And there’s not a whole lot you can point to, in an objective mathematical sense, to suggest that any of these guys have really seen significant changes that would depart from the notion of an unchanging and innate “batting average” over the course of their whole career.

One additional thing you might look at to see if there’s change happening is to compare individual years not to a career-long total average, but some sort of cumulative average up to that point in their career (that is, total career hits divided by total career at-bats for the years prior to the one being measured). Those data look like this:

There are two things to look for here: one is whether it does a better job of matching the data generally, which might suggest that it’s tracking a slow but real change that isn’t captured by a career-long average. The other is that it might make discontinuous changes departing from a long-term trend stand out more.

Again, this is pretty equivocal. The one dramatic effect this has is to do a better job matching Bonds’s first few years (because they dominate the cumulative average during that span), but this actually makes his steroid years look slightly worse. The others don’t really change much (and really, any counting of discontinuous changes ought to ignore the first few years, where the cumulative average is pretty bad, but then you’re into the too-little-data problem again).

So, what’s the take-home from this? Well, that baseball is pretty random, at least at the really rudimentary statistical level used here. It’s very hard to rule out a model where players really have a single innate “batting average” that they keep for their whole career in favor of a model where that innate average changes over time.

The other thing to note here is that this is a valuable reminder that statistical methods demand *really big* datasets. Even baseball, with its interminably long season of games that drag on forever, doesn’t really produce enough statistics to have all that much confidence in the averages– these are guys with long and distinguished careers, but only two of them (Jeter and Ripken) have managed 10,000 at-bats. Even on that scale, you have an uncertainty of around 0.005 in their career-long batting average. That uncertainty is much larger for shorter spans– over 0.020 for a single season– and that means that it’s just not possible to statistically detect real changes from season to season, let alone within the course of a single season. No matter how much we might like to use those random numbers to construct narratives of change.