The Cognitive Psychology of Baseball!

Ah, yes, a real game (kidding, Scrabble people). If you've watched many baseball games or baseball movies, you know that one of the things that makes for a successful hitter is the ability to predict what the next pitch will be. Is it going to be inside or outside? Will it be a fastball or a breaking ball? If you're expecting a fastball and get a slow, breaking curveball, it's unlikely you'll get anywhere near it. So cognitive processing is an important part of being a good hitter. At least, that's what a hitting coach would tell you. And according to a 2002 paper by Rob Gray in Psychological Science, they'd be right.

Basically, Gray had college baseball players stand in front of a screen showing a simulated baseball diamond and swing at simulated pitches. This setup led to one of the coolest method sections ever, if you're a baseball fan and a geek (like me):

Mounted on the end of the bat (Louisville Slugger Tee Ball bat; 63.5 cm long) was a sensor from a Fastrak (Polhemus, Colchester, Vermont) position tracker. The x, y, z position of the end of the bat was recorded at a rate of 120 Hz.

The pitch simulation was based on that used by Bahill and Karnavas (1993). Balls were launched horizontally (i.e., 0°) from a simulated distance of 59.5 ft (18.5 m); that is, the pitcher released the ball 1 ft in front of the pitching rubber. The only force affecting the flight of the ball was gravity. The height of the simulated pitch at time t, Z(t), was changed according to

$$Z(t) = -\tfrac{1}{2}\, g\, t^{2},$$

where g is the acceleration of gravity, 32 ft/s² (9.8 m/s²).
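To make the quoted pitch model concrete, here's a small Python sketch that applies Z(t) = -1/2 g t² to the two pitch speeds Gray used (85 and 70 mph). It assumes, as a simplification not stated in the excerpt, that the ball covers the full 59.5 ft at a constant horizontal speed; the function names are mine, not the paper's.

```python
# Sketch of the gravity-only pitch model quoted above: Z(t) = -1/2 * g * t^2.
# Assumption (not from the paper): the ball covers the 59.5 ft from release to
# the plate at a constant horizontal speed, so flight time = distance / speed.

G_FT_PER_S2 = 32.0          # acceleration of gravity, ft/s^2 (value in the quote)
RELEASE_DISTANCE_FT = 59.5  # simulated release distance from the quote
MPH_TO_FT_PER_S = 5280 / 3600


def height_change(t: float, g: float = G_FT_PER_S2) -> float:
    """Change in ball height Z(t), in feet, t seconds after release."""
    return -0.5 * g * t ** 2


def flight_time(speed_mph: float, distance_ft: float = RELEASE_DISTANCE_FT) -> float:
    """Seconds for the ball to reach the plate at a constant horizontal speed."""
    return distance_ft / (speed_mph * MPH_TO_FT_PER_S)


if __name__ == "__main__":
    for label, mph in [("fast", 85.0), ("slow", 70.0)]:
        t = flight_time(mph)
        print(f"{label}: {t:.3f} s in flight, drops {-height_change(t):.2f} ft")
```

Under those assumptions the fast pitch takes about 0.48 s and drops about 3.6 ft, while the slow pitch takes about 0.58 s and drops about 5.4 ft, so a hitter who times a swing for the wrong speed is off by roughly a tenth of a second.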

I know you're supposed to include descriptions of your equipment in method sections, but I can't get over the inclusion of the fact that the bat was a Louisville Slugger. I'm sorry, I'm a baseball fan.

Anyway, Gray included two kinds of pitches: slow and fast. The fast pitches were simulated at 85 +/- 1.5 mph, and the slow ones at 70 +/- 1.5 mph. Whether the pitch was fast or slow depended on the pitch count. For 0-0, 1-0, 0-1, 1-1, 2-1, 2-2, and 3-2 counts (that's balls-strikes, for those of you who don't know baseball), fast and slow pitches were equally likely (0.50 each). For pitcher's counts (0-2 and 1-2), slow pitches were more likely (0.65), and for hitter's counts (2-0, 3-0, and 3-1), fast pitches were more likely (0.65). The pitch count was displayed on the screen so the hitters could keep track.

There were three different horizontal positions for the pitches: strike, outside ball, and inside ball. The strikes crossed the plate at 0 +/- 1 inch from the center of the plate, the outside balls at 12 +/- 1 inches from the center (in the direction away from the batter, that is), and the inside balls at 12 +/- 1 inches from the center in the direction toward the batter. Whether the pitch was a ball or a strike was randomly chosen. Each hitter took 25 swings per block for 10 blocks, with rest (a lot, I hope, 'cause 250 swings is crazy) in between.
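Here's a hypothetical Python sketch of that pitch schedule. The speed probabilities and ranges come from the description above; the uniform spread within the +/- ranges, the 50/50 split between strikes and balls, and the even split between inside and outside balls are my assumptions, since the post only says locations were chosen randomly.

```python
import random

# Hypothetical sketch of Gray's pitch schedule as described in the post.
# Counts are (balls, strikes). Assumptions not stated in the post: speeds and
# locations are spread uniformly within their +/- ranges, half of all pitches
# are strikes, and balls are split evenly between inside and outside.

FAST_MPH, SLOW_MPH, SPEED_SPREAD_MPH = 85.0, 70.0, 1.5
BALL_OFFSET_IN, LOCATION_SPREAD_IN = 12.0, 1.0
PITCHERS_COUNTS = {(0, 2), (1, 2)}         # slow pitches more likely
HITTERS_COUNTS = {(2, 0), (3, 0), (3, 1)}  # fast pitches more likely


def p_fast(count: tuple[int, int]) -> float:
    """Probability that the next pitch is fast, given the count."""
    if count in HITTERS_COUNTS:
        return 0.65
    if count in PITCHERS_COUNTS:
        return 0.35  # i.e., slow pitches at 0.65
    return 0.50      # all other counts: a coin flip


def draw_pitch(count: tuple[int, int], rng: random.Random) -> tuple[float, float]:
    """Return (speed_mph, offset_in) for one simulated pitch.

    offset_in is the horizontal distance from the center of the plate in
    inches: negative = inside (toward the batter), positive = outside.
    """
    is_fast = rng.random() < p_fast(count)
    base_speed = FAST_MPH if is_fast else SLOW_MPH
    speed = base_speed + rng.uniform(-SPEED_SPREAD_MPH, SPEED_SPREAD_MPH)

    if rng.random() < 0.5:  # strike (assumed 50%)
        center = 0.0
    else:                   # ball, inside or outside (assumed 50/50)
        center = rng.choice([-BALL_OFFSET_IN, BALL_OFFSET_IN])
    offset = center + rng.uniform(-LOCATION_SPREAD_IN, LOCATION_SPREAD_IN)
    return speed, offset
```

For example, `draw_pitch((3, 0), random.Random(2002))` returns one (speed, offset) pair for a 3-0 count; over many draws, about 65% of the speeds land near 85 mph.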

So here's what the hitters had to predict: whether the ball would be fast or slow, and whether it would be inside, outside, or down the middle. In real baseball, certain types of pitches (e.g., slow breaking balls) are associated with pitcher's counts, and others (mostly fastballs) with hitter's counts. Because Gray used these associations to set the pitch probabilities, the batters had some basis for predicting pitch speed. Pitch locations, on the other hand, were random, so the batters could only guess at those.

In order to get a measure of the batters' accuracy, Gray assumed that they'd be trying to hit the ball at a given position (0.9 meters in front of the plate) and bat height (the lowest position during the swing), values he took from previous research on baseball hitters. He then computed temporal error as the difference between the time at which the bat reached its lowest position and the time at which the ball was 0.9 meters in front of the plate. Gray's prediction was that if the batters predicted the right pitch, their temporal error would be lower than if they predicted the wrong one.
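The temporal-error measure can be sketched in Python as follows. This is a hypothetical illustration, not Gray's code: it assumes the tracker's bat-end heights are sampled at 120 Hz starting at pitch release, uses the 18.5 m release distance from the quoted method section, and treats the ball's horizontal speed as constant. The sign convention (positive means a late swing) is also my choice.

```python
# Hypothetical sketch of the temporal-error measure (not Gray's code).
# Assumptions: bat-end heights are sampled at 120 Hz starting at pitch release,
# and the ball travels the 18.5 m from release at a constant horizontal speed.

SAMPLE_RATE_HZ = 120.0     # Fastrak sampling rate from the method section
RELEASE_TO_PLATE_M = 18.5  # simulated release distance from the method section
HIT_POINT_M = 0.9          # assumed hitting point in front of the plate


def time_of_lowest_bat(bat_heights_m: list[float]) -> float:
    """Time (s) of the first sample at which the bat end is lowest."""
    lowest_index = min(range(len(bat_heights_m)), key=bat_heights_m.__getitem__)
    return lowest_index / SAMPLE_RATE_HZ


def ball_arrival_time(speed_m_per_s: float) -> float:
    """Time (s) for the ball to reach the hitting point in front of the plate."""
    return (RELEASE_TO_PLATE_M - HIT_POINT_M) / speed_m_per_s


def temporal_error(bat_heights_m: list[float], speed_m_per_s: float) -> float:
    """Signed error in seconds: positive = late swing, negative = early swing."""
    return time_of_lowest_bat(bat_heights_m) - ball_arrival_time(speed_m_per_s)


if __name__ == "__main__":
    # Synthetic V-shaped swing whose lowest point is sample 50 (about 0.42 s).
    heights = [0.5 + 0.01 * abs(i - 50) for i in range(100)]
    print(f"{temporal_error(heights, 38.0):+.3f} s")  # 38 m/s is about 85 mph
```

With that synthetic swing, the bat bottoms out at about 0.42 s while an 85 mph ball reaches the hitting point at about 0.46 s, so the error comes out negative: a swing roughly 46 ms early.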

To test this prediction, Gray developed a "finite-state Markov model" to simulate batting performance with predicted pitches. I won't get into the details of the model (you can read the paper, cited at the end of this post, if you're really interested). Basically, the model predicted the next pitch based on its predictions for the previous pitches and the accuracy of those predictions, along with knowledge of the pitch count-pitch speed associations. If the model's performance was similar to that of the real batters, this would provide evidence for the benefit of correctly predicting a pitch. Here's a graph comparing the model's performance to one of the batters' (from Gray's Figure 2, p. 544):
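To give a feel for what a "finite-state" pitch predictor means, here's a toy Python sketch. To be clear, this is NOT Gray's model: the win-stay/lose-shift rule, the names, and the simplification of replaying a single count many times are all assumptions for illustration. Its only state is the expected pitch speed; on skewed counts it uses the count-speed association, and on neutral counts it keeps its last guess if that guess was right and switches if it was wrong.

```python
import random

# Toy finite-state pitch predictor, loosely following the description above.
# NOT Gray's (2002) model: the win-stay/lose-shift rule and all names here are
# illustrative assumptions.

FAVORED = {(2, 0): "fast", (3, 0): "fast", (3, 1): "fast",
           (0, 2): "slow", (1, 2): "slow"}  # counts with a 0.65 bias


def p_fast(count: tuple[int, int]) -> float:
    """Probability of a fast pitch at this count, per the post."""
    favored = FAVORED.get(count)
    if favored == "fast":
        return 0.65
    if favored == "slow":
        return 0.35
    return 0.50


def next_prediction(count: tuple[int, int], last_prediction: str,
                    last_was_correct: bool) -> str:
    """Choose the expected speed ("fast" or "slow") for the next pitch."""
    if count in FAVORED:              # use the count-speed association
        return FAVORED[count]
    if last_was_correct:              # win-stay
        return last_prediction
    return "slow" if last_prediction == "fast" else "fast"  # lose-shift


def simulate(count: tuple[int, int], n_pitches: int, rng: random.Random) -> float:
    """Fraction of pitches at `count` whose speed the toy model predicts correctly."""
    prediction, correct, hits = "fast", True, 0
    for _ in range(n_pitches):
        prediction = next_prediction(count, prediction, correct)
        actual = "fast" if rng.random() < p_fast(count) else "slow"
        correct = prediction == actual
        hits += correct
    return hits / n_pitches
```

Running `simulate((3, 0), 10000, random.Random(1))` gives a hit rate close to 0.65, while `simulate((1, 1), 10000, random.Random(1))` stays near 0.50, since no rule can beat a coin flip when fast and slow pitches are equally likely. That matches the intuition behind lower errors on predictable counts, though the toy says nothing about how Gray converted predictions into temporal errors.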

[Figure: Gray (2002), Figure 2. Temporal error for the model and for one batter, by pitch count.]

The graph presents temporal error scores for the model and the batter for seven different pitch counts. The important thing to notice is that despite some small differences in absolute numbers, the patterns for the model and the batter are highly similar. As you would expect, given the pitch count-pitch speed associations described above, early counts that don't favor either the pitcher or the batter (0-1, 1-0, and 1-1) produce high errors, because it's harder to predict the next pitch (recall that for these counts, fast and slow pitches were equally likely). For the hitter's counts (2-0 and 3-0), temporal errors were relatively small for both the model and the batter, reflecting the fact that they could accurately predict fast pitches on these counts. On the other hand, error scores were relatively high for pitcher's counts (0-2 and 1-2), even though these are associated with slow pitches and so should be fairly predictable. This reflects the fact that batters generally don't do well on pitcher's counts (you have to swing at anything close, whether you predicted correctly or not).

So there you have it: the strong correlation between model and batter performance suggests that predicting the next pitch correctly really is important for hitting successfully. Hitting coaches and color commentators have been right all along. Of course, there's a massive duh factor to this, but it's still cool to see it confirmed in a laboratory environment.

Gray, R. (2002). "Markov at the Bat": A model of cognitive processing in baseball batters. Psychological Science, 13, 543-548.


Two great baseball players engaged once, it is said, in a very sophisticated Zero Sum Game of Mathematical Disinformation Theory.

The story is well-known; the analysis by Jonathan Vos Post is original.

Yogi Berra [catcher] to Hank Aaron [batter]:
"The label's on top."
[Translation: it is widely believed that the location of the label of the bat with respect to the bat-ball impact point affects the probability that the bat will break on contact, which is negatively correlated with the probability of a home run, due to momentum conservation considerations; hence I offer to you that you drop your model of this at-bat and replace it with one that has an additional variable to take into account; if, as I hope, you do, I expect your performance to decline by the replacement cost of model-switching]

Hank Aaron [batter] vs Yogi Berra [catcher]:
"I didn't come here to hit and read at the same time."
[Translation: I'm on my way to the record number of home runs hit in a lifetime, surpassing Babe Ruth's record, and likely to stand until at least 2007 with Barry Bonds, and then Alex Rodriguez perhaps 12 to 14 years later; I decline to make my analysis more complex, as I assert that I measure myself as being on the manifold rather near the global maximum of my performance in the zero-sum game between pitcher and batter, and believe that I would do worse if I changed my eigenvector in the predicted fastball/curveball/slider probability distribution; while you are destined to be known in 2007 primarily through a cryptic ad for Aflac, hence do not divert me from my optimal allocation of resources].

The branch-and-bound Decision Analysis corollary by Yogi Berra, at another time:

"When you come to a fork in the road, take it!"

Baseball. America's Game. The America of John Forbes Nash, Jr., anyway.

-- Prof. Jonathan Vos Post