Teams Who Are Ahead Win More Frequently

Over at the New York Times' Freakonomics blog, Justin Wolfers gets into the March Madness spirit by reporting on a study of basketball games that yields the counter-intuitive result that being slightly behind at halftime makes a team more likely to win. It comes complete with a spiffy graph:

[Figure: winning percentage vs. halftime score margin, with the study's fit curves, from the Freakonomics post]

Explained by Wolfers thusly:

The first dot (on the bottom left) shows that among those teams behind by 10 points at halftime, only 11.8 percent won; the next dot shows that those behind by 9 points won 13.9 percent, and so on. The line of best fit (the solid line) shows that raising your halftime lead by two points tends to be associated with about an 8 percentage-point increase in your chances of winning, and this is a pretty smooth relationship.

But notice what happens when we contrast teams that are one point behind at halftime with teams that are one point ahead: the chances of winning suddenly fall by 2.4 percentage points, instead of rising by 8 percentage points.

This has an explanation drawn from behavioral economics, which you can go read for yourself. Like all behavioral just-so stories, it seems really plausible. Plus, the trend in the data is really striking. I mean, just look at that graph!

However, I took the liberty of re-plotting their data:

[Figure: the same data re-plotted with a single straight-line fit]

I reconstructed the data by the brute-force method of pixel counting in the GIMP, and then plugging the results into SigmaPlot. It's not quite perfect, but it's close enough for government work. Then I fit a straight line to the whole data set (slope of 0.0377, intercept of 0.5157, R² = 0.98398; other statistical measures available upon request).
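For anyone who wants to check the numbers, here's a minimal Python sketch of the same fit, using numpy rather than SigmaPlot and the pixel-count values I post in the comments below, so the last digits may wobble:

```python
import numpy as np

# Halftime score margins from -10 to +10.
x = np.arange(-10, 11)

# Win probabilities reconstructed by pixel-counting the original graph
# (the same values posted in the comment thread below).
y = np.array([0.1581, 0.1801, 0.2096, 0.2831, 0.2500, 0.3015, 0.3898,
              0.3567, 0.4266, 0.5259, 0.5185, 0.5038, 0.5994, 0.6730,
              0.6399, 0.7355, 0.7796, 0.7502, 0.8238, 0.8532, 0.8716])

# Ordinary least-squares straight line: y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

# Coefficient of determination for the linear model.
resid = y - (slope * x + intercept)
r_squared = 1 - resid.var() / y.var()

# Should land near the values quoted above: ~0.0377, ~0.516, ~0.984.
print(f"slope {slope:.4f}, intercept {intercept:.4f}, R^2 {r_squared:.4f}")
```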

And, funny enough, with the straight line there, the difference between leading by one and trailing by one doesn't look so dramatic, does it? Amazing how excluding the "tie score" point, doing a complicated polynomial fit, and extending it to the un-physical value of a half-point deficit guides the eye, no?

This is not to say that the original researchers Wolfers is drawing this from (Jonah Berger and Devin Pope) don't have a real point with their paper. Wolfers describes some laboratory tests of the supposed phenomenon that certainly sound more scientific (he also links to their full paper, which I don't have time to read, but knock yourself out).

The problem is, this sort of how-to-lie-with-graphical-presentation horseshit makes it much harder for me to take the whole thing seriously. And, by extension, makes me cast a more skeptical eye on the whole field of behavioral economics.

(A tip of the hat to Matthew Merzbacher on a mailing list, who pointed out the fit extension to half-a-point, thus triggering this post.)


Clever, since your fit does suggest that the alleged signal is consistent with noise, but your fit has its own flaw.

Your fit assumes the data to the left of zero are independent of the data to the right of zero. They aren't.

The mirror symmetry is quite obvious in your fit. Any point above the trend on one side is matched by a point below the trend on the other. As it must be, because if more teams win when ahead by 3 points than should be expected, that automatically means more teams than expected lose when behind by 3 points.

What do you get if you do an unconstrained fit to only the data from 0 to 10, including 0?

By CCPhysicist (not verified) on 18 Mar 2009 #permalink

What do you get if you do an unconstrained fit to only the data from 0 to 10, including 0?

Slope of 0.0375, intercept 0.515, R-squared of 0.946. If I throw out the tied-at-halftime point (I can't think of any legitimate reason to do that, but they appear to have left it out of the top graph), the values are 0.03722, 0.5129, and 0.9285.

In other words, it makes no significant difference.
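For the record, here's a quick numpy sketch of those restricted fits (my pixel-count data again, so expect small differences in the last digits):

```python
import numpy as np

x = np.arange(-10, 11)
y = np.array([0.1581, 0.1801, 0.2096, 0.2831, 0.2500, 0.3015, 0.3898,
              0.3567, 0.4266, 0.5259, 0.5185, 0.5038, 0.5994, 0.6730,
              0.6399, 0.7355, 0.7796, 0.7502, 0.8238, 0.8532, 0.8716])

# Unconstrained straight-line fits to the right-hand half of the graph,
# first with the tied-at-halftime point, then without it.
for label, mask in (("0 to 10", x >= 0), ("1 to 10", x >= 1)):
    xs, ys = x[mask], y[mask]
    slope, intercept = np.polyfit(xs, ys, 1)
    resid = ys - (slope * xs + intercept)
    r2 = 1 - resid.var() / ys.var()
    print(f"{label}: slope {slope:.4f}, intercept {intercept:.4f}, R^2 {r2:.3f}")
```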

Wow. It looked kind of suspicious to me when you showed the first graph, but those curves really do "guide the eye." CCPhysicist raises a good point, though. I was starting to wonder if the oscillations were statistically significant, but once I realized there's only half as much independent data as it looks like at first blush, the wiggles probably aren't significant.

Using the same logic, it's clear that if it's almost halftime and you're losing by 3 points, your best bet is to let the other team score one free throw. Trailing by 4 at the half clearly gives you a significantly [sic] better chance of winning than if you trail by 3.

/sarcasm

Since the data has the built-in symmetry, you should be able to improve your GIMP-derived reconstruction by averaging the come-from-behind wins with one minus the ahead-at-halftime losses. That should also fix your 0 point, which mathematically has to be 50% (in games where the score was tied at halftime, exactly 50% of the teams in the game won), and fix your regressions so all the intercepts are 0.50.

Of course that is just cleanup -- the worry remains valid that the wiggle at the ±1-point anomaly is just noise in the data.

Since the data has the built-in symmetry, you should be able to improve your GIMP-derived reconstruction by averaging the come-from-behind wins with one minus the ahead-at-halftime losses.

True enough.
Averaging the two halves and fitting everything, including the zero point, gives 0.0375, 0.5010, and 0.946 -- again, basically no difference.

The one-point difference does have the largest magnitude residual of any of the points in the data set, but it's not wildly different from the others. By eye, you wouldn't say there was any significant deviation.
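In code, the symmetrization is a one-liner; here's a sketch (numpy on the pixel-count data again, not exactly what I did in SigmaPlot):

```python
import numpy as np

x = np.arange(-10, 11)
y = np.array([0.1581, 0.1801, 0.2096, 0.2831, 0.2500, 0.3015, 0.3898,
              0.3567, 0.4266, 0.5259, 0.5185, 0.5038, 0.5994, 0.6730,
              0.6399, 0.7355, 0.7796, 0.7502, 0.8238, 0.8532, 0.8716])

# A team down by k at halftime wins exactly when its opponent (up by k)
# loses, so average P(win | -k) with 1 - P(win | +k).  This also pins
# the tied-score point at exactly 0.5.
y_sym = (y + 1.0 - y[::-1]) / 2.0

slope, intercept = np.polyfit(x, y_sym, 1)
resid = y_sym - (slope * x + intercept)

# The intercept lands on 0.5 essentially by construction; the slope
# should come out near the ~0.0375 quoted above.
print(slope, intercept, 1 - resid.var() / y_sym.var())
```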

Since the points are symmetric the fit won't change. Well, R might be slightly inflated, but the slope and intercept should stay the same.

Skimming the paper, it looks like there's some major data-massaging going on. As noted by CCPhysicist, there are only ten data points here, mirrored, so a true fit should be 50% at the midpoint. They fit those ten points to a *quintic* polynomial, for some reason I couldn't understand. When the error bars are included, it's clear that the one-point value falls within 1 sigma of 50%. The raw data is significant at the 10% level, but by controlling for 5 parameters that, at least to me, don't seem to be the most characteristic of a game (possession arrow at 2nd half?), they've managed to get their results significant at the 1% level. I'm no statistician, but I'm doubtful, to say the least.


Can you post the raw data you're using from measuring the graph? I'd like to play a bit and seeing as you've already scraped the numbers...

Can you post the raw data you're using from measuring the graph? I'd like to play a bit and seeing as you've already scraped the numbers...

Probabilities based on pixel-counting, in order from -10 to 10: 0.1581, 0.1801, 0.2096, 0.2831, 0.2500, 0.3015, 0.3898, 0.3567, 0.4266, 0.5259, 0.5185, 0.5038, 0.5994, 0.6730, 0.6399, 0.7355, 0.7796, 0.7502, 0.8238, 0.8532, 0.8716

You might look at the PDF of their paper (linked from the Freakonomics post)-- for all I know, they may have the correct figures in there as a data table.

Hmmm... looking at the academic version of their paper, the error bars in figure 1b are weird. There should be zero error on the tied-at-halftime point, but they give an error bar the same as the other points. And all the error bars look suspiciously the same.

The only raw numbers I've found in the paper are the win percentage for teams down by one (51.3%) and the total number of games in the data set (6,572).

With a simple analysis of the win/loss data I estimate the down-by-one win anomaly as a 1.6-sigma event. It is hardly unexpected to find such a moderate outlier in a data set of 10 points, and it is hardly distinguishable from the 1.4-sigma down-by-three loss anomaly. Yet they want to explain one by psychology and the other by chance!

I skimmed the academic paper; it looks to me like they are playing statistical shenanigans to get their claims of high significance. (Of course they are NOT just using the win/loss percent vs. points difference at half time, and they don't give enough details to really see what they ARE doing...)

Here are my details, if anyone cares.
I started with Chad's pixel-count data, symmetrized and with the -1/+1 points fixed to match the paper's 51.3/48.7%. I fitted a linear regression to the 21-point symmetric dataset from -10 to +10; I get slope 0.03765 and intercept 0.5, in good agreement with others.

The standard error from the linear regression isn't really the right variable, since the tie-game scenario is mathematically determined to have 50% wins. I suggest the one-standard-deviation level from the linear model should be calculated using the -10 to -1 difference data. That gives me a sigma of 0.032.

data table:

difference   % win   regression   off linear   std deviations
   -10       0.143     0.123        0.020          0.624
    -9       0.163     0.161        0.002          0.073
    -8       0.193     0.199       -0.006         -0.185
    -7       0.266     0.236        0.030          0.946
    -6       0.235     0.274       -0.039         -1.226
    -5       0.283     0.312       -0.029         -0.906
    -4       0.375     0.349        0.026          0.806
    -3       0.342     0.387       -0.045         -1.425
    -2       0.414     0.425       -0.011         -0.350
    -1       0.513     0.462        0.051          1.597
     0       0.500     0.500        0.000          0.000
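A short numpy sketch that reproduces the table above from the quoted slope and sigma (a reconstruction, not necessarily how the numbers were originally computed):

```python
import numpy as np

diff = np.arange(-10, 1)                      # halftime margins, -10 .. 0
win = np.array([0.143, 0.163, 0.193, 0.266, 0.235, 0.283,
                0.375, 0.342, 0.414, 0.513, 0.500])

slope, intercept = 0.03765, 0.5               # the symmetrized 21-point fit
pred = slope * diff + intercept               # the "regression" row
resid = win - pred                            # the "off linear" row

# Scatter estimated from the -10 .. -1 residuals only; the tied-game
# point is pinned at 0.5 and carries no information about the noise.
sigma = resid[:-1].std(ddof=1)                # should land near 0.032

for d, w, p, r in zip(diff, win, pred, resid):
    print(f"{d:4d}  {w:.3f}  {p:.3f}  {r:+.3f}  {r / sigma:+.2f} sigma")
```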

To my eye, it seems that the data is a combination of two different waveforms:

- a straight line (like the second line of best fit)
- a sawtooth wave with maximum amplitude at 0 that attenuates moving outwards

It seems that simplistic "line-of-best-fit" approaches are a poor choice for modeling both of these components.

Based on my interpretation, I would suggest that the sawtooth is more about the scoring multiples (2s and 3s + free throws) used in basketball than behavior or other factors.

My training is in digital signals (not statistics) so I feel a little out-of-place, terminologically.

Yeah, no way. To do a fair comparison, they need to penalize the model for all the extra curvature parameters. Adding parameters always gives you a better fit. Google "AIC" or "BIC" for the usual ways of doing this. I don't think there's a chance in the world that their new model fits any better than a straight linear regression.
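A sketch of that kind of penalized comparison on the pixel-count data, using the standard least-squares AIC formula (an illustration only, not the paper's actual regression):

```python
import numpy as np

x = np.arange(-10, 11)
y = np.array([0.1581, 0.1801, 0.2096, 0.2831, 0.2500, 0.3015, 0.3898,
              0.3567, 0.4266, 0.5259, 0.5185, 0.5038, 0.5994, 0.6730,
              0.6399, 0.7355, 0.7796, 0.7502, 0.8238, 0.8532, 0.8716])

def aic(residuals, n_params):
    """Least-squares AIC: n * ln(RSS / n) + 2k, up to an additive constant."""
    n = len(residuals)
    rss = np.sum(residuals ** 2)
    return n * np.log(rss / n) + 2 * n_params

# Lower AIC is better; the quintic has to beat the line by enough to
# pay for its four extra parameters.
for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    print(f"degree {degree}: AIC = {aic(resid, degree + 1):.1f}")
```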

There's also the dynamic that the standard score in basketball is a two-point shot (or the opportunity for two free throws), and one way to interpret the data is that being ahead or behind by a point is statistically indistinguishable from being tied. Tied or behind by one, a single shot puts you in the lead. Once the lead is more than a field goal, you start seeing statistically significant differences, because the trailing team has to make more than one extra shot, or shoot lower-probability shots (threes), to make up the margin.

I seriously doubt that the "we're up by one so we can relax" explanation proffered in the article is viable. The "you're a close second" motivation may only work if the leader doesn't know the lead is small, and it doesn't address the possibility that a winning team is more reluctant to change a strategy that is obviously working, whereas a trailing team sees a lower risk/reward ratio in changing strategy.

Matt Gallagher -

The sawtooth wave is largely an artifact of their plotting the same data twice. If you look at just half of the data points (which is really all of the data) the sawtooth wave looks like noise.

Skimming the paper, it looks like there's some major data-massaging going on. As noted by CCPhysicist, there are only ten data points here, mirrored, so a true fit should be 50% at the midpoint. They fit those ten points to a *quintic* polynomial, for some reason I couldn't understand.

Wow.
A 5th-order fit to ten data points? You know, call me a curmudgeon if you like, but I think you really ought to have more than two points per adjustable parameter when you start fitting curves to data...

They fit those ten points to a *quintic* polynomial, for some reason I couldn't understand.

Sarah: the reason would be that the authors of the paper are up to no good. I have a hunch that some of the lower-order polynomial fits might not show that trend -- high-order polynomials have been known to do crazy things like that. Chad is not being a curmudgeon here; there is no justifiable a priori reason for thinking that the best fit should be a fifth-order polynomial.

By Eric Lund (not verified) on 19 Mar 2009 #permalink
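A toy illustration of the kind of craziness Eric is talking about, fitting the ten come-from-behind bins from the data table upthread with a line and with a quintic (hypothetical, not the paper's fit):

```python
import numpy as np

# The ten come-from-behind bins from the data table upthread.
x = np.arange(-10, 0)
y = np.array([0.143, 0.163, 0.193, 0.266, 0.235,
              0.283, 0.375, 0.342, 0.414, 0.513])

# Evaluate each fit on a fine grid, extending half a point past the data
# (the same "un-physical" extrapolation the original graph relies on),
# to see how far the quintic swings between and beyond the points.
grid = np.linspace(-10.5, -0.5, 201)
for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    fit = np.polyval(coeffs, grid)
    print(f"degree {degree}: fit ranges from {fit.min():.3f} to {fit.max():.3f}")
```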