Peter Freed wants you to know that Jonah Lehrer is Not a Neuroscientist. Lehrer doesn’t claim to be, of course. He’s a journalist and science writer who covers developments in neuroscience, and a good one at that.
Freed is concerned about how Lehrer handled a recent study on “the wisdom of crowds” in a recent op-ed in the Wall Street Journal. The wisdom of crowds is a long-standing and often-successful idea that you’ll get a better prediction by aggregating the responses from a bunch of people posed with the same question than you’d get by simply asking a given expert, or even aggregating the opinions of lots of experts. The research Lehrer was pointing out showed that the wisdom of the crowd declines when people have a chance to share information about what other people were going to predict. The more people learn about what other people think on the issue, the less diversity of opinion remains, and the more confident people become in their increasingly inaccurate predictions. As the researchers observe, this sounds kinda familiar to anyone reading the financial pages, or political news, or any of a range of fanatical subpopulations (birthers, truthers, creationists, to name a few).
Freed’s essay is not so much about Lehrer, actually, as about how scientists read. First, how and why a particular phrase in Lehrer’s essay stopped him, but then how one reads a scientific paper:
I opened the paper and did what I always do – skip the intro, go straight to the tables and figures, and then to the methods. If you ever read a science paper, you should do the same thing yourself. Reading intros and conclusions first is for suckers – they can say anything the author wants, and reading them allows the author’s “spin” – as we scientists call intros and conclusions – to frame your analysis of the data. In science, the only thing that matters is the methods and the data, because it’s where the author can’t hide behind a spun story.
This is exactly right. It’s not something you’re taught in a lecture in your science classes. It’s one of many quirky bits of scientific culture passed along by oral tradition, the sort of thing students either pick up on their own or miss entirely because it falls through the cracks in science education. It’s a bizarre way to read any document, but it’s what scientists do. We pound the structure of scientific papers into our students when we teach them to write lab reports, and then years later we let them in on a secret: that they should read the paper out of order.
About halfway through Freed’s essay, he realizes he’d assumed Lehrer was talking about a mean rather than a median, and thus was misreading Lehrer and not properly matching Lehrer’s essay with the paper he was citing. To be clear, Lehrer referred only to the median, so the fault lies with the reader here, and Freed doesn’t blame Lehrer for it. But then, alas, he just keeps on not understanding why Lehrer used the median, and along the way misrepresents what a median is, what a mean is, and why they are different.
For instance, Freed writes:
It had never occurred to me – as it never occurred to the study’s authors – that Lehrer would be talking about an actual group median…. maybe he didn’t know what median meant. Because median guesses are not guesses by a crowd, as Lehrer states. They are guesses by a single person. … Which is to say, they have nothing to do with the point of the Journal’s article [the wisdom of crowds]! …the median person is defined as the person with the middle value in a group – half the group is above them, and half below. Which means that the median person is one guy. Not a crowd! A single person! Which is why the numbers are so pretty – they were chosen by individuals. …the authors probably just included it [the median] for kicks, and not as a measure of crowd wisdom – which made sense, given it wasn’t a crowd answer. … And then I read the methods section, which confirmed my reading in spades. There I found this sentence: “this confirms that the geometric mean…. is an accurate measure of the wisdom of crowds for our data.” … The authors explicitly say that it is the geometric mean that should be used to judge crowd wisdom. Not the median!
All of this is wrong. Well, almost all. It’s true that a median is the value which has as many values above it as below. It’s wrong, though, to jump from the median answer to the median person. And to suggest that a median value doesn’t represent the crowd is absurd. Its definition requires us to consider the value of every other response, not just one response.
Nor is the median an inappropriate measure for this purpose. As the authors make clear in the second sentence of the paper – in the abstract of the paper no less - “Already Galton (1907) found evidence that the median estimate of a group can be more accurate than estimates of experts.” There are all sorts of good reasons to prefer the median as a measure. Indeed, that italicized quotation above is from a section of the paper justifying their use of the geometric mean instead of the more widely used median. As they explain in the Methods section:
a high wisdom-of-crowd indicator implies that the truth is close to the median. Thus, it implicitly defines the median as the appropriate measure of aggregation. In our empirical case this is not in conflict with the choice of the geometric mean as can be seen by the similarity of the geometric mean and the median in Table 1 [the Table Freed is focused on]. A theoretical reason is that the geometric mean and the median coincide for a log-normal distribution.
In other words, the median is the appropriate measure of the crowd’s choice, but for various computational reasons specific to this study, it was better to use a geometric mean.
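For anyone who wants to see why the two coincide, here is a short sketch. It assumes a quantity X whose logarithm is normally distributed with mean μ and variance σ² (the symbols are my own notation, not the paper’s), and it uses the population versions of each measure rather than anything computed from the study’s data.

```latex
\ln X \sim \mathcal{N}(\mu, \sigma^2)
\quad\Longrightarrow\quad
\begin{aligned}
\operatorname{median}(X) &= e^{\mu}, \\
\text{geometric mean}(X) &= \exp\!\big(\mathbb{E}[\ln X]\big) = e^{\mu}, \\
\text{arithmetic mean}(X) &= \mathbb{E}[X] = e^{\mu + \sigma^2/2} \;>\; e^{\mu}
\quad (\sigma > 0).
\end{aligned}
```

So under that assumption the median and the geometric mean are the same number, while the arithmetic mean sits above both, and the gap grows as the guesses become more spread out.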
Freed doubles down on this mishandling of statistics when he writes:
“arithmetric mean,” which is just a fancy-pants way of saying “the number that the students actually guessed.” That is, this is the number that real people literally wrote down when guessing the answer to the question.
No. No no no no no. No one in the study actually wrote down that there were 26,773 immigrants to Zurich. Someone did write down that 10,000 people immigrated to Zurich, and if you recall that was why Freed didn’t want to use the median. Indeed, exactly as many people wrote down a number larger than 10,000 as wrote a smaller number, while far more wrote a number smaller than 26,773 than wrote down a larger number. The suggestion that the larger number is more representative of what people wrote down boggles the mind and defies basic mathematics.
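A toy illustration may help. The guesses below are invented for the purpose (they are not the study’s data), chosen only to be skewed the way such guesses tend to be: mostly modest, with a couple of enormous ones. The sketch checks whether anyone actually wrote down the mean or the median, and counts how many guesses fall on either side of each.

```python
from statistics import mean, median

# Hypothetical guesses (NOT the study's data): mostly modest, a few enormous.
guesses = [2_000, 5_000, 8_000, 9_000, 10_000, 15_000, 20_000, 50_000, 200_000]

avg = mean(guesses)
mid = median(guesses)

# Count how many guesses fall strictly below and strictly above each summary.
below_mean = sum(g < avg for g in guesses)
above_mean = sum(g > avg for g in guesses)
below_median = sum(g < mid for g in guesses)
above_median = sum(g > mid for g in guesses)

print(f"arithmetic mean: {avg:,.0f} (written down by anyone? {avg in guesses})")
print(f"median:          {mid:,.0f} (written down by anyone? {mid in guesses})")
print(f"guesses below/above the mean:   {below_mean}/{above_mean}")
print(f"guesses below/above the median: {below_median}/{above_median}")
```

With these made-up numbers, the median is a value someone really did guess and it splits the group evenly, while the mean is a number nobody wrote down and most of the group sits below it.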
Freed here is trying to say that the arithmetic mean is a superior measure of the collective judgment than the geometric mean, which in turn is superior to the median, and he’s criticizing Lehrer for taking the opposite view. This, it must be emphasized, despite the fact that the authors of the paper under discussion agree with Lehrer and disagree entirely with Freed.
And they should disagree with Freed. The arithmetic mean is not the relevant statistic here.
To understand why, let’s quickly review what these terms mean.
The arithmetic mean is what you calculate when someone tells you to calculate the average. Add the numbers up and divide by the number of things you added. For various mathematical reasons, it is often the case for the sorts of numbers you and I deal with that the arithmetic mean is a good summary of the location of a given collection of numbers. It works wonderfully, for instance, with respect to people’s heights. It works there because height is distributed fairly symmetrically around its central point, as can be seen in the photo here, which shows the heights of genetics students at the University of Connecticut in 1996. One has to squint a bit to see past the variability in the data, but larger samples bear out the observation that height is distributed in the shape of a standard bell curve, known as the normal distribution.
But not everything follows this curve, and so the arithmetic mean is not a good summary in all cases. It doesn’t work, for instance, for wealth distribution. If Bill Gates and a homeless person walk into a bar, the arithmetic mean of the wealth of each person in the bar will skyrocket, because $56 billion divided by any plausible number of bar patrons is still enormous. But that doesn’t tell us as much as we want to know about how wealth is actually distributed among the patrons. Similarly, simply summing the household income of every US household and dividing by the number of households gave you $60,528 in 2006 (according to Wikipedia). But in the chart shown here, you can see that most Americans make a lot less than that. Indeed, about 63% of households earned less than that. To get a better measure of the actual dynamics, you can do one of two things.
First, you can compute a different sort of average, the geometric mean, by multiplying all the numbers and then taking the nth root of the result (where n is the number of quantities being averaged). Doing this tends to de-emphasize the effect of large numbers, so the result is always smaller than (or the same size as) the arithmetic mean. It’s also more complicated to compute on the fly, so it is less commonly used.
Second, you can use the median, which is easier to compute: it is the value that is greater than half of the numbers and less than the other half. The median household income is $44,389, over $16,000 less than the arithmetic mean (and quite close to the geometric mean). For many purposes, this is a far preferable metric, especially when you’re more interested in where people stand than in what the numbers are (the arithmetic mean is what you’d get if you took away everyone’s money and divided it into equal piles, while the median is closer to what a typical household actually earns). Because income distribution is skewed (as seen in the figure above), an arithmetic mean is less appropriate. We want a method like a geometric mean or a median that better balances the impact of very large numbers.
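To make the bar example concrete, here is a rough sketch comparing the three measures before and after Gates walks in. The nine patrons’ net worths are numbers I made up; only the $56 billion comes from the example above. One caveat: a geometric mean only makes sense for positive values, so this sketch gives every patron at least a little money rather than including someone with nothing.

```python
from statistics import geometric_mean, mean, median

# Hypothetical net worths in dollars for nine bar patrons (made-up numbers).
# All values are positive because the geometric mean requires that.
patrons = [500, 2_000, 10_000, 20_000, 30_000, 40_000, 60_000, 90_000, 150_000]

# The $56 billion figure is the one used in the text.
with_gates = patrons + [56_000_000_000]


def summarize(label, values):
    """Print the three summaries side by side for one group of people."""
    print(f"{label:<14}"
          f" arithmetic mean = {mean(values):>16,.0f}"
          f"   geometric mean = {geometric_mean(values):>10,.0f}"
          f"   median = {median(values):>8,.0f}")


summarize("before Gates:", patrons)
summarize("after Gates:", with_gates)
```

Run on these invented figures, the arithmetic mean jumps into the billions once Gates arrives, while the median and the geometric mean move far less and stay in the range of what the ordinary patrons actually have.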
There is no perfect measure to use, and to the extent Freed is treating the arithmetic mean as “the number that the students actually guessed,” he’s misleading readers. More than half of the students guessed a smaller number than the arithmetic mean, and treating that number as the gold standard devalues those students in preference for the ones who made insanely large guesses.
The median has always been the preferred measure for these sorts of “wisdom of the crowds” analyses, and because the geometric mean best matched the median, the geometric mean was the statistic the researchers chose to work with. And because the median is the number normally used in these sorts of studies, that’s what Lehrer reported.
Freed dismisses all of this by saying, essentially, that math is scary. The geometric mean is simply “something confusing involving logarithms,” so ignore math and theory and stick with an inappropriate statistic. (“Involving logarithms” because the logarithms transform the computationally obnoxious multiplications and nth roots described above into a simpler addition and division followed by an exponentiation).
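The logarithm trick is less scary than it sounds. The sketch below computes a geometric mean both ways, once by multiplying and taking the nth root and once by averaging logarithms and exponentiating, and shows they agree. The function names and the sample numbers are mine, chosen purely for illustration.

```python
import math


def geometric_mean_by_root(values):
    """Multiply everything together, then take the nth root."""
    return math.prod(values) ** (1 / len(values))


def geometric_mean_by_logs(values):
    """Average the logarithms, then exponentiate the result."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical positive numbers: their product is 64, whose cube root is 4.
values = [2, 8, 4]

print(geometric_mean_by_root(values))
print(geometric_mean_by_logs(values))
```

Both functions give 4, up to floating-point rounding. The log version is simply the ordinary arithmetic mean applied to the logarithms of the numbers, which is why the geometric mean is the natural choice when the logarithms of the guesses, rather than the guesses themselves, look bell-shaped.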
So, Freed spends 5500 words to convince us “we need not to get spun. Which is what surely happened to Lehrer.” And he wants us to believe that Lehrer got spun into using the median because the authors used the geometric mean rather than the arithmetic mean. Even though Freed cannot seem to understand what a geometric mean is, let alone why it really is preferable here, he thinks it’s Lehrer who must’ve gotten it wrong. Freed spun himself.
Freed’s broader points - about the importance of reading papers and news reports critically, of thinking carefully about the underlying assumptions, of checking numbers to make sure they jibe with logic, and of ensuring that examples aren’t being cherry-picked - those are all great. But they’d be that much stronger if the substantive issue he’s raising weren’t a misreading of Lehrer, the original paper, and basic statistics.