The WSJ discovers the unreliability of wine critics, citing the fascinating statistical work of Robert Hodgson:
In his first study, each year for four years, Mr. Hodgson served actual panels of California State Fair Wine Competition judges (some 70 judges each year) about 100 wines over a two-day period. He employed the same blind-tasting process as the actual competition. In Mr. Hodgson’s study, however, every wine was presented to each judge three different times, each time drawn from the same bottle.
The results astonished Mr. Hodgson. The judges’ wine ratings typically varied by ±4 points on a standard ratings scale running from 80 to 100. A wine rated 91 on one tasting would often be rated an 87 or 95 on the next. Some of the judges did much worse, and only about one in 10 regularly rated the same wine within a range of ±2 points.
Mr. Hodgson also found that the judges whose ratings were most consistent in any given year landed in the middle of the pack in other years, suggesting that their consistent performance that year had simply been due to chance.
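Hodgson’s regression-to-the-mean finding is easy to reproduce with a toy simulation. In the sketch below (every number, including the 30 wines per judge and the ±4-point noise term, is an illustrative assumption rather than his actual data), each judge’s triplicate ratings are pure noise around a wine’s true score; we then check where year one’s most consistent judge lands in year two.

```python
import random

N_JUDGES = 70   # roughly the size of the State Fair panel
N_WINES = 30    # wines each judge rates in triplicate (assumed)

def avg_spread(judge, year):
    """Mean range of a judge's three ratings per wine, assuming each rating
    is the wine's true score plus Gaussian noise with a ~4-point typical error."""
    rng = random.Random(judge * 1000 + year)
    spreads = []
    for _ in range(N_WINES):
        base = rng.uniform(84, 96)                       # the wine's "true" score
        ratings = [base + rng.gauss(0, 4) for _ in range(3)]
        spreads.append(max(ratings) - min(ratings))
    return sum(spreads) / N_WINES                        # lower = more consistent

year1 = {j: avg_spread(j, 1) for j in range(N_JUDGES)}
year2 = {j: avg_spread(j, 2) for j in range(N_JUDGES)}

winner = min(year1, key=year1.get)                       # most consistent, year 1
rank = sorted(year2, key=year2.get).index(winner) + 1    # same judge, year 2
print(f"Year 1's most consistent judge ranks {rank} of {N_JUDGES} in year 2")
```

Because the ratings here are pure noise, the year-one winner’s year-two rank is essentially a uniform draw across the field, which is exactly the mid-pack drift Hodgson observed.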
It’s easy to pick on wine critics, as I certainly have in the past. Wine is a complex and intoxicating substance, and the tongue is a crude sensory muscle. While I’ve argued that the consistent inconsistency of oenophiles teaches us something interesting about the mind (expectations warp reality), wine critics are merely part of a larger category of experts who vastly oversell their predictive powers.
Look, for instance, at mutual fund managers. They take absurdly huge fees from our retirement savings, but the vast majority of mutual funds in any given year will underperform the S&P 500 and other passive benchmarks. (Between 1982 and 2003, there were only three years in which more than 50 percent of mutual funds beat the market.) Even those funds that do manage to outperform the market rarely do so for long. Their models work haphazardly; their success is inconsistent.
Or look at political experts. In the early 1980s, Philip Tetlock at UC Berkeley picked two hundred and eighty-four people who made their living “commenting or offering advice on political and economic trends” and began asking them to make predictions about future events. He had a long list of pertinent questions. Would George Bush be re-elected? Would there be a peaceful end to apartheid in South Africa? Would Quebec secede from Canada? Would the dot-com bubble burst? In each case, the pundits were asked to rate the probability of several possible outcomes. Tetlock then interrogated the pundits about their thought process, so that he could better understand how they made up their minds. By the end of the study, Tetlock had gathered 82,361 different predictions.
After Tetlock tallied up the data, the predictive failures of the pundits became obvious. Although they were paid for their keen insights into world affairs, they tended to perform worse than random chance. Most of Tetlock’s questions had three possible answers; the pundits, on average, selected the right answer less than 33 percent of the time. In other words, a dart-throwing chimp would have beaten the vast majority of professionals. Tetlock also found that the most famous pundits in his study tended to be the least accurate, consistently churning out overblown and overconfident forecasts. Eminence was a handicap.
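The chimp baseline itself is just arithmetic: a guess made uniformly at random among three options is right one time in three. A quick simulation over a question set the size of Tetlock’s makes the floor concrete (the randomly drawn “true” outcome below is an assumption for illustration, not his actual data):

```python
import random

rng = random.Random(42)

N_QUESTIONS = 82_361   # the number of forecasts in Tetlock's dataset
N_OPTIONS = 3          # most questions offered three possible outcomes

# The "chimp" picks an answer uniformly at random; so does the simulated world.
correct = sum(rng.randrange(N_OPTIONS) == rng.randrange(N_OPTIONS)
              for _ in range(N_QUESTIONS))
accuracy = correct / N_QUESTIONS          # converges to ~1/3 for large N
print(f"Random guessing: {accuracy:.1%} correct")
```

The point of the comparison: by averaging below this zero-knowledge floor, the pundits’ confident forecasts were, in aggregate, worse than flipping a three-sided coin.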
But here’s the worst part: even terrible expert advice can reliably tamp down activity in brain regions (like the anterior cingulate cortex) that are supposed to monitor mistakes and errors. It’s as if the brain is intimidated by credentials, bullied by bravado. The perverse result is that we fail to skeptically check the very people who are making mistakes with our money. I think one of the core challenges in fixing our economy is designing incentive systems that reward real expertise, not faux experts with no track record of success. We need to fund scientists, not mutual fund managers.