Is there really wisdom in crowds?

Here's an interesting article about the wisdom of crowds. It starts by discussing the surprising accuracy of Wikipedia.

The reason that Wikipedia is as good as it is (and the reason that living organisms are as sophisticated as they are) is not the average quality of the edits (or mutations). Instead, it is due to a much harder-to-observe process: selection. Some edits survive, while others quickly die. While one can look at the history of a Wikipedia article and see each and every edit, it is much harder to tell how many potential editors looked at an article, subconsciously thought "I doubt I could improve this much," and chose not to try. Each of these can be considered a "selection event," and such events vastly outnumber the actual edits. Selection is the heart of what makes Wikipedia -- as well as Darwinian evolution -- work.
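To make that argument concrete, here's a toy simulation of my own (not from the article; all the numbers are invented for illustration). The average proposed edit is no better than random, but with selection only the improvements survive:

```python
import random

def simulate(n_events=10_000, selection=True, seed=0):
    """Toy model of article quality under random edits.

    Each "event" is a potential edit whose effect is drawn from a zero-mean
    distribution (the average edit is neither good nor bad). With selection,
    an edit only survives if it improves the article; otherwise it is
    skipped or reverted. All numbers here are made up for illustration.
    """
    rng = random.Random(seed)
    quality = 0.0
    for _ in range(n_events):
        delta = rng.gauss(0.0, 1.0)
        if selection:
            if delta > 0:          # the "selection event": keep only improvements
                quality += delta
        else:
            quality += delta       # every edit sticks, good or bad
    return quality

print("with selection:   ", round(simulate(selection=True), 1))
print("without selection:", round(simulate(selection=False), 1))
```

With selection turned on, quality climbs steadily; without it, the article just drifts at random. That's the selection story, anyway.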

Several friends of mine dispute this. They say that while Wikipedia is fine for basic factual information you might find in a newspaper, when you get to the level of serious academic research, the information quality breaks down. A physicist friend says his students are constantly getting misinformation about physics from Wikipedia. Another friend, a historian specializing in the Middle East, says Wikipedia is rife with errors.

Next, the article goes on to discuss "prediction markets." These include everything from commodities markets (pork bellies, corn futures) to betting pools on who'll win the presidential race.

Imagine that lots of random people come in and make bad guesses at who will win the election. The price of the contracts will then vary significantly from what the best expert would predict, resulting in an unstable (i.e. non-equilibrium) situation. Now all it takes to make some easy money is to consult such an expert and buy the contracts whose prices are furthest from the experts' estimates. If it is indeed this easy to make money, the market will attract lots of people, including institutional investors who have the ability to invest enough to quickly move the price back to where the experts predict. Meanwhile, experts who consistently predict badly will eventually tend to pick another line of work that they are better at, while those who are best at picking will make lots of money doing so, and will therefore tend to be there with cash in hand whenever the prices stray far from their predictions. Each expert tends to gravitate toward the specific things on which they have special expertise (or inside information!), and therefore the best chance of out-predicting the other experts. Over time, it becomes harder and harder to consistently outguess the market, no matter how good you are.
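Here's a quick sketch of that correction mechanism (again my own toy model with made-up numbers, not anything from the article): noise traders push the contract price around at random, while informed money keeps pulling it back toward the best available estimate.

```python
import random

def simulate_market(expert_prob=0.15, start_price=0.40, steps=200, seed=1):
    """Toy prediction-market dynamics, for illustration only.

    Noise traders nudge the contract price randomly; an informed trader
    buys when the price is below the expert estimate and sells when it is
    above, pulling the price back toward that estimate.
    """
    rng = random.Random(seed)
    price = start_price
    for _ in range(steps):
        price += rng.gauss(0.0, 0.01)         # uninformed, random bets
        price += 0.1 * (expert_prob - price)  # informed money corrects the mispricing
        price = min(max(price, 0.0), 1.0)     # a contract price is a probability
    return price

print(round(simulate_market(), 3))  # ends up hovering near the expert's 0.15
```

Of course, this only shows the price converging to the experts' current estimate -- it says nothing about whether that estimate is right.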

This market argument makes a little more sense to me than the Wikipedia one. If a Middle East historian could make a fortune by correcting errors in Wikipedia, I suspect it would be much more accurate than it is. The problem with futures markets is that they are only as accurate as our current knowledge. Giuliani might have a 15 percent chance of getting elected today, but if tomorrow's newspapers carry stories that Giuliani accepted millions in bribes from Al Qaeda, those chances will plummet.

The third example, I think, is the most promising of all: movie recommendations. Netflix offered a $1 million prize to the programmer who could improve its computerized movie recommendation system. Here's one potential solution to the problem:

If a movie is near a user -- in the same neighborhood, so to speak -- we can predict that the user will probably like that movie, even if the user never specifically rated it. Movies that are universally liked tended to move toward the center of the model ("Shawshank Redemption" being closest to center), while disliked movies moved toward the outside. In practice, I found that using 12 or so dimensions, rather than just 3, worked a lot better, allowing a much richer categorization and allowing each neighborhood to be adjacent to a great many other neighborhoods. There are several other layers of complexity needed to get the best results, but the gist of the approach is just as simple as described.
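Here's roughly what that looks like in code -- a minimal latent-factor sketch of my own, not the actual prize entry. Each user and movie gets a vector of coordinates (the "dimensions"), a predicted rating is just the dot product of the two, and stochastic gradient descent nudges the coordinates to fit the known ratings. All names and numbers below are made up for illustration:

```python
import numpy as np

def train_latent_factors(ratings, n_users, n_movies, dims=12,
                         lr=0.01, reg=0.05, epochs=200, seed=0):
    """Minimal latent-factor model: place users and movies in a shared
    `dims`-dimensional space and predict a rating as a dot product.
    Hyperparameters are illustrative, not tuned."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0, 0.1, (n_users, dims))   # user positions in "taste space"
    M = rng.normal(0, 0.1, (n_movies, dims))  # movie positions in the same space
    for _ in range(epochs):
        for u, m, r in ratings:
            pu, qm = U[u].copy(), M[m].copy()
            err = r - pu @ qm                 # how far off is the current prediction?
            U[u] += lr * (err * qm - reg * pu)
            M[m] += lr * (err * pu - reg * qm)
    return U, M

# Tiny made-up example: 2 users, 3 movies, ratings on a 1-5 scale.
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 2, 2)]
U, M = train_latent_factors(ratings, n_users=2, n_movies=3)
print(round(float(U[0] @ M[2]), 2))  # predicted rating for a movie user 0 never rated
```

The "neighborhood" intuition falls out of this: two movies with similar vectors sit near each other, and a user's vector points toward the region of movies they tend to rate highly.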

The difference between this system and the other two is that the people providing the ratings have more information: they've seen the movie. The only unknown factor is the taste of the person the recommendations are being made for. But that person also provides their own ratings, so we know a great deal about them, too.

I've recently resubscribed to Netflix, and I have to say, the system has definitely improved over the years. The major problem now is that it doesn't "know" all the movies I've seen, so many of its recommendations are useless. The other big problem is that choosing a movie to watch is often more complicated than the system assumes. Sometimes I'm looking for a movie to watch with Greta or the kids, who have different tastes from me -- we need to pick a film that all of us will enjoy.

But despite its limitations, the whole article is definitely worth a read. Check it out.


Several friends of mine dispute this. They say that while Wikipedia is fine for basic factual information you might find in a newspaper, when you get to the level of serious academic research, the information quality breaks down. A physicist friend says his students are constantly getting misinformation about physics from Wikipedia. Another friend, a historian specializing in the Middle East, says Wikipedia is rife with errors.

This is very true. It's particularly true in scientific areas where there are large, vocal, pseudoscientific activist groups. Autism is one area. Vaccination is another. Evolution, too. This results in "edit wars," with activists trying to push their pseudoscience. In fact, this "selection" in Wikipedia can actually work against accuracy, because the "selective forces" (i.e., editors altering or correcting what they think is incorrect or poorly stated information) tend to favor the cranks (creationists, quacks, Holocaust deniers, 9/11 conspiracy theorists, etc.), who tend to have a lot more time and passion for editing and creating Wikipedia articles than those who would remove their dubious information have for correcting them. An implicit admission of this problem is that many Wikipedia articles on these topics have moratoriums on new edits.

Is it true that 'The Shawshank Redemption', despite being on many people's top-ten favorite lists now, was not a big success at the box office when it was released?

I'd be interested to know what kinds of physics errors the students are getting from Wikipedia, and as for the historian, does he know he can correct the errors?

I think Wikipedia is a wonderful tool. I use it nearly on a daily basis. I always take the information with a grain of salt, and double-check if it's reasonably important. And it would be silly to cite it as a primary source. But it has far more information, and accurate information, than any other repository of its kind. And for research, one of the more valuable aspects is the ability to follow up on references to secondary sources, which most science or technical articles are very good about listing.

I'd be interested to know what kinds of physics errors the students are getting from Wikipedia, and as for the historian, does he know he can correct the errors?

These are busy people, with grants and book contracts. Why would they waste their time correcting something that will likely be "corrected" back by someone with fewer qualifications but more time than them?

Instead of complaining about errors in Wikipedia, teachers should use those errors as a teaching tool, especially the scientists. As to the historians, as they say, the victors write the history.

If links are allowed, here's a nice discussion where a physicist highlights the reasons why Wikipedia isn't useful for serious academic purposes.

I'm not sure how the prediction markets could possibly work when they are predicting something like who will win the next election. The problem is that there is no feedback about which predictions are accurate and which are not until election day. So I don't see how the self-correction aspect comes in.

I think Wikipedia works pretty well when it is only discussing the dry technical details of some theory, such as the lambda calculus, because only those who actually know something about it will tend to contribute. It works less well when the subject is controversial.

The key is this: if only experts are interested in a topic, then they will be the only ones to contribute. But if the topic is of interest to those who know nothing about it, that's a situation ripe for bad Wikipedia articles. Maybe that could be the basis for an accuracy rating on articles?

Many years ago, I heard the Canadian science journalist David Suzuki argue that the public, not the scientists, should determine the course of science.

Those who think that crowds are always right know little about the history of vaccination. Edward Jenner was mocked in the press and on the streets (see link above). His vaccination against smallpox (the single most important medical discovery) survived only because he and other scientists convinced Queen Victoria to vaccinate her children.

To this day, physicians are fighting a deep, visceral fear of vaccination. If the issue of compulsory childhood vaccines was put to a referendum, it would lose badly.

How many WalMart shoppers would approve the construction of a new particle accelerator or the sequencing of an insect genome? How many would approve research to support Creationism? Enough said.

The worst part of Wikipedia's physics coverage, in my experience, has been the introductory stuff. The really knowledgeable people aren't interested in writing material at the high-school level, and it's easier to write about some facet of advanced mathematics than it is to organize a useful presentation of a topic like "force" or "energy".

I suspect that this may be contributing to the woes of your "physicist friend" and his misinformed students.

While "Wikipedians" are undoubtedly quite good in correcting or removing erroneous information, two problems will continue to exist:
(1) Large groups of users can easily bombard a single page or subject area with edits such as that page-watchers can no longer keep up in doing the necessary revisions and/or corrections. Editing the Wikipedia article on elephants is still locked for new and unregistered users as a result of Stephen Colbert's "prank," which was over one year ago.
(2) Vandals will continue to deface articles. Although such acts typically survive for only a few hours (or minutes, or days, given the popularity of said article), the moments in time wherein erroneous information exists can potentially misinform anyone doing basic research on the topic. I've removed a few bits of vandalism from several articles; at times it's blatantly obvious (like the things I'd found on the article for Toilet), other times not so much. What frightens me is that high school students (or any student, for that matter) may rely solely on Wikipedia for, say, their history reports - what happens if at the time of their research, someone had just recently changed all the dates and names? If you're relying solely on Wikipedia for your facts without double checking elsewhere, odds are you wouldn't notice if an event were a century or two off, or question the "fact" that World War II was based in San Francisco, or... etc.

Quite an interesting topic...

I remember seeing a panel discussion on CSPAN (maybe hosted by AEI) where they were discussing this topic. Most of the people were 'crowds/markets are wunderbar' types, as one would expect from free-market-über-alles folks. However, one person (whose name I really wish I could remember) was doing serious academic work on different sorts of 'wisdom of crowds' systems and identifying the conditions under which they work well and when they don't. Polls, voting, predictive markets, and such have different properties, but all suffer from some common pathologies.

If the participants are generally uninformed, polls suck, of course. Predictive markets correct for that a bit, since knowing you're ignorant leads you not to participate, but that also leads to much smaller pools of participants (which is a problem too).

The worst, and probably most common, problem is misinformation. This should be obvious. What is less obvious is that a feedback loop can form where people reinforce (and assign more confidence to) their misguided but common notion, making the entire system go way off the rails. The correction mechanism for this (when it exists at all) is quite delayed and leads to bubble/burst cycles. This is a very bad property to have in some cases, and sometimes stability is more important than some possibly more 'optimal' state.

BTW: In response to #7, for prediction markets the selection event occurs when a person decides to participate or not (and how much). This intrinsically includes that participant's past performance (if there is any) as well as their confidence level. If these sorts of markets confuse you, I hate to think of your grasp on biological evolution.

travc writes:

In response to #7, for prediction markets the selection event occurs when a person decides to participate or not (and how much).

If what you're interested in is knowing the degree of belief in some claim, then that gives you a good indication. Those who believe strongly are more likely to bet heavily than those who are just playing a hunch. However, the strength of someone's beliefs doesn't necessarily imply anything about the accuracy of those beliefs.

This intrinsically includes that participant's past performance (if there is any) as well as their confidence level.

Yes, if there is plenty of past performance to base your decisions on, then it will be self-correcting. That was the point I was making. I would think that there needs to be plenty of feedback on the correctness of decisions for this sort of thing to work.

If these sorts of markets confuse you, I hate to think of your grasp on biological evolution.

One exchange with you, and you have to act like a jerk. Did you have a bad day, or something?

I don't see how it is comparable to biological evolution, precisely because natural selection is constantly self-correcting through bad genes dying off and good genes reproducing. If biological evolution had to rely on other animals' opinions about who is fit and who isn't, I doubt it would work very well.

The naysayers seem to want Wikipedia to always be right. Nothing published has ever been 100% correct. I use Wikipedia frequently. It is great for getting general information. Anyone who would use it as a final source is an idiot. Sorry you physicists feel too important to help others with general information. The chemistry (really good, in most of what I have used) is usually right on, and most biology is really good as well, until you get to the topics the wingnuts don't like and try to spam. Wikipedia has good rules to stop most of that. I was unaware of the Colbert elephant spam; a rule change to Wikipedia could probably fix problems like that.

My one Wikipedia edit (India Ink, the paragraph on its use in microbiology) has been up for 6 months, with a few edits in the first few days and no changes since. After reading it again I might just change it a little, but I am in no rush.

If you use Wikipedia for what it is -- a great and easy-to-access source of general information -- it is one of the best (and usually reliable) tools on the internet. If one is doing serious research, it should not be a primary source, and any information obtained should be verified if possible against other sources. But if you want some basic info on something like methane real fast, there is nothing to beat it.

Great post, Brian (13). It was not that long, really; please reconsider your decision not to post here.

From the list of ten things one should know about Wikipedia (their own list):

6. We do not expect you to trust us.

It is in the nature of an ever-changing work like Wikipedia that, while some articles are of the highest quality of scholarship, others are admittedly complete rubbish. We are fully aware of this. We work hard to keep the ratio of the greatest to the worst as high as possible, of course, and to find helpful ways to tell you in what state an article currently is. Even at its best, Wikipedia is an encyclopedia, with all the limitations that entails. It is not a primary source. We ask you not to criticize Wikipedia indiscriminately for its content model but to use it with an informed understanding of what it is and what it isn't. Also, because some articles may contain errors, please do not use Wikipedia to make critical decisions.

I think many specialists focus on Wikipedia as a source of errors, but that's partly a matter of convenience. Students also run across errors in textbooks, popular magazines, etc., but those aren't a single source and are thus far more difficult to demonize. (I've seen some real howlers in textbooks--mistakes that have survived multiple editions.)

Personally, as an English teacher, I think Wikipedia is an excellent way to demonstrate the way information consumers should read and think about *every* piece of information. First, introduce the process behind Wikipedia and show several discussion and edit pages. Discuss what kind of similar, if much less accessible and transparent, processes occur before traditional media is published. Then encourage students to look at and follow Wikipedia articles' citations and references, and use those sources once they've confirmed the authority of the sources. Once they get used to checking citations on Wikipedia, they start to wonder--where are the citations in my daily newspaper? My health magazine? My history textbook? (Undergrad textbooks rarely have internal references.) Far from lowering standards, I think that informed Wikipedia usage can result in internalization of a demand for accuracy.

I think it depends very much on what you are using Wikipedia for. On technical and scientific issues, specialized encyclopedias are better. However, where Wikipedia succeeds is in covering those topics that are considered too trivial or ephemeral for other encyclopedias. For example, if I need a synopsis or character list for media published in the last decade (including video and games), Wikipedia is usually a better starting place than sifting through published reviews.

By KirkJobSluder (not verified) on 20 Oct 2007

I wonder how much of the "wrong" information is really information people disagree about. People disagree about pretty much everything; how do you decide who is right in order to determine accuracy? Take some of those disagreements about Middle East history and isolate them, then discuss them, and I'll bet people have reasons for what they wrote.

Hmm,

In connection with wanderer's comment, one possible answer to the post's headline might be the extent to which articles on Wiki are susceptible to groupthink...

http://en.wikipedia.org/wiki/Groupthink

By Tony Jeremiah (not verified) on 20 Oct 2007

"To this day, physicians are fighting a deep, visceral fear of vaccination. If the issue of compulsory childhood vaccines was put to a referendum, it would lose badly."

I wouldn't go that far; vaccinations are required in every state.

But I agree that science education is appallingly weak. Ask people about cloning or stem cells, or ethanol fuels, or global warming, and you'll get these "gut" reactions that are pretty devoid of any actual study.

I'm not even sure markets are rational; look at the bursting of the '00 "dot com" bubble. Or interest-free ARMs today.