Content Volatility of Scientific Topics in Wikipedia: Plowshare Prates (anag.)

Yet another "academic" article about Wikipedia (Content Volatility of Scientific Topics in Wikipedia: A Cautionary Tale; Adam M. Wilson, Gene E. Likens):

Wikipedia has quickly become one of the most frequently accessed encyclopedic references, despite the ease with which content can be changed and the potential for ‘edit wars’ surrounding controversial topics. Little is known about how this potential for controversy affects the accuracy and stability of information on scientific topics, especially those with associated political controversy. Here we present an analysis of the Wikipedia edit histories for seven scientific articles and show that topics we consider politically but not scientifically “controversial” (such as evolution and global warming) experience more frequent edits with more words changed per day than pages we consider “noncontroversial” (such as the standard model in physics or heliocentrism). For example, over the period we analyzed, the global warming page was edited on average (geometric mean ±SD) 1.9±2.7 times resulting in 110.9±10.3 words changed per day, while the standard model in physics was only edited 0.2±1.4 times resulting in 9.4±5.0 words changed per day. The high rate of change observed in these pages makes it difficult for experts to monitor accuracy and contribute time-consuming corrections, to the possible detriment of scientific accuracy. As our society turns to Wikipedia as a primary source of scientific information, it is vital we read it critically and with the understanding that the content is dynamic and vulnerable to vandalism and other shenanigans.

The content of the paper is laughably thin. After a brief anecdotal introduction about blowjobs in 2011, we come to the brief substance of the paper:

We compared three topics we consider to be politically (though not scientifically) controversial (acid rain, global warming, and evolution) and four we consider to be politically uncontroversial (heliocentrism, general relativity, continental drift, and the standard model in physics)... We then calculated three metrics from each article’s history: 1) daily edit rate (excluding successive edits by the same user, n = 23,156), 2) mean edit size (the total number of words inserted, deleted, or changed) on days with at least one edit (n = 8,525), and 3) mean number of page ‘views’ per day (which includes requests by computer programs, only available after 2008-01-01)... The geometric means (±SD) ranged from 0.2±1.4 edits per day and 9.4±5.0 words changed per day for the standard model to 1.9±2.7 edits per day and 110.9±10.3 words changed per day for global warming... the three ‘controversial’ topics each had greater mean edit rates than each of the ‘noncontroversial’ topics (p

That's about it, really.
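For concreteness, here is a minimal sketch of how the first metric quoted above (daily edit rate, excluding successive edits by the same user) might be computed. It is not the authors' code: the input format, the function name `daily_edit_counts`, and the choice to collapse a run of same-user edits even when it crosses midnight are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def daily_edit_counts(revisions):
    """Count edits per day, collapsing runs of successive edits by one user.

    `revisions` is a chronologically ordered list of (day, user) pairs.
    This input shape is hypothetical, not the format used in the paper.
    """
    counts = Counter()
    previous_user = None
    for day, user in revisions:
        # Only the first edit in a run by the same user is counted.
        if user != previous_user:
            counts[day] += 1
        previous_user = user
    return counts


# Made-up sample: two back-to-back edits by one user on the 13th,
# then three edits by three different users on the 14th.
sample = [
    (date(2015, 8, 13), "UserA"),
    (date(2015, 8, 13), "UserA"),
    (date(2015, 8, 14), "UserB"),
    (date(2015, 8, 14), "UserC"),
    (date(2015, 8, 14), "SomeBot"),
]
counts = daily_edit_counts(sample)
```

Under these assumptions the two back-to-back edits on the 13th count once and the 14th counts three times, which shows how little it takes to register as "edited three times in a day".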

A better analysis

The above is largely useless, because the original assertion was "The high rate of change observed in these pages makes it difficult for experts to monitor accuracy and contribute time-consuming corrections", although that really should have been phrased as a question. So, let's look at a few days of global warming's history. For 2015-08-14:

* BG19bot m . . (174,469 bytes) (-2) . . (WP:CHECKWIKI error fix. Syntax fixes. Do general fixes if a problem exists. - using AWB (11377))
* Isambard Kingdom . . (174,471 bytes) (-138) . . (Undid revision 675994710 by Jamalmunshi (talk) Self-serving link removed.)
* Jamalmunshi . . (174,609 bytes) (+138) . . (→External links)

So, that's easy to check: someone added a promo link and it got reverted; and a bot fixed something.

The previous day:

* Short Brigade Harvester Boris (talk | contribs) . . (174,471 bytes) (+46) . . (→Initial causes of temperature changes (external forcings): give Milankovitch his own section (albeit short), consistent with other external forcing)
* Short Brigade Harvester Boris (talk | contribs) . . (174,425 bytes) (-63) . . (→Initial causes of temperature changes (external forcings): tighten a little)

Those are both Boris, so you can trust them, and he's even explained them. There's nothing on the 12th, and the 11th is Boris again (one of his edits is almost interesting, but it's just pruning). Nothing on the 10th or the 9th, and on the 8th only more pruning by Boris. Nothing on the 7th, 6th or the 5th (are you starting to see a pattern here?). On the 4th there is vandalism which lasts 8 minutes.

We need to go back to the 25th of July for anything interesting: Jcardazzi (talk | contribs) . . (176,821 bytes) (+1,884) . . (→Warmest years: Jan-June 2015 records), which turns out to be the stuff Boris pruned later.

In short, an entire month's worth of edits can be checked in less than five minutes. Indeed you can check the changes since the 6th of June (which happens to be at the bottom of the 50-changes default history page) with one diff.

The abstract ends with "As our society turns to Wikipedia as a primary source of scientific information, it is vital we read it critically". I think they need to take their own words to heart.

Update: a note on the stats

As alluded to in the comments here, there are problems with the statistical analysis, brief as it is. For example:

The geometric means (±SD) ranged from 0.2±1.4 edits per day...

(When I read the article I just blipped over the stats as obviously uninteresting, so didn't notice this.) What can "0.2 +/- 1.4 edits per day" possibly mean? Well, as a textual form written as mean +/- SD, it can represent exactly that. But since edits per day is inevitably bounded below by zero, the form doesn't mean very much. This looks to be symptomatic of someone running a bunch of numbers through a stats program without thinking. The paper is, nominally at least, peer reviewed, and it's rather disappointing this wasn't picked up. PLOS ONE has very low standards, as I understand it: as long as the paper isn't obvious drivel it can be published. This is at the low end of even that rather forgiving scale.

Refs

* Not a ref, but #crypticclimateclue from Gavin: "For who's sake? Griper role played by emeritus fellow (5,1,6,2)"
* Gizmodo has a crap article, too.
* User_talk:Jimbo Wales#PLOS ONE article about Wikipedia POV-pushing in the sciences

Hit a nerve, did it?

Yawn.

By David B. Benson (not verified) on 15 Aug 2015 #permalink

I've sometimes wished for an app that would delve through the edit history of a Wikipedia article and color code text based on how frequently it had been edited over time.
White -- unchanged in the last year or so. Blue, a small number of changes, up to red, more than one change per week on average.

By Karl Lembke (not verified) on 15 Aug 2015 #permalink

Besides WMC's issues, there are others regarding statistics.

By John Mashey (not verified) on 16 Aug 2015 #permalink

Most interesting thing about this is the 2nd author (assuming it's the same guy).

[Likens? Meant nothing to me. In the press releases he gets the noise. Emeritus. Now you say it, he shows up in "merchants of Doubt", but as one of those who found Acid Rain. Wiki agrees https://en.wikipedia.org/wiki/Gene_Likens -W]

By James Annan (not verified) on 16 Aug 2015 #permalink

OK, so I actually read the paper. It's amazing that they got a peer reviewed publication out of something so utterly trivial with no attempt at in-depth analysis or insight.

Download the data, run a couple of R scripts, and write it up. From start to finish it couldn't have been more than an afternoon's work.

By Raymond Arritt (not verified) on 16 Aug 2015 #permalink

> Karl Lembke
> ... wished for an app that would delve through
> the edit history of a Wikipedia article and
> color code text based on how frequently it
> had been edited over time.

Good idea, especially if the back-and-forth-but-no-progress editing has a telling color. I suggest puce.

Seems to me the warning in the paper is for the youngsters (and elderly, increasingly, I notice) who go there, read the front page text, and think it's reliable. Do you think young adults nowadays are smart enough to assume they need to read multiple versions of the reference material? I haven't seen that happening.

Last time I pointed out a visible comet to a bunch of theoretically educated teenage people, the question from several was "is it going to hit us?"

http://assets.amuniversal.com/71d1d6b009ca0133f3b1005056a9545d

By Hank Roberts (not verified) on 16 Aug 2015 #permalink

Reminds me a little of a discussion I had with a colleague a few weeks back... he was arguing in favour of hardcopy over online resources for just such reasons. He later pulled out a roughly 50-year-old encyclopedia (certainly Soviet-era) to back up an assertion on climate science...

More on the stats problems.

See the Figure in the paper.

If the distributions are right-skewed, and non-negative, they might be lognormal, or at least close, in which case a standard deviation is useful, but it's a *multiplicative* SD, not an additive. (Of course, one would want to compute skew and kurtosis of the log distribution, maybe do a normality test, to assess whether it's close enough to lognormal for the SD's to be really helpful.)

Wikipedia says:
"The geometric standard deviation was proposed for the first time by Kirkwood (1979) as a measure of log-normal dispersion analogously to the geometric mean.[1] As the log-transform of a log-normal distribution results in a normal distribution, we see that the geometric standard deviation is the exponentiated value of the standard deviation of the log-transformed values, i.e. \sigma_g = \exp(\operatorname{stdev}(\ln(A))).

As such, the geometric mean and the geometric standard deviation of a sample of data from a log-normally distributed population may be used to find the bounds of confidence intervals analogously to the way the arithmetic mean and standard deviation are used to bound confidence intervals for a normal distribution. See discussion in log-normal distribution for details."
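As a rough illustration of the definition quoted above, the sketch below computes a geometric mean and geometric SD from a handful of made-up positive values. The function name `geometric_mean_and_sd` and the sample data are hypothetical, and using the sample (n - 1) standard deviation of the logs is a choice made here, since the quoted definition does not say which form of stdev is meant.

```python
import math


def geometric_mean_and_sd(values):
    """Return (geometric mean, geometric SD) for strictly positive values."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean_log = sum(logs) / n
    # Sample variance of the log-transformed values (n - 1 denominator).
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    # Exponentiate to get back to the original scale.
    return math.exp(mean_log), math.exp(math.sqrt(var_log))


values = [1.0, 10.0, 100.0]  # made-up data, evenly spaced on a log scale
gm, gsd = geometric_mean_and_sd(values)

# The GSD is a factor, so the "one SD" range is gm / gsd to gm * gsd,
# not gm - gsd to gm + gsd.
low, high = gm / gsd, gm * gsd
```

For this toy data the geometric mean and geometric SD both come out at 10 (up to floating-point rounding), so the range runs from about 1 to about 100 and can never dip below zero. Note that the logs require strictly positive inputs, so days with zero edits would need separate handling.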

a) It is hard to understand how 0.5 +/- 2.0 edits/day is a useful characterization, given that there are never negative edits.
I might believe 0.5 */ 2.0, assuming 2.0 was really the Geometric SD.

b) But then 36.2 +/- 10.2 and especially 142.3 +/- 22.9 don't seem to use the Geometric SD, as that would imply some very large edits, thousands of words, i.e., 142*22.9 = ~3300 words, with about 17% of edits bigger still (if it were lognormal).
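The arithmetic in (a) and (b) can be checked with a few lines, assuming (and it is only an assumption) that the published figures are read as geometric mean and geometric SD. The helper `multiplicative_range` is a hypothetical name, and the tail fraction is the standard normal probability of exceeding one SD on the log scale.

```python
import math


def multiplicative_range(gm, gsd):
    """Return (gm / gsd, gm * gsd), the log-scale analogue of mean +/- 1 SD."""
    return gm / gsd, gm * gsd


# (a) 0.5 */ 2.0 edits per day
edits_low, edits_high = multiplicative_range(0.5, 2.0)

# (b) 142.3 */ 22.9 words changed per day
words_low, words_high = multiplicative_range(142.3, 22.9)

# Fraction of a lognormal population lying above gm * gsd, i.e. more than
# one SD above the mean on the log scale: 1 - Phi(1).
tail_fraction = 0.5 * math.erfc(1 / math.sqrt(2))
```

Read this way, (a) gives a plausible 0.25 to 1.0 edits per day, while (b) gives an upper bound of roughly 3,260 words with about 16% of days beyond it, close to the 17% figure above.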

By John Mashey (not verified) on 16 Aug 2015 #permalink

Note: see log-normal distribution, and skip to the "Occurrence" section. It is widely used in some areas of science and engineering.
It is also useful in computer performance analysis, i.e., in explaining the SPEC CPU benchmarks.

By John Mashey (not verified) on 16 Aug 2015 #permalink

The Wiki could hire the pick of the Britannica editors it has un-employed, and set them to casting a gimlet eye on the fact checking fray.

By Russell Seitz (not verified) on 16 Aug 2015 #permalink

@Karl
"I’ve sometimes wished for an app that would delve through the edit history of a Wikipedia article and color code text based on how frequently it had been edited over time."

We are actually working on that. Right now you can only show the author of the word, but we will soon implement that specific function as well:
http://f-squared.org/whovisual/ --> whoCOLOR

By Fabian Flöck (not verified) on 18 Aug 2015 #permalink

I think you are being unfair to PLoS ONE. The only difference between it and any other journal is the lack of requirement for scientific impact (sexiness, really). It is supposed to be equally as rigorous as any other journal with regard to strength of argument, sound methodology, statistical rigor, etc. You can find as many (or more) dodgy stats in Science or Nature.

[I don't think that's true. Even leaving aside the dodginess of the stats, what's really striking about this paper is its thinness. You wouldn't get away with that in a "traditional" journal -W]

By Dave Mellert (not verified) on 18 Aug 2015 #permalink