Microparadigms: What universe do you live in?

There is an interesting entry over at In the Pipeline about a recent paper in PNAS: Microparadigms: Chains of collective reasoning in publications about molecular interactions.

In this paper, the authors analyzed statements extracted from published papers by GeneWays, a fancy database of factual statements derived from the literature.

From the paper:

In this study, we focused on chronologically ordered chains of statements about published molecular interactions, such as "protein A activates gene B" or "small molecule C binds protein D." Each chain comprises chronologically ordered positive and/or negative statements about the same pair of molecules; for brevity, we encode each such chain with a series of 0's for the negative statements, and 1's for the positive statements. For example, an imaginary chain of length 3 could include "protein A activates protein B" (1), "protein A does not activate protein B" (0) and "protein A activates protein B" (1) (see also Fig. 1). Discrepancies across published statements may arise because of variations in experimental conditions, errors in the conduct of the experiment, misinterpretation of results, or a combination of these factors.
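To make the 0/1 encoding concrete, here is a minimal sketch that turns a chronologically ordered list of positive/negative statements into the paper's bit string and tallies it. The function names are hypothetical, and treating the "dominant" statement as the majority value (ties going to positive) is my own assumption, not the authors' definition.

```python
from collections import Counter


def encode_chain(statements):
    """Map chronologically ordered statements (True = positive) to a 0/1 string."""
    return "".join("1" if positive else "0" for positive in statements)


def summarize_chain(chain):
    """Count positives/negatives in a 0/1 chain.

    'dominant' is taken here to be the majority value, with ties going to "1";
    this is an illustrative assumption, not the paper's definition.
    """
    counts = Counter(chain)
    positives, negatives = counts["1"], counts["0"]
    dominant = "1" if positives >= negatives else "0"
    return {"length": len(chain), "positives": positives,
            "negatives": negatives, "dominant": dominant}


# The imaginary example from the quote: activates / does not activate / activates
chain = encode_chain([True, False, True])
print(chain)                   # 101
print(summarize_chain(chain))
```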

There is a well established term in economics, "information cascade" (4), which represents a special form of a collective reasoning chain that degenerates into repetition of the same statement (4). Here we suggest a model that can generate a rich spectrum of patterns of published statements, including information cascades. We then explore patterns that occur in real scientific publications and compare them to this model.

So what did they find?

From the paper:

Our first observation, based on computation, was that, because of the huge data set, we can clearly demonstrate that momentums of published statements are notably positive, but <0.1 (see Fig. 4 A and B). This result means that scientists are often strongly affected by prior publications in interpreting their own experimental data, while weighting their own private results (which have weight 1 under our model) at least 10-fold as high as a single result published by somebody else.
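To get a feel for what a momentum below 0.1 means, here is a toy simulation. It is a simplified rule I made up to illustrate the idea, not the authors' actual model: each new author weighs their own private result at 1 and every earlier published statement at `momentum`, then publishes whichever side scores higher. The names `publish_next`, `simulate_chain`, `truth` and `p_correct` are all hypothetical.

```python
import random


def publish_next(chain, private_result, momentum):
    """Decide the next published statement (1 = positive, 0 = negative).

    Toy rule (an assumption, not the paper's exact model): the author's own
    private result counts with weight 1, every earlier published statement
    counts with weight `momentum`; positives add, negatives subtract, and a
    tie falls back to the private result.
    """
    def sign(s):
        return 1 if s == 1 else -1

    score = sign(private_result) + momentum * sum(sign(s) for s in chain)
    if score > 0:
        return 1
    if score < 0:
        return 0
    return private_result


def simulate_chain(length, truth, p_correct, momentum, seed=None):
    """Grow one chain; each private result equals `truth` with probability p_correct."""
    rng = random.Random(seed)
    chain = []
    for _ in range(length):
        private = truth if rng.random() < p_correct else 1 - truth
        chain.append(publish_next(chain, private, momentum))
    return chain


# With momentum 0.1 it takes more than 10 net earlier positives
# to override a single private negative result.
print(publish_next([1] * 3, 0, 0.1))   # 0
print(publish_next([1] * 12, 0, 0.1))  # 1
```

Under this toy rule, a momentum above 1 lets even one earlier statement outweigh every later private result, so the chain locks into repetition: the information-cascade regime from the quote above.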

The second observation was that, for all three data sets, the dominant statements were considerably "heavier" than the nondominant statements, revealing a tendency toward conformism (see Fig. 4; see also Appendix 1 and Data Sets 1 and 2, which are published as supporting information on the PNAS web site).

Our third and most striking finding emerged from the need to explain the observation that the published statements in our data set are predominantly positive (<5% of them are negative) and are highly correlated within chains.

From their data (collected from papers that analyzed molecular interactions in Drosophila), they conclude that the published literature can be explained by two very different models.

Furthermore, our computations indicate that our data set can be interpreted in two very different ways (two "alternative universes"): one is an "optimists' universe" with a very low incidence of false results (<5%), and another is a "pessimists' universe" with an extraordinarily high rate of false results (>90%). Our computations deem highly unlikely any milder intermediate explanation between these two extremes.

In fact, the paper statistically weighs these two possibilities against each other:

For the largest combined data set (all), the most likely universe was the pessimists' (posterior probability 0.73), followed by the optimists' (posterior probability 0.27). A very similar picture is observed for the smaller data sets (Fig. 5), but for all practical purposes, both universes successfully explained reality.
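Why would a 0.73 vs. 0.27 split still count as "both universes explain reality"? A quick way to see it is to turn the posteriors into a Bayes factor (posterior odds divided by prior odds). The sketch below assumes the two universes started with equal prior weight; the post does not say what priors the authors used, so treat that as my assumption.

```python
def bayes_factor(post_a, post_b, prior_a=0.5, prior_b=0.5):
    """Posterior odds divided by prior odds: the likelihood ratio of A vs. B."""
    return (post_a / post_b) / (prior_a / prior_b)


# Posteriors quoted above (pessimists' vs. optimists'); equal priors are assumed.
bf = bayes_factor(0.73, 0.27)
print(round(bf, 2))  # 2.7
```

That is odds of roughly 2.7 to 1 for the pessimists' universe. On the commonly used Kass and Raftery scale, a Bayes factor between 1 and about 3.2 is "not worth more than a bare mention", which matches the authors' caution.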

So which universe do we live in? I would not be surprised if it was the "pessimist" one.

[BTW - the whole deal with negative data keeps on popping up - I'll write something on it soon ...]
