There is an interesting entry over at In the Pipeline about a recent paper in PNAS, "Microparadigms: Chains of collective reasoning in publications about molecular interactions."
In this paper, the authors analyzed summaries of papers as processed by Geneways - a fancy database of factual statements derived from published papers.
From the paper:
In this study, we focused on chronologically ordered chains of statements about published molecular interactions, such as ''protein A activates gene B'' or ''small molecule C binds protein D.'' Each chain comprises chronologically ordered positive and/or negative statements about the same pair of molecules; for brevity, we encode each such chain with a series of 0's for the negative statements, and 1's for the positive statements. For example, an imaginary chain of length 3 could include ''protein A activates protein B'' (1), ''protein A does not activate protein B'' (0) and ''protein A activates protein B'' (1) (see also Fig. 1). Discrepancies across published statements may arise because of variations in experimental conditions, errors in the conduct of the experiment, misinterpretation of results, or a combination of these factors. There is a well established term in economics, ''information cascade'' (4), which represents a special form of a collective reasoning chain that degenerates into repetition of the same statement (4). Here we suggest a model that can generate a rich spectrum of patterns of published statements, including information cascades. We then explore patterns that occur in real scientific publications and compare them to this model.
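To make the 0/1 encoding concrete, here is a minimal sketch of my own (not code from the paper or from Geneways). The function name `encode_chain` and the publication years are made up for illustration; the three statements are the imaginary chain from the quote above.

```python
def encode_chain(statements):
    """Encode (year, is_positive) pairs about one molecule pair as a 0/1 string.

    Statements are put in chronological order first; a positive statement
    becomes '1' and a negative statement becomes '0'.
    """
    ordered = sorted(statements, key=lambda s: s[0])
    return "".join("1" if positive else "0" for _, positive in ordered)


# Hypothetical years; the statements follow the example chain in the quote.
chain = [
    (2001, True),   # "protein A activates protein B"
    (2003, False),  # "protein A does not activate protein B"
    (2005, True),   # "protein A activates protein B"
]

encoded = encode_chain(chain)
print(encoded)  # 101
```

A chain that reads as all 1's (or all 0's) from some point onward would be the repeated-statement pattern that the authors call an information cascade.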
So what did they find?
From the paper:
Our first observation, based on computation, was that, because of the huge data set, we can clearly demonstrate that momentums of published statements are notably positive, but <0.1 (see Fig. 4 A and B). This result means that scientists are often strongly affected by prior publications in interpreting their own experimental data, while weighting their own private results (which have weight 1 under our model) at least 10-fold as high as a single result published by somebody else.
The second observation was that, for all three data sets, the dominant statements were considerably ''heavier'' than the nondominant statements, revealing a tendency toward conformism (see Fig. 4; see also Appendix 1 and Data Sets 1 and 2, which are published as supporting information on the PNAS web site).
Our third and most striking finding emerged from the need to explain the observation that the published statements in our data set are predominantly positive (<5% of them are negative) and are highly correlated within chains.
From their data (collected from papers that analyzed molecular interactions in Drosophila), they conclude that our academic literary universe can be represented by two possible models.
Furthermore, our computations indicate that our data set can be interpreted in two very different ways (two ''alternative universes''): one is an ''optimists' universe'' with a very low incidence of false results (<5%), and another is a ''pessimists' universe'' with an extraordinarily high rate of false results (>90%). Our computations deem highly unlikely any milder intermediate explanation between these two extremes.
In fact, in the paper they statistically analyze these two possibilities:
For the largest combined data set (all), the most likely universe was the pessimists' (posterior probability 0.73), followed by the optimists' (posterior probability 0.27). A very similar picture is observed for the smaller data sets (Fig. 5), but for all practical purposes, both universes successfully explained reality.
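To put those two posterior probabilities in perspective, here is a quick back-of-the-envelope calculation of my own (the variable names are mine, and the paper does not report this ratio): dividing one probability by the other gives the posterior odds in favor of the pessimists' universe.

```python
# Posterior probabilities quoted above for the largest combined data set.
p_pessimist = 0.73
p_optimist = 0.27

# Posterior odds: how many times more probable one universe is than the other.
posterior_odds = p_pessimist / p_optimist
print(round(posterior_odds, 1))  # 2.7
```

Odds of roughly 2.7 to 1 favor the pessimists' universe but are far from decisive, which fits the authors' remark that both universes explain the data.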
So which universe do we live in? I would not be surprised if it was the "pessimist" one.
[BTW - the whole deal with negative data keeps on popping up - I'll write something on it soon ...]