Following up on yesterday’s post, where I wrote about the four functions that traditional publishers claim as their space (registration, certification, dissemination, preservation), I want to revisit an argument I made last week at the British Library.
In my slides, I argued that the web brings us at least three additional functions: integration, annotation, and federation. I want to get this argument out onto the web and hear some feedback…
Let’s start with integration. The article no longer sits on a piece of dead tree, inside a journal formatted by date and volume and page number. It exists as a digital entity, capable of dense integration into other digital entities. One way to think of this is to consider how the citation is weak tea compared to the hyperlink – an individual citation carries more weight than an individual hyperlink, but the hyperlink is so easy to create, and carries so much power in aggregate, that we get Google. Citations are the only way most articles are integrated with other articles, and that simply has to change.
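To make that “power in aggregate” point concrete, here’s a toy sketch of the idea underneath Google’s ranking – score every node by the scores of the nodes that link to it, and iterate until it settles. The graph and damping factor below are made up for illustration; this is the shape of PageRank, not anyone’s production code.

```python
# Toy PageRank: many individually weak links, in aggregate, rank nodes.
# The graph below is illustrative, not a real citation network.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

damping = 0.85  # standard damping factor from the PageRank paper
nodes = list(links)
rank = {n: 1.0 / len(nodes) for n in nodes}

for _ in range(50):  # power iteration until the scores settle
    new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
    for source, targets in links.items():
        share = damping * rank[source] / len(targets)
        for target in targets:
            new_rank[target] += share
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))
# "C" comes out on top: no single link did that, the aggregate did.
```

A citation graph run through exactly this kind of computation is what we give up when articles only point at each other through bibliographies.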
Articles need to be integrated with lots of other digital information. Media is an obvious one, and the Elsevier-Cell “article of the future” seems to start here with an interview with the authors. To me this is absurd, and the height of how a “big company” thinks “the users” use the web. I don’t want to hear an author interviewed by a reporter. I assume the author is going to say his or her work is sweets and sparkles and Nobel prizes. I’d rather see an embedded high-resolution video of all the protocols necessary to replicate the experiment, like the ones you get from JoVE (I’d like them to actually be open access too, but that’s a different blog post).
If you want to make the article of the future, start with integration and work backwards. Don’t start with the article and work forward, because you’ll be trapped in the document mentality instead of the network mentality.
We don’t just want the data to be downloadable; we want to be able to run the same algorithms the author ran on that data, and adjust the variables ourselves, to see if the results are the output of statistical foul play or negligence. We want to be able to hide all the boring language that recapitulates past canon and focus on the new assertions – unless, of course, the author is trying to game the past canon and shade the facts. And we want to be able to effortlessly click out and get data about the assertions in the paper from other databases – when a gene is mentioned, we should be able to one-click and run any number of queries against its sequence and its ontological classifications, order genetic material from biobanks, and so forth.
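As a sketch of what that one-click lookup could be built on: public REST interfaces already answer these queries. A minimal example, assuming Ensembl’s REST API and using “BRCA2” as a stand-in for whatever gene the paper happens to mention:

```python
# Sketch: resolving a gene named in a paper against a public database.
# Assumes Ensembl's public REST API (https://rest.ensembl.org); the
# symbol "BRCA2" is just an illustrative example.
import requests

def lookup_gene(symbol, species="homo_sapiens"):
    url = f"https://rest.ensembl.org/lookup/symbol/{species}/{symbol}"
    resp = requests.get(url, headers={"Content-Type": "application/json"})
    resp.raise_for_status()
    return resp.json()

gene = lookup_gene("BRCA2")
print(gene["id"], gene["seq_region_name"], gene["start"], gene["end"])
```

The point is not this particular API but that the query itself is a one-liner – the missing piece is the wiring from the article’s text to calls like this one.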
Annotation is the second essential new function. The old method of annotation is to write a new paper that validates, invalidates, extends, or otherwise affects the assertions made in an old paper – or, if something is really wrong, a letter to the editor or a retraction. In a wiki world, this is fundamentally insane. The paper is a snapshot of years of incremental knowledge progress. We have much better technology to use than dead trees.
Of course, there isn’t any incentive to take the wiki that is science and actually use a wiki to create and edit it. Scientists get tenure for papers, and egoboo is cold comfort. Annotation needs to be provided by publishers, and is being provided, but the next step is to create an open platform that actually tracks the kinds of annotation-relationships the web enables. Bloggers use Trackback to create a formal hyperlink between blog posts, and the protocol can and should be extended to let us connect all sorts of things: articles, wiki pages, database entries, catalog pages for biological materials, data sets, and on and on. By making these link transactions – which exist anyway – explicit, trackable, and, most importantly, reportable, we’ll create a currency that scientists will gladly spend. It won’t be about “sharing” but instead about “publishing” more of the intermediate knowledge that currently gets left on the lab floor when the paper gets written.
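For the record, Trackback itself is almost embarrassingly simple – the whole “link transaction” is one HTTP POST of a few form fields to the target’s trackback endpoint. A minimal sketch of a ping, with hypothetical URLs standing in for an article’s trackback address and the linking resource:

```python
# Minimal Trackback ping, per the Trackback spec: a form-encoded HTTP
# POST to the target's trackback endpoint. Both URLs below are
# hypothetical placeholders.
import requests

payload = {
    "url": "https://example.org/my-dataset",   # the resource doing the linking
    "title": "Dataset supporting Figure 3",
    "excerpt": "Raw data and analysis scripts.",
    "blog_name": "My Lab Notebook",
}
resp = requests.post(
    "https://example.org/articles/1234/trackback",  # hypothetical endpoint
    data=payload,
)
print(resp.status_code, resp.text)  # the spec answers with XML: <error>0</error> on success
```

Extending this to datasets, database entries, and biobank catalog pages means agreeing on endpoints and perhaps a few extra fields – not inventing new technology.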
Federation is the last essential new function I’ll deal with here (I have some theories on other long-term essential ones, but they’re poorly formed in comparison). By federation I mean the ability to take a set of articles and combine them into a corpus with other materials. There are a lot of reasons one might want to do this: text mining, semantic indexing, integration with information that is private, and so forth. It’s great to be able to read articles on the web. But if we’re going to really explode the way we communicate, the ability to cache local copies (or cloud copies) in new formats for new kinds of analysis, and the right to then distribute the resulting corpus for follow-on innovation and exploration, is going to be central.
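To make federation concrete: most repositories already speak OAI-PMH, so building a local corpus can start as a simple harvest loop. A sketch using arXiv’s public OAI-PMH endpoint as the example – any repository that speaks OAI-PMH works the same way:

```python
# Sketch: harvesting article metadata into a local corpus via OAI-PMH.
# Uses arXiv's public OAI-PMH endpoint; a real harvest would also follow
# resumptionToken pages and store full text, both omitted here.
import requests
import xml.etree.ElementTree as ET

OAI = "http://export.arxiv.org/oai2"
DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core namespace

resp = requests.get(OAI, params={"verb": "ListRecords", "metadataPrefix": "oai_dc"})
resp.raise_for_status()
root = ET.fromstring(resp.content)

corpus = [
    elem.findtext(DC + "title")
    for elem in root.iter()
    if elem.tag.endswith("}dc")  # the oai_dc metadata container
]
print(len(corpus), "records harvested; first title:", corpus[0])
```

Once the corpus is local, text mining and semantic indexing are ordinary engineering problems – the hard part today is having the right to build the corpus at all.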
Publishers are so focused on the prevention of copying that they don’t see the central business opportunity here: the human-readable, copyrighted version of the article is the least federation-friendly form it can take. Charge a fee to make the article beautifully machine-readable and give away the text – because the service of improving the technical aspects of the article is clearly a value-add that shouldn’t be subject to a funder mandate.
Integration, Annotation, Federation. It’s what the Web is all about. And if we can get to the point where publishers feel these as core responsibilities, the Open Access debate will have made a major leap. All of these create a world in which the text of the article itself is lower in economic value than the connectivity of that article into a larger web of information – and thus easily distributable. OA is the beginning, not the end game, of making the web work for science the way it works for culture. Step two is all about the connectivity, and it’s time to start arguing – loudly – for the right to start wiring science together.