Integrate. Annotate. Federate.

Following on from yesterday's post, where I wrote about the four functions that traditional publishers claim as their space (registration, certification, dissemination, preservation), I want to revisit an argument I made last week at the British Library.

In my slides, I argued that the web brings us at least three additional functions: integration, annotation, and federation. I wanted to get this argument out onto the web and get some feedback...

Let's start with integration. The article no longer sits on a piece of dead tree, inside a journal organized by date and volume and page number. It exists as a digital entity, capable of dense integration into other digital entities. One way to think of this is to consider how the citation is truly weak tea compared to the hyperlink - an individual citation carries more weight than an individual hyperlink, but the hyperlink is so easy to create, and carries so much power in aggregate, that we get Google. Citations are the only way most articles are integrated with other articles, and that simply has to change.

Articles need to be integrated with lots of other digital information. Media is an obvious one, and the Elsevier-Cell "article of the future" seems to start here with an interview with the authors. To me this is absurd, and the height of how a "big company" thinks "the users" use the web. I don't want to hear an author interview with a reporter. I assume the author is going to say his or her work is sweets and sparkles and Nobel prizes. I'd rather see an embedded high-resolution video of all the protocols necessary to replicate the experiment, like the ones you get from JoVE (I'd like them to actually be open access too, but that's a different blog post).

If you want to make the article of the future, start with integration and work backwards. Don't start with the article and work forward, because you'll be trapped in document mentality instead of the network mentality.

We don't just want the data downloadable; we want to be able to run the same algorithms the author ran on the data, and adjust the variables ourselves, to see whether the results are the output of statistical foul play or negligence. We want to be able to hide all the boring language that recapitulates past canon and focus on the new assertions - unless, of course, the author is trying to game the past canon and shade the facts. And we want to be able to effortlessly click out and get data about the assertions in the paper from other databases - when a gene is mentioned, we should be able to one-click and run any number of core queries against the sequence and its ontological classifications, order genetic materials from biobanks, and so forth.
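
To make that concrete, here is a minimal sketch, assuming the article shipped its data as a simple CSV and exposed the analysis as runnable code. The file name, the column name, and the outlier-cutoff parameter are all hypothetical; the point is that the reader, not just the author, gets to turn the knobs and see whether the result survives.

```python
# A minimal sketch, assuming the article shipped its data as a CSV and its
# analysis as runnable code. The file name, the "measurement" column, and the
# outlier-cutoff parameter are all hypothetical.
import csv
import statistics

def rerun_analysis(data_path, outlier_cutoff=3.0):
    """Recompute the headline statistic, letting the reader pick the cutoff
    instead of trusting whatever value the authors chose."""
    with open(data_path, newline="") as fh:
        values = [float(row["measurement"]) for row in csv.DictReader(fh)]
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    kept = [v for v in values if abs(v - mean) <= outlier_cutoff * stdev]
    return statistics.mean(kept), len(values) - len(kept)

# Try the published cutoff, then a stricter one, and see if the story changes.
for cutoff in (3.0, 2.0):
    mean_kept, dropped = rerun_analysis("figure2_data.csv", outlier_cutoff=cutoff)
    print(f"cutoff={cutoff}: mean={mean_kept:.3f}, points dropped={dropped}")
```

The same logic applies to the one-click gene queries: the article carries machine-actionable hooks to the data and the code, and the reader's tools do the rest.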

Annotation is the second new essential function. The old method of annotation is to write a new paper that validates, invalidates, extends, or otherwise affects the assertions made in an old paper - or, if something is really wrong, there might be a letter to the editor or a retraction. In a wiki world, this is fundamentally insane. The paper is a snapshot of years of incremental knowledge progress. We have much better technology to use than dead trees.

Of course, there isn't any incentive to take the wiki that is science and actually use a wiki to create and edit it. Scientists get tenure for papers, and egoboo is cold comfort. Annotation needs to be provided by publishers, and is being provided, but the next step is to create an open platform that actually tracks the kind of annotation-relationships that the web enables. Bloggers use trackback to create a formal hyperlink between blog posts, and the protocol can and should be extended to let us connect all sorts of things: articles, wiki pages, database entries, catalog pages for biological materials, data sets, and on and on. By making these link transactions - which exist anyway - explicit and trackable, and most importantly reportable, we'll create a currency that scientists will gladly spend. It won't be about "sharing" but instead about "publishing" more of the intermediate knowledge that currently gets left on the lab floor when the paper gets written.
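
As a rough sketch of what that extension might look like: the standard Trackback ping is just a form-encoded HTTP POST carrying a title, URL, and excerpt, so adding typed, directional link information is a small step. The "target" and "rel" fields below are hypothetical extensions (they are not in the Trackback spec), and all of the URLs are placeholders.

```python
# Sketch of a trackback-style ping that records a typed, directional link
# between two research objects. Standard Trackback fields: url, title,
# excerpt, blog_name. The "target" and "rel" fields are hypothetical
# extensions; the URLs are placeholders.
from urllib import parse, request

def ping_typed_link(trackback_url, source_url, target_url, rel, title, excerpt=""):
    payload = parse.urlencode({
        "url": source_url,    # standard field: the resource doing the linking
        "title": title,       # standard field
        "excerpt": excerpt,   # standard field
        "target": target_url, # hypothetical: the resource being linked to
        "rel": rel,           # hypothetical: the link type, e.g. "reuses-data"
    }).encode("utf-8")
    req = request.Request(trackback_url, data=payload,
                          headers={"Content-Type": "application/x-www-form-urlencoded"})
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")  # Trackback replies with a small XML status

# Record that an article's analysis reuses a deposited data set.
print(ping_typed_link(
    "https://example.org/trackback/12345",
    source_url="https://example.org/articles/12345",
    target_url="https://example.org/datasets/67890",
    rel="reuses-data",
    title="Article 12345 reuses dataset 67890",
))
```

Because each ping is an explicit transaction against a known endpoint, the same records that create the links can be aggregated and reported - which is where the currency comes from.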

Federation is the last essential new function I'll deal with here (I have theories about other long-term essential ones, but they're poorly formed in comparison). By federation I mean the ability to take a set of articles and federate them into a corpus with other materials. There are a lot of reasons one might want to do this: text mining, semantic indexing, integration with information that is private, and so forth. It's great to be able to read articles on the web. But if we're going to really explode the way we communicate, the ability to cache local copies (or cloud copies) in new formats for new kinds of analysis, and the right to then distribute the resulting corpus for follow-on innovation and exploration, is going to be central.
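
Here's a toy sketch of the federation step, assuming the articles have already been cached locally as plain-text files; the directory name and file layout are hypothetical, and a real pipeline would have to normalize XML, PDF extractions, and the rest first.

```python
# Build a tiny searchable corpus out of locally cached open-access articles.
# The "oa_article_cache" directory and its one-text-file-per-article layout
# are hypothetical.
import os
import re
from collections import defaultdict

def build_index(cache_dir):
    """Map each term to the set of cached article files that mention it."""
    index = defaultdict(set)
    for name in os.listdir(cache_dir):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(cache_dir, name), encoding="utf-8") as fh:
            for term in re.findall(r"[a-z0-9]+", fh.read().lower()):
                index[term].add(name)
    return index

index = build_index("oa_article_cache")
hits = index.get("autophagy", set())  # every cached article mentioning a term
print(f"{len(hits)} cached articles mention the term")
```

The code is the easy part; the hard part is the right to build and redistribute the corpus, which is exactly why federation belongs on the list of core functions.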

Publishers are so focused on the prevention of copying that they don't see the central business opportunity here: the human-readable, copyrighted version of the article is the least federation-friendly form of the article. Charge a fee to make the article beautifully machine-readable and give away the text - because the service of improving the technical aspects of the article is clearly a value-add that shouldn't be subject to a funder mandate.

Integration, Annotation, Federation. It's what the Web is all about. And if we can get to the point where publishers feel these as core responsibilities, the Open Access debate will have made a major leap. All of these create a world in which the text of the article itself is lower in economic value - and thus easily distributable - than the connectivity of that article into a larger web of information. OA is the beginning, not the end game, of making the web work for science the way it works for culture. Step two is all about the connectivity, and it's time to start arguing - loudly - for the right to start wiring the science together.



Fair enough, although I prefer RDFa to microformats. I do like the trackback protocol better for this particular function - it's easily extensible and lets us get fully directional typed links.

Mr Wilbanks, I read your post yesterday but didn't want to respond right away, because it's one of those things that makes you think deeply.

Thank you for today's post. I do multimedia projects for a science publishing house, and it's a coincidence that I have been asked to find out 'how' (it's no longer about 'what') our readers want to access information on science. Then I found your blogs! Happy days. I'm going back to uni to do digital anthropology, mainly to figure out a better science for the dissemination of information. Somehow I don't feel I can achieve that by doing an MSc in CompSci/Multimedia or whatever. My hunch (not scientific, I know) is telling me that I've got to start with ethnography/ethnology first.

Keep us posted if you're doing a lecture in London again. I'd sure like to attend the next one in the Big Smoke.

Do you want every ranting conspiracy theorist to be able to leave permanent comments on your research papers? One very good thing about peer review is that it excludes lunatics and people who are utterly ignorant about the topic under consideration. And there are a lot of them, and they love to post things on the web. Just look at the "talk" page for a few controversial Wikipedia articles.