This post was prompted by the combination of three events: a visit with the founder of PubGet, an invitation to keynote at a conference on publishing, and an interview with Bora about the Science Online 2009 conference last January in RTP.
The past year has seen an explosion of talk about the future of the scientific article. It’s wonderful to see, even if the results are either depressingly complicated to achieve or depressingly incremental. Both of those results are better than what we had when I got into this – I remember at a conference in Sweden in 2006 hearing a grand high priest of the publishing industry argue that they’d gotten this whole digital publishing thing sorted right out…that attitude was the first thing that needed to change. Glad it has.
I’ve been hammering for years now on the need to enrich articles with semantics. My talk at that conference in Sweden was probably the first good one I gave on the topic, and it’s been a leitmotif for me going back to the mid-1990s, when I was studying epistemology and getting my first real exposure to networked computers. For years I was convinced it was right around the corner.
That semantic publishing future now feels closer than it ever has. But I’m actually less convinced it’s around the corner than in years past, and the reasons for that are human, not technical.
To be clear: in the following, I’m going to be talking about narratives and text, not about databases. The semantic future for databases and data is already here, but to paraphrase William Gibson, it’s just unevenly distributed. Those who argue that the Semantic Web isn’t going to work have already lost the argument. You just don’t see it, because it’s an infrastructure upgrade to the back-end of the Web to make it work for data.
But the impact of formal semantics on text, which is what humans interface with, has been negligible. It’s had nowhere near the impact of tagging and folksonomy. That’s driven me, and many others who like formal semantics, crazy.
The benefits of a formal semantic approach to text are so obvious: we can start to treat knowledge as a graph, and we can even maybe start to get some network externality benefits from that knowledge. Make it more valuable via the network…one fact is like one fax machine, but many facts build a hypothesis, etc. etc. etc.
Beautiful dream. Not going to happen anytime soon.
The problem is that people are the writers. Humans. Not machines. Machines luuuuuv semantics. Otherwise they can’t tell the difference between a picture and a pitcher (or between a pitcher of water and a baseball pitcher). This is why one should never send one’s mother to buy jewelry via Google without the safe browsing mode enabled.
And people don’t like formal semantics. I majored in formal semantics, and it’s a topic that still gives me headaches.
People like stories.
Scientists are people.
Scientists like stories.
A paper is a story. It tells, in its own way, the story of years of work. Of building expertise. Of designing falsifiable hypotheses. Of the results found in the lab. Of the search to balance those results against the canon and dogma. Of the potential ramifications of the results.
It’s a story of science. And the telling of it is an important part of being a human who does science.
A recent article in PLoS Genetics states that “Fission Yeast Tel1ATM and Rad3ATR Promote Telomere Protection and Telomerase Recruitment” – now, those are the key “facts” asserted. They could be written into machine-readable format. I will spare you what that would look like. Suffice to say it’s eye-bleedingly ugly, and requires lots of agreement about unique identifiers. It’s doable. It’s being done for the databases, and that will eventually make it possible for the literature. It’s just not fun. And it ignores the story.
It reduces the research tale to a few assertions, nested into a massive graph of stuff other people asserted. While this is great for machines, it is lousy for people.
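For the curious, here is a minimal sketch of that reduction, with the title’s claims flattened into subject-predicate-object assertions. Everything in it is an assumption made for illustration: the `ex:` identifiers and the predicate names are placeholders, not real database identifiers or an agreed vocabulary, and the paper’s actual claims are far more nuanced than four “promotes” statements.

```python
# Hypothetical identifiers: a real system would use agreed-upon URIs from
# shared databases and ontologies. These "ex:" names are placeholders.
TRIPLES = [
    ("ex:Tel1", "ex:foundIn", "ex:FissionYeast"),
    ("ex:Rad3", "ex:foundIn", "ex:FissionYeast"),
    ("ex:Tel1", "ex:promotes", "ex:TelomereProtection"),
    ("ex:Tel1", "ex:promotes", "ex:TelomeraseRecruitment"),
    ("ex:Rad3", "ex:promotes", "ex:TelomereProtection"),
    ("ex:Rad3", "ex:promotes", "ex:TelomeraseRecruitment"),
]


def objects_of(triples, subject, predicate):
    """Return every object asserted for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects_with(triples, predicate, obj):
    """Return, in sorted order, every subject asserted to have this predicate and object."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# A question a machine can now answer without reading the story:
# what is asserted to promote telomerase recruitment?
promoters = subjects_with(TRIPLES, "ex:promotes", "ex:TelomeraseRecruitment")
tel1_effects = objects_of(TRIPLES, "ex:Tel1", "ex:promotes")
print(promoters)
```

Even this toy version shows the trade: the query is trivial for a machine, and the years of work behind the assertions are nowhere to be seen.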
This is all leading up to an idea I’m working on for the talk later this month. Publishers need to be in the business of providing the service that translates the stories for the machines to understand. The Web makes it trivial to publish stories in human readable form. All the beautiful layout services and print services that used to be worth paying for…aren’t. Peer review isn’t free, but it’s nowhere near as expensive as it’s made out to be – and it’s going to get transformed by the Web, too. The Web makes peer review massively more powerful as it makes it massively more democratic. The Web kills a lot of things that used to drive value in content, especially controlled content.
After all, I can’t remember the last time I used a Zagat’s guide. Not when I have Chowhound. It’s going to come to science. Don’t know exactly how, but it’s coming.
But this only covers one piece of science – the telling of the story. There’s another key, which is the ability to use the information to write a new tale. The ability to take this massive corpus of story and turn it into something that can be modeled, that can be used by humans and machines together to draft new stories…that ability is going to require the emergence of publishers who understand their role in the new content economy. It’s not as printers who use bits rather than ink. It’s as translators between the human stories and the machines who have to take those stories, integrate them into a web of linked data, and make it possible for humans to ask questions, dream dreams, and tell new stories.
The semantic article isn’t going to come from individual scientists rebelling and marking up their own text. It’s going to be a publisher value-added service – “let us make your article integrated, and comprehensible, so that you maximize your citation count and potential collaboration.”
Sounds good, doesn’t it?
Focusing on the control of copies of the article, of the story, isn’t just a losing strategy because of the open access movement, although it is that as well. It’s the wrong concept entirely. Translation is a service for which authors would gladly pay. For which searchers would gladly pay. And it’s a market that is going to get more valuable as a result of open systems, not less valuable, as the cost of controlled scientific published content drops thanks to green and gold open access.
Think about Clayton Christensen’s law of conservation of attractive profits: “When attractive profits disappear at one stage in the value chain because a product becomes commoditized, the opportunity to earn attractive profits with proprietary products usually emerges at an adjacent stage.”
Publishers are trying to fight the commoditization of the story. They shouldn’t. The vast majority of the stories are bought and paid for by the public one way or the other. Publishers should be looking at the place where they can compete on proprietary services, and taking over those markets before their competitors – or startups – beat them to it. There is enormous opportunity in the emerging open access world to make money without needing to vigilantly police the movement of content.
Help the scientists tell their stories in a way that lets those stories integrate into the digital web. Don’t just gussy up a paper version of a story with hyperlinks. Don’t focus on controlling the movement of stories. They’re sand in your hands once they’re on the network. Embrace that fact. Find the value in the next layer, the service layer.
Be a guide. Be a search engine. Be a translator.