Chemistry: on the internet or in cyberspace?

I'm at a workshop on eChemistry today, and we were asked to prepare position statements. I'm not going to blog the conference - it's a private thing - but figured I would post my position statement here.

We were asked to answer some questions. I chose to answer this one: "do you assess the potential of new web-based communication models in Chemistry, i.e. their benefits or liabilities, their transformational power, and their chance of success?"

Full text is after the jump.

A good place to start is the transformation of scholarly communication from "using the internet" to "existing in cyberspace." I take this distinction from my own personal introduction to this world, Larry Lessig:

"EVERYONE WHO IS READING THIS BOOK HAS USED THE INTERNET. SOME HAVE BEEN in "cyberspace." The Internet is that medium through which your e-mail is delivered and web pages get published. It's what you use to order books on Amazon or to check the times for local movies at Fandango. Google is on the Internet, as are Microsoft "help pages."

But "cyberspace" is something more. Though built on top of the Internet, cyberspace is a richer experience. Cyberspace is something you get pulled "into," perhaps by the intimacy of instant message chat or the intricacy of "massively multiple online games" ("MMOGs" for short, or if the game is a role-playing game, then "MMORPGs"). Some in cyberspace believe they're in a community; some confuse their lives with their cyberspace existence. Of course, no sharp line divides cyberspace from the Internet. But there is an important difference in experience between the two. Those who see the Internet simply as a kind of Yellow-Pages-on-steroids won't recognize what citizens of cyberspace speak of. For them, "cyberspace" is simply obscure. "

(Lessig, Code and Other Laws of Cyberspace v. 2)

What we've been doing for the most part in scholarly communication is using the internet. We've been making digital versions of papers - PDFs - and using the network to post them. You can use the network to order them, rent them, read them. But they're not in cyberspace in this sense - they're not interactive in technical, social, or legal terms. They are actually less free - thanks to DRM and the move from sale terms to lease terms - than they used to be. Open access (OA) is in many ways a reaction to this irony, as well as a response to two pressing problems: rising serials prices and filter failure for scientific information.

The transformational power of getting into cyberspace for scholarly publishing is huge. If we can start to leverage both a) the power of the crowd and b) the power of technological enhancement more efficiently - i.e., without the high transaction costs, permission barriers, and information exclusion - then the mathematical odds of someone, somewhere making a breakthrough discovery go up.

That could be innovations in scholarly communication itself - perhaps a new way to index information, as Google represented in the late 1990s. Imagine if Brin and Page had been forced to negotiate access to web pages before they could start hacking - "we let Google index our content" would have been a feeble excuse. If that had been the attitude on the Web, we'd all still be using Yahoo taxonomies, because everyone would have done the deal with the existing dominant force, blocking the emergence of innovative entrepreneurial search. That's where we are with scholarly search and indexing now.

It could also be innovations in the science itself. There are a lot of smart people in this world who don't have access to integrated information - who can't afford access to the literature, or to the costly indexing and integration services that surround it. Who have hypotheses they can't test rapidly against the published information space. A world in which the data and the literature are more densely integrated means a world where model-building gets a lot easier. What we are doing here is reducing the time and cost at which the Kuhnian revolution cycles operate - dumb ideas get exposed faster, and good ideas get validated faster. This is about the only way to accelerate those revolutions that does not rely on magical thinking: if we can make the things we know more useful in the evaluation of hypotheses and models, we are simply increasing the mathematical odds of discovery. This is the transformational potential. It is treating the literature and data online as elements in a vast periodic table of knowledge, a common reference point against which we can test how things fit together.

This potential does not mean the end of publishers or of peer review. In my view, it makes both even more important, though both will of necessity be forced to evolve new methods to deal with the new world. In science cyberspace, guarantees of provenance and persistence will be essential - who said what, and when, and how can we ensure the durability of a citation? Was a link crowdsourced or peer reviewed? And what business models emerge from this - in particular, the sale of peer-reviewed links ahead of publication, so that publishers can sell a set of links with extra "trust" attached, while releasing the underlying text for crowdsourced, free linking after publication?
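One concrete way to make citations durable in such a system is content hashing: record a cryptographic digest of the cited passage at citation time, so that any later drift in the target can be detected. A minimal sketch in Python - the function names are illustrative, not part of any existing standard:

```python
import hashlib

def citation_digest(text: str) -> str:
    """Return a stable fingerprint for a cited passage of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def still_matches(fetched_text: str, recorded_digest: str) -> bool:
    """Check that a re-fetched passage is byte-identical to what was cited."""
    return citation_digest(fetched_text) == recorded_digest

# A citing paper records the digest alongside the link (example passage).
cited = "Compound X inhibits enzyme Y at 10 uM."
recorded = citation_digest(cited)

print(still_matches(cited, recorded))                   # unchanged passage
print(still_matches(cited + " (revised)", recorded))    # silently edited passage
```

A peer-reviewed link could carry such a digest signed by the publisher's key, while crowdsourced links carry unsigned digests - the "trust" premium then attaches to the signature, not to control of the text itself.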

But, as we move into a more OA world (even if not a total OA world), as semantics come of age in publishing prose and data and the hybrid "object-relation" world that much more accurately reflects the reality of any given experimental research output, we have a lot of choices to make. None of the outcomes are foreordained, because none of them are natural - they're all creatures of code that humans write, and that code can be changed, to make it more open or less open. We can think here about the idealistic early days of the network, when many wrote of the "innate openness" of cyberspace. The current battles over net neutrality reflect a more realistic world: we made the network open, and we can also make it closed.

So the signal question is: should the models be walled gardens or the www? Walled gardens bring more short-term benefit, but also more long-term liability. AOL in 1991 offered a much richer user experience than the Web in 1991, but the very openness and chaos of the Web contained the seeds of its explosion, just as the enclosure of AOL ensured it would never receive the benefits of its users' innovations. There is compelling research into the nature of "generative" systems - systems that *by their design* result in the creation of unexpected outputs, where the design ethos is to give the user power rather than to give the user what the designer thinks they want. Generative systems are capable of unbelievable explosive power. Thus, compared to AOL 1991, WWW 1991 brings more transformational power over time and thus more likelihood of "success" - if we define success as a generative system akin to the www. But the definition of success is as important here as anything.

It's also essential to note, as Tony Hey of Microsoft has repeatedly pointed out to me, that generativity is not the exclusive province of open source. The PC is a generative platform: anyone can write code that runs from the C:\ prompt. That simple decision is one of the primary reasons Apple's computers failed to keep up with PCs for so long. Apple chose control and a beautiful user interface; PCs chose generativity, and dominate the market today. We'll have a chance to watch the same battle play out in phones with Android and the iPhone, though the iPhone is at least now partially generative.

Generative systems are also vulnerable to abuse. Spam is the example we all probably know most personally. The openness of a system seems to correlate quite well with its abuse - openness creates generative power, which creates value and draws users, and those users in turn draw spammers, liars, phishers, Nigerian bank fraud, and more. The pain of this in email is one thing. In scholarly communications it could be truly dreadful - how would you track down and fix the malicious impact of a virus writer who systematically screws with the numbers in clinical trial tables, drug structures, or QSAR data in an integrated open web?

Also, any such system that requires major internal investment in infrastructure (i.e. bespoke design, as opposed to www-style systems) will likely concentrate power in the corporate entities that have the funds to invest in R&D. Very few small society publishers or small independent journals will survive in a world where semantic enhancement of publication takes much more than a few clicks to achieve; instead they will continue to find shelter inside a rapidly shrinking number of corporate homes. This is a major potential consequence of open access plus the semantic web (OA + SW), and it must be mitigated through good strategy and investment in open semantic web infrastructure if it is to be avoided.

I come back again to the importance of publishing and peer review. Provenance and persistence, citation and verifiability - these are the services that will be essential to figuring out which pieces of content can be trusted. But these services do not rely on control of copyright or content - they are quality control services, and they are much better suited to the province of trademark and brand than to copyright. Trademarks and quality certification offer a way to create new business models and ensure trust on the generative web without restricting the very generative powers we need to accelerate innovation and discovery cycles.

There will always be a need for the trust that publishers create through peer review. That trust might come from lots of angles, including the traditional publishers. But we cannot continue to stymie the power of the network to help us make discoveries and advancements in science. This is one of our only non-miraculous avenues to improving the way science works. We need science too much, and we need science to work better, now.

