"A Breakthrough In Data Licensing For Open Science"

(note - I have edited this post to add in Rufus Pollock, who I left out primarily because I wasn't sure he would endorse the ideas in this post - Peter notes that he was not only at the meeting but essential, so I'm happy to add these edits!)

Peter Murray-Rust has posted some essential reading for anyone interested in open data in the sciences. He follows on from Cameron Neylon's post, whose title I have quoted in my own title here.

Peter summarizes an informal summit meeting held last week in the UK by a group of folks interested in open science and open data, including Rufus Pollock of the Open Knowledge Foundation. And he raises a point that I've made many times myself: open scientific data is a fundamentally different beast than open culture and open source software. At its core is not an automatic copyright. And its usage is fundamentally different as well - it's meant to be recombined as endlessly as possible.

As is often the case, I think Peter's right. And I'd like to salute the existence of the "Panton Principles" for open science:

"Where a decision has been taken to publish data deriving from public science research, best practice to enable the re-use and re-purposing of that data, is to place it explicitly in the public domain via {one of a small set of protocols e.g. cc0 or PDDL}."

I completely agree. Read through to Peter's post to see the history of the debate.
The porting of licensing culture to data, whether in the service of closure or of freedom, strikes me as a poor choice. That's obvious to anyone who's heard me speak on the topic, or been stuck on a listserv with me over the past few years.

I am deeply worried by the idea that we should assume that systems that function in a world of automatic copyright will work in a world where the content is instead factual information. The Panton Principles reflect almost word for word what I hear over and over again from scientists, from anthropology to astronomy to physics, chemistry, geography, and biology. So I like them. I'm not able to sign on behalf of my organization, but I definitely sign on as an individual.

It's easy to wargame the negative consequences of extending the property culture to scientific data. Different disciplines will have different ideas about what "share alike" or "copyleft" will mean and what "distribution" will mean, and they will write their own licenses - carrying the problem of license incompatibility that has kept Wikipedia divided from BY-SA in the free culture space into a place where it would not exist were it not for supposedly "free" licenses.

I'm very glad to see the early adoption of CC0 by organizations like the Tropical Disease Initiative, the Personal Genome Project, the Tranche filesharing network, and the European Molecular Biology Laboratory. That's a good indicator that our work is hitting its mark in its early days. And with the PGP now getting ready to move from 10 genomes to 13,000, CC0 is on its way in the sciences. We just have to keep pushing for the public domain in the sciences while more research takes place on databases writ large.

I can also say, from experience, that opening the door to licensing of any sort in scientific data opens it for bad licensing - there's real appetite for non-commercial data licensing in lots of places. Trying to keep that down while saying legal tools are appropriate in other places is really hard. The Concept Web Alliance is an example of a place where the very mention of "license" for data leads inexorably to non-commercial licenses, and where only a bright line drawn at the public domain is workable as a design principle.

A lot of this comes from a probability analysis. The incremental daily use of a data license seems to be low-risk - but the magnitude of a failure is enormous, if we get license forking and non-commercial clauses. And over time, I'm pretty sure the probability of licenses leading to license forking and non-commercial clauses is 1. The biologists will write a different definition in a contract than the physicists, and that's all it takes. Entrenching the concept that each scientific discipline can decide its own fate in legal-code space instead of normative space seems to guarantee a non-interoperable data world...
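
A rough way to see why that probability drifts toward 1: if each year carries even a small, independent chance p that some discipline forks its own license, then the chance of at least one fork within n years is 1 - (1 - p)^n. The sketch below is illustrative only. The 2% yearly figure, the independence assumption, and the function name are all assumptions made for the example, not numbers from any study.

```python
def prob_at_least_one_fork(p_per_year: float, years: int) -> float:
    """Chance that at least one incompatible license fork appears within
    `years`, assuming an independent chance `p_per_year` in each year.
    Both the model and its inputs are hypothetical."""
    if not 0.0 <= p_per_year <= 1.0:
        raise ValueError("p_per_year must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Probability of no fork in any single year is (1 - p); over `years`
    # independent years it is (1 - p) ** years.
    return 1.0 - (1.0 - p_per_year) ** years


if __name__ == "__main__":
    # 0.02 is a made-up yearly risk, chosen only to show the trend.
    for years in (1, 10, 50, 200):
        print(years, round(prob_at_least_one_fork(0.02, years), 3))
```

The point is the shape of the curve rather than any particular number: a risk that looks negligible on any given day compounds toward near-certainty when the horizon is long enough.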

To be clear, we didn't do the research on databases of cultural stuff, like Flickr, or like a database of short films. My comments shouldn't be taken in that context. But I think that such research should indeed be undertaken. One thing we've learned at CC is that licenses take on a life of their own once they get released. And although you can retire a license once you realize it wasn't the best idea, like our Developing Nations license, it never goes away.

I pushed last week in the UK for precisely this kind of research. And I'm hopeful that it happens sooner rather than later. But in the interim, the Panton Principles sound like a good baseline for figuring out what to do in the sciences starting today. Thanks to Peter and Cameron and Rufus for this essential work.


It was actually three of us - Rufus, Cameron and me. Rufus was critical. The key thing was to define the boundaries of science within knowledge more generally.

And many thanks for your personal support. And Kaitlin for happening to be on the same train as Cameron and me.