"A Breakthrough In Data Licensing For Open Science"

By jwilbanks on May 18, 2009.

(note - I have edited this post to add in Rufus Pollock, who I left out primarily because I wasn't sure he would endorse the ideas in this post - Peter notes that he was not only at the meeting but essential, so I'm happy to add these edits!)

Peter Murray-Rust has posted some essential reading for anyone interested in open data in the sciences. He follows on from Cameron Neylon's post, whose title I have quoted in my own title here.

Peter summarizes an informal summit meeting held last week in the UK by a group of folks interested in open science and open data, including Rufus Pollock of the Open Knowledge Foundation. And he raises a point that I've made many times myself: open scientific data is a fundamentally different beast than open culture and open source software. At its core is not an automatic copyright. And its usage is fundamentally different as well - it's meant to be recombined as endlessly as possible.

As is often the case, I think Peter's right. And I'd like to salute the existence of the "Panton Principles" for open science:

"Where a decision has been taken to publish data deriving from public science research, best practice to enable the re-use and re-purposing of that data, is to place it explicitly in the public domain via {one of a small set of protocols e.g. cc0 or PDDL}."

I completely agree. Read through to Peter's post to see the history of the debate.

The porting of licensing culture to data, whether in the service of closure or of freedom, strikes me as a poor choice. That's obvious to anyone who's heard me speak on the topic, or been stuck on a listserv with me over the past few years.

I am deeply worried by the idea that we should assume that systems that function in a world of automatic copyright will work in a world where the content is instead factual information. The Panton Principles reflect almost word for word what I hear over and over again from scientists, from anthropology to astronomy to physics, chemistry, geography, and biology. So I like them. I'm not able to sign on behalf of my organization, but I definitely sign on as an individual.

It's easy to wargame the negative consequences of extending the property culture to scientific data. Different disciplines will have different ideas about what "share alike" or "copyleft" will mean, and what "distribution" will mean, and they will write their own licenses. That propagates the problem of license incompatibility that has kept Wikipedia divided from BY-SA in the free culture space, and carries it into a place where it would not exist were it not for supposedly "free" licenses.

I'm very glad to see the early adoption of CC0 by organizations like the Tropical Disease Initiative, the Personal Genome Project, the Tranche filesharing network, and the European Molecular Biology Laboratory. That's a good indicator that our work is hitting its mark in its early days. And with the PGP now getting ready to move from 10 genomes to 13,000, CC0 is on its way in the sciences. We just have to keep pushing for the public domain in the sciences while more research takes place on databases writ large.

I can also say, from experience, that opening the door to licensing of any sort in scientific data opens it for bad licensing - there's real appetite for non-commercial data licensing in lots of places. Trying to keep that down while saying legal tools are appropriate in other places is really hard. The Concept Web Alliance is an example of a place where the very mention of "license" for data leads inexorably to non-commercial licenses, and where only a bright line drawn at the public domain is workable as a design principle.

A lot of this comes from a probability analysis. The risk from any single day's use of a data license seems low - but the magnitude of a failure is enormous if we get license forking and non-commercial clauses. And over time, I'm pretty sure the probability of licenses leading to license forking and non-commercial clauses is 1. The biologists will write a different definition in a contract than the physicists, and that's all it takes. Entrenching the concept that each scientific discipline can decide its own fate in legal-code space instead of normative space seems to guarantee a non-interoperable data world...

To be clear, we didn't do the research on databases of cultural stuff, like Flickr, or like a database of short films. My comments shouldn't be taken in that context. But I think that such research should indeed be undertaken. One thing we've learned at CC is that licenses take on a life of their own once they get released. And although you can retire a license once you realize it wasn't the best idea, as we did with our Developing Nations license, it never goes away.

I pushed last week in the UK for precisely this kind of research. And I'm hopeful that it happens sooner rather than later. But in the interim, the Panton Principles sound like a good baseline for figuring out what to do in the sciences starting today. Thanks to Peter and Cameron and Rufus for this essential work.

It was actually three of us - Rufus, Cameron and me. Rufus was critical. The key thing was to define the boundaries of science within knowledge more generally.

And many thanks for your personal support. And Kaitlin for happening to be on the same train as Cameron and me.
