June 24, 2009
Category:
There's an interesting tweet about attribution in the data web. And it raises a tension I run into a lot but haven't seen a lot written about: the shifting nature of what the word "attribution" means.
We have a fairly common understanding of attribution in our daily lives: credit where credit is due. That definition is mine, and it tends to be what most people think. It holds whether one is a musician, a scientist, a teacher, or anyone who does creative or innovative work. We like getting credit for our work. No problem there.
This idea of attribution encompasses the idea that we should get credit for our ideas: if I'm the first one to realize that a certain gene knockout cures death, that idea is linked to me forever, the way we link Watson and Crick to the discovery of the structure of DNA. In this sense, attribution is very similar to the scholarly concept of citation.
However, the word "attribution" in a copyright license is a different beast. It even has a different Wikipedia entry (which I did not create, and have not edited, despite my temptation!). I don't much like its first sentence, because it doesn't make clear that in copyright, attribution is something that gets triggered by the making of a copy - not by the use of the ideas in the copyrighted work.
This is the thing about the law. It's narrow in a lot of places. And it's often not what we think it is. Mainly because it was written by lawyers, not regular people.
Let's look at the legal code of the Creative Commons Attribution license. It's interesting.
The license grants the following rights:
- to Reproduce the Work, to incorporate the Work into one or more Collections, and to Reproduce the Work as incorporated in the Collections;
- to create and Reproduce Adaptations provided that any such Adaptation, including any translation in any medium, takes reasonable steps to clearly label, demarcate or otherwise identify that changes were made to the original Work. For example, a translation could be marked "The original work was translated from English to Spanish," or a modification could indicate "The original work has been modified.";
- to Distribute and Publicly Perform the Work including as incorporated in Collections; and,
- to Distribute and Publicly Perform Adaptations.
See? It's about reproducing the work, adapting the work, and distribution. I don't need these rights to read a work, or study a data set, and take the ideas in the work or the data set. I only need them to make copies and derivatives. The law doesn't allow ideas or facts to be covered by copyright. But don't take it from me, take it from the US Government:
"Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed."
Now, because copyright doesn't protect these things, "attribution" in the sense of the license doesn't apply to ideas or facts either. Those rights above are conditional on my compliance with the terms of the license. Section 4 of the legal code lays out those conditions. If I fail to provide proper attribution, I lose the right to make and distribute copies and derivatives. I do NOT lose the right to "steal" the ideas in the article and claim them as my own, because those ideas are not subject to copyright, and cannot be made subject to the attribution requirement.
This is where it matters to understand that, to the law, attribution is a very specific term of art, one that is very different from what we casually and commonly mean by the word. Citation is much closer to the way we think than what is enabled in public copyright licenses or, for that matter, private copyright licenses.
This is why we recommend waiving attribution in the Science Commons protocol for open access to data. It's a narrow legal term that can screw with interoperability, while at the same time failing to provide what people really want, which is credit where credit is due.
Puneet Kishor, one of our fellows, got it right. We shouldn't use the law to make it hard to do the wrong thing. We should use technology to make it easy to do the right thing.
When it comes to data, and in particular to data interoperability, enabling citation and provenance that is easy to track and cite will serve the scientific goals far better than an attempt to port open source "principles" into a world where they fundamentally don't fit.
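To make "easy to do the right thing" a little more concrete, here is a minimal sketch of what trackable citation and provenance could look like in software. Everything in it is an assumption for illustration: the `DataSource` record, its fields, and the `cite` and `provenance` helpers are hypothetical names, not part of the Science Commons protocol or any existing tool, and the datasets in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSource:
    """A dataset plus the metadata needed to credit it (hypothetical fields)."""
    title: str
    creators: List[str]
    year: int
    identifier: str  # any stable identifier, for example a URI
    derived_from: List["DataSource"] = field(default_factory=list)


def cite(source: DataSource) -> str:
    """Format a human-readable citation string for one dataset."""
    authors = ", ".join(source.creators)
    return f"{authors} ({source.year}). {source.title}. {source.identifier}"


def provenance(source: DataSource) -> List[str]:
    """Walk the derivation chain and cite each upstream dataset once."""
    seen = set()
    citations = []

    def visit(node: DataSource) -> None:
        if node.identifier in seen:
            return
        seen.add(node.identifier)
        citations.append(cite(node))
        for parent in node.derived_from:
            visit(parent)

    visit(source)
    return citations


# Made-up example: an integrated dataset built from two upstream sources.
raw_a = DataSource("Screening results", ["Lab A"], 2008, "urn:example:a")
raw_b = DataSource("Compound annotations", ["Lab B"], 2009, "urn:example:b")
merged = DataSource("Integrated set", ["Lab C"], 2009, "urn:example:c",
                    derived_from=[raw_a, raw_b])

for line in provenance(merged):
    print(line)
```

The point of the sketch is that credit travels with the data as metadata, so anyone who integrates several sources can produce the full list of citations with one call, without any license condition being involved.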
Posted by John Wilbanks at 4:44 PM • 1 Comments • 0 TrackBacks
June 23, 2009
I'm at the Seed - Council on Competitiveness State of Innovation Summit. I was thinking about live blogging, but find that doing so makes it hard for me to think about what people are actually saying. There's a webcast if you're interested.
As far as conferences go, it's a good one. Rock stars on the stage (E.O. Wilson is a hero of mine) and interesting conversations about innovation.
But I'm frustrated, as I often am at "innovation" conferences. What follows is a bit of a rant directed less at this event, which as I said is a good one, than at the conversation I hear all the time about scientific innovation. There are three problems.
Problem 1: there's almost no conversation about the essential theories of emerging innovation - open, user-driven, distributed. This is about the new forms of innovation that the network enables, and should be on every agenda of every meeting that claims to talk about innovation. If we simply do things the old way, but bigger, we fail. Disruptive innovation models ought to be part of the conversation and they too often aren't.
Problem 2: there's no conversation about technical infrastructure for innovation. Here's what I mean by that: the internet is infrastructure for innovation in culture and commerce. It underpins an enormous amount of economic value, and from it emerges disruption that we could never have predicted, like the Web. And the web in turn begat Google, Amazon, Facebook, blogging, you name it. Both of these systems work this way because they are public systems. Yet we don't talk about an open public technical infrastructure for science. We build individual bits of it, but our vision is investing in unconnected nodes, not networks.
On top of this, there is the assumption that because the web works for culture, it works for science. But the Web is a system built for documents - it's infrastructure for documents. Science innovation depends on data. This conference had a great panel on data, with Ben Fry, who's a data visualization wizard. Yet there was no conversation about the fact that the infrastructure we have for the Web completely fails at data. Infrastructure for making the web function on data is woeful - format standards, annotation, and so on are always underfunded and the first to be cut in a crisis.
Infrastructure for data integration, data federation, and so forth should be encoded directly into the open standards of the web and internet. Full stop. And we should talk about this problem more often. Otherwise people look at their iPhones, check for a latte, and assume this level of functionality scales from coffee to the bench. It doesn't.
Problem 3: there's no conversation about the way that our legal and policy regimes affect emerging modes of innovation. Data use is dependent on legal access to data. There's a range of data regimes across the world that make legal access to data conditional on rights being granted. Copyright licenses prevent innovative scientists from using software to index the literature and integrate it into the database world. Default settings on government policy create strong incentives for patenting smaller and smaller inventions by universities. Tenure and review systems encourage secrecy and withholding.
Taken together, these three problems represent the core "immune system" of science to disruptive change. That's not a terrible thing. Science should resist some disruptive changes. But right now, the disruptive change being resisted is the network. It's a terrible irony that at the moment we have the technical ability to send any content anywhere at almost no cost of distribution, we haven't got the technical and legal infrastructure to realize the potential of that ability for science. It's an even more terrible irony that the innovation resulting from that ability in culture is being constricted by the very policies and regimes that we claim promote innovation.
Posted by John Wilbanks at 12:39 PM • 0 Comments • 0 TrackBacks
June 1, 2009
I'm happy to say that I'll be doing a forum at the British Library on July 22, called Scientific Findings in a Digital World: What is the Genuine Article? There's a Nature Network group you can join to participate in the creation of the agenda.
This is pretty cool. The British Library is a legendary institution, and has some personal resonance for me too - my dad wrote a big chunk of his dissertation in the reading room there. I'll make a few introductory comments and then do my best Oprah impersonation.
Posted by John Wilbanks at 2:32 PM • 1 Comments • 0 TrackBacks
Paul Miller and I recorded a chat last week that's now online as a podcast from Cloud of Data.
Paul is a smart guy and it was a fun interview. We first met when he was working with Talis, which is a very progressive company in the UK (they sponsored some of the development of the PDDL and currently host data in the public domain for free in the Talis Connected Commons) but he's now out freelancing. Check out the podcast and let me know your comments.
Posted by John Wilbanks at 9:13 AM • 0 Comments • 0 TrackBacks
May 28, 2009
So, I was supposed to go up to Montreal and Ottawa the past couple of days, but a run of miserable luck with planes made it unworkable (it's complicated).
Instead, I tried to record a presentation and get it onto the web so we could play it for them, and then take questions by Skype. That also didn't work.
However, in the end we did succeed in getting the video online. So if you're interested in what I say when I talk to libraries, but haven't been to one of the conferences where I've spoken, take a look.
Posted by John Wilbanks at 3:22 PM • 1 Comments • 0 TrackBacks
May 27, 2009
As noted on the Creative Commons blog, the folks at Digg have converted to CC0 (replacing a multiyear use of a different public domain legal tool).
This is very cool on lots of levels. But Daniel Burka of Digg said it best, so I'll make this a short post by simply quoting him...
"This is good for the internet and good for society."
He's talking about the public domain, and he's right.
Posted by John Wilbanks at 4:10 PM • 0 Comments • 0 TrackBacks
May 20, 2009
This was in the comments from my blog post on Pfizer's semi-open innovation. I don't normally highlight comments like this, but sometimes you have to give credit where credit is due.
Why deal with Pfizer in the first place? Anything you might find they'll keep and you're SOL. We have a compound library that started from 1.4 million cmpds from Chemdiv, Chembridge, Maybridge and Tripos. I talked them into using our exclusion criteria (developed by my old buddies from Pharmacia - we all got Pfired when Pfizer took over Kazoo) and got rid of all the junk we didn't want (1 million). From there we used a "molecular equivalence" program to pick only unique compounds that we wanted to purchase - 100,000. I built my own library of off-patent FDA approved drugs (except opiods and benzo's). You can come to us with your screen and run it against our library of 10,00 (select set) or 100,000 full set and you get to keep whatever you find. No IP issues. Check us out.
This is why I love blogging. The writer is from the Michigan High Throughput Screening Center. And I think I've found a poster child for the Health Commons.
Posted by John Wilbanks at 9:49 AM • 1 Comments • 0 TrackBacks
May 19, 2009
I ran into Virginia Acha last week at the NESTA event in London, but she didn't tell me about this!
Derek Lowe at In the Pipeline notes that Pfizer is apparently allowing external companies to screen against their internal library.
But I'm told that Pfizer has been meeting with several other (mostly smaller) companies, offering their (entire?) compound library as a screening resource. As I understand it, you need to come to them with a reasonably formatted HTS assay, and there's a fee in the high hundreds of thousands to run the screen.
This isn't all the way towards open innovation. In a true open innovation sense, the fees and the barriers would be lower, as the goal would be to maximize dealflow. But that probably means, as Derek also notes, that the IP issues aren't settled.
Should anyone involved want to talk about how to settle those issues, we'd be interested in hearing about how this process is working out.
Posted by John Wilbanks at 6:12 PM • 2 Comments • 0 TrackBacks
The Open Knowledge Foundation has released a short guide to open data as part of the Open Data Commons project.
I have my philosophical disagreements with OKF on some issues - and they with me! - but they're the kind of disagreements that come from people on the same side of the fence. We all want open data, and we want it now.
Moments like this are a good time to step back and focus on where we agree. We agree that data is a little weird, and that we need more research on how best to treat the law around data. We agree that public sector information needs to be free - in fact, Rufus Pollock has written some essential work on the subject. And I'm proud to serve on the OKF advisory board (here's hoping they're as glad to have me).
So let this be a word of warning to those who think there is a "split" on open data - there isn't. There may be a lot of passionate back and forth, but it's a matter of degree, not of difference. Congrats to the OKF for the Open Data Commons project and the beginnings that the guide represents.
Posted by John Wilbanks at 12:52 AM • 1 Comments • 0 TrackBacks