My last posts on why I don’t like the open source metaphor for science have generated a lot of good comments here, by email, on Twitter, and in person.
They’ve forced me to think about what exactly it is about the meme that makes me so uncomfortable, and raised some good objections and points. I’m going to try to chew through a few of them in this post and then ditch the topic for a while, as I’ve got a lot of complaining to do about publishing and data and those topics have had to take a back seat for a few weeks while I worked this through my system.
On a side note, I actually kinda felt like a real blogger the last few weeks.
I guess for me the open source metaphor is so tied to software that its applicability as a metaphor is limited. I have done a very informal, personal, anecdotal, but multi-year, survey of people I talk to on this topic. For most people “open source” is an idea seen through a glass darkly, a vague mishmash of ideas of political freedom, distributed development methodology, and magical legal tools.
“We need open source [insert variable] to do [insert task currently performed by big evil company]” is almost an algorithm of faith in my world. I hear it again and again, marked by almost no understanding of the context in which open source software actually exists and operates. This is what I’m on about. Open source isn’t a magic incantation we can use to summon a community and create a public good.
Open “source” to most who use the metaphor is so much more than “the source is available” that we need to do some pushing back against it, even as we must also celebrate the intentions behind its use.
I’m going to make a few attempts to untangle the mishmash.
First, open source came from Free Software. If you haven’t read through the histories of Free v. Open, please go do so. But I would loosely generalize that Free is more about Freedom, of programmers, of speech, and of society, whereas Open Source is more about a development methodology embracing distribution of tasks and interconnectivity of outputs. They have a lot in common, but they’re not the same.
Second, both Free and Open Source depend upon a public approach to copyright, which is the open copyright license. The existence of a powerful, relatively internationally harmonized property right is absolutely essential to the entire open source enterprise. Another key point in copyright is that the creator-programmer owns all the rights she needs to license her work (absent signing them away to a company or other institution in a contract, of course). If she writes code, she owns it, without applying to a central authority, for a hell of a long time.
This power is at the root of the power of the license. It cannot be overstated, and I’ll come back to it later, because the absence of a right that works this way is a central flaw in the naive application of the metaphor in science.
Third, open source software hasn’t changed the world just because it was free, or openly licensed. It sits on top of an infrastructure that was highly leveraged to support something like open source – the internet stack, the explosion of microcomputers, the magic intersection of Moore’s and Metcalfe’s and Joy’s laws, the democratization of network access, and more. And on top of all of this was also the explosion in programming tools, object orientation, and modularity of software design.
Let me phrase it as a question. Would the four freedoms and the GNU GPL have been sufficient to create an explosion of free and open source software in the mid-60s? Pre-internet, pre-web, in the days of mainframes and timesharing and tiny memory and machine code?
I would propose the answer is no.
These three elements are poorly represented in science. We have some desire for the first issue – starting with Freedom. That’s probably the most advanced. And that’s why the open source science movement starts with appropriation of language and metaphor from software. I understand it. I support the ideas behind it. However, I think it blinds us to the many things that block the intentions from being realized. I’ll expand on two here.
First, the legal basis for open licensing in science is not simple, powerful, and internationally harmonized.
Science creates at least four classes of knowledge artifacts at their most basic level: creative works (whether in a journal or a webby form like a blog, whether narrative or photo or video), data (whether “raw” or processed), databases (which are different from data, and may contain creative works as well as data), and inventions (which may or may not be patented). Each of these four classes carries its own often wacky property rights regime. Some of those regimes are amenable to open source style licenses, some aren’t, and for some it’s utterly unclear whether open source will work or not.
To make things worse, science takes place in institutions. That means institutional claims on property rights. Institutions have offices set up specifically to exploit property rights, not share them. And even if you can get the institution on board, the creator usually does not own all the rights necessary to make available the kind of freedom – remember, this all starts with Freedom – that we contemplate in the open source metaphor. Worse yet again, getting the property right associated with inventions (patent) costs a ton of money – so giving it away as soon as you get it is much harder as a value proposition than in copyright, which descends from the heavens when the pen lifts from the paper.
Patents and copyrights don’t mix beautifully, either. If I own a copyright on a gel box design, I can release the design and “make the source code available” – but if my neighbor owns a patent on it, that neighbor can sue anyone who tries to actually build the gel box. This is something of a problem in software. But it’s a massive problem in science. Especially life sciences, which are built on patents as proxies for economic value and create enormous employment opportunities for attorneys as a result.
Data and databases are another place where the underlying property regimes don’t work as well for open source as in software. But that’s difficult enough to merit its own post. Suffice it to say that if Open Data had a Facebook page, its relationship status with the law would be “It’s Complicated.”
The second block is the insufficient mix of infrastructure. We can make creative works available, we can post data, we can license (maybe) inventions, we can integrate databases. But stitching it all together is the hard part. It is hard to compile four classes of knowledge products, much harder than compiling software. And the open source metaphor again builds the expectation that if we “make it open” we’ll get a magic network effect, that a Wikipedia will emerge for science.
But the infrastructure for software isn’t strong enough to stitch together science knowledge. Most science knowledge is locked up in PDF and Word formats that lack hyperlinks, or in standalone databases. It’s not “modular” in the sense that software is, even though it’s just as socially constructed as software in its own way. We’ve designed science knowledge for a human operating system, not a computerized one.
This is why I’m semi-obsessed with building linked data infrastructure, semantics, ontologies, and so on. It is going to be essential to realizing the intentions behind the open source metaphor – that knowledge connected becomes more valuable than the sum of its parts, that many of us can work separately on the same task and create a common good. We’re in the pre-internet world of science, metaphorically, and we need to build the networks and the protocols first, we need the machines to get cheaper and ubiquitous, we need common languages for data and concepts – then we can start talking about a free software metaphor being accurate.
I’m not beating up on the metaphor because I hate the idea. And if I can find evidence that people are using the metaphor in full understanding of the realities that lie between here, in science, and there, in open source science, I’ll dial it back.
But so far I haven’t found that evidence. And I think propagating the open source metaphor – without a hard-eyed examination of the barn raising we have to do before we get anything as transformative for science as GNU/Linux has been for software – risks hiding the hard stuff and creating unrealistic expectations that could boomerang on us all.