Common Knowledge

My last posts on why I don’t like the open source metaphor for science have generated a lot of good comments, here and in my email, on Twitter, and in person.

They’ve forced me to think about what exactly it is about the meme that makes me so uncomfortable, and they raised some good objections and points. I’m going to try to chew through a few of them in this post and then ditch the topic for a while, as I’ve got a lot of complaining to do about publishing and data, and those topics have had to take a back seat for a few weeks while I worked this through my system.

On a side note, I actually kinda felt like a real blogger the last few weeks.

I guess for me the open source metaphor is so tied to software that its applicability as a metaphor is limited. I have done a very informal, personal, anecdotal, but multi-year, survey of people I talk to on this topic. For most people “open source” is an idea seen through a glass darkly, a vague mishmash of ideas of political freedom, distributed development methodology, and magical legal tools.

“We need open source [insert variable] to do [insert task currently performed by big evil company]” is almost an algorithm of faith in my world. I hear it again and again, marked by almost no understanding of the context in which open source software actually exists and operates. This is what I’m on about. Open source isn’t a magic incantation we can use to summon a community and create a public good.

Open “source” to most who use the metaphor is so much more than “the source is available” that we need to do some pushing back against it, even as we must also celebrate the intentions behind its use.

I’m going to make a few attempts to untangle the mishmash.

First, open source came from Free Software. If you haven’t read through the histories of Free v. Open, please go do so. But I would loosely generalize that Free is more about Freedom, of programmers, of speech, and of society, whereas Open Source is more about a development methodology embracing distribution of tasks and interconnectivity of outputs. They have a lot in common, but they’re not the same.

Second, both Free and Open Source depend upon a public approach to copyright, which is the open copyright license. The existence of a powerful, relatively internationally harmonized property right is absolutely essential to the entire open source enterprise. Another key point about copyright is that the creator-programmer owns all the rights she needs in order to license them (absent signing them away to a company or other institution in a contract, of course). If she writes code, she owns it, without applying to a central authority, for a hell of a long time.

This power is at the root of the power of the license. It cannot be overstated, and I’ll come back to it later, because the absence of a right that works this way is a central flaw in the naive application of the metaphor to science.

Third, open source software hasn’t changed the world just because it was free, or openly licensed. It sits on top of an infrastructure that was highly leveraged to support something like open source – the internet stack, the explosion of microcomputers, the magic intersection of Moore’s and Metcalfe’s and Joy’s laws, the democratization of network access, and more. And on top of all of this was also the explosion in programming tools, object orientation, and modularity of software design.

Let me phrase it as a question. Would the four freedoms and the GNU GPL have been sufficient to create an explosion of free and open source software in the mid-60s? Pre-internet, pre-web, in the days of mainframes and timesharing and tiny memory and machine code?

I would propose the answer is no.

These three elements are poorly represented in science. We have some desire for the first issue – starting with Freedom. That’s probably the most advanced. And that’s why the open source science movement starts with appropriation of language and metaphor from software. I understand it. I support the ideas behind it. However, I think it blinds us to the things that block the intentions from being realized, which are many. I’ll expand on two here.

First, the legal basis for open licensing in science is not simple, powerful, and internationally harmonized.

Science creates at least four classes of knowledge artifacts at their most basic level: creative works (whether in a journal or a webby form like a blog, whether narrative or photo or video), data (whether “raw” or processed), databases (which are different from data, and may contain creative works as well as data), and inventions (which may or may not be patented). Each of these four classes carries its own often wacky property rights regimes, some of which are amenable to open source style licenses, some of which aren’t, and some of which it’s utterly unclear if open source will work or not.

To make things worse, science takes place in institutions. That means institutional claims on property rights. Institutions have offices set up specifically to exploit property rights, not share them. And even if you can get the institution on board, the creator usually does not own all the rights necessary to make available the kind of freedom we contemplate in the open source metaphor – remember, this all starts with Freedom. Worse yet, getting the property right associated with inventions (a patent) costs a ton of money, so giving it away as soon as you get it is a much harder value proposition than it is with copyright, which descends from the heavens when the pen lifts from the paper.

Patents and copyrights don’t mix beautifully, either. If I own a copyright on a gel box design, I can release the design and “make the source code available” – but if my neighbor owns a patent on it, that neighbor can sue anyone who tries to actually build the gel box. This is something of a problem in software. But it’s a massive problem in science. Especially life sciences, which are built on patents as proxies for economic value and create enormous employment opportunities for attorneys as a result.

Data and databases are another place where the underlying property regimes don’t work as well for open source as in software. But that’s difficult enough to merit its own post. Suffice to say if Open Data had a Facebook page, its relationship status with the law would be “It’s Complicated.”

The second block is the insufficient mix of infrastructure. We can make creative works available, we can post data, we can license (maybe) inventions, we can integrate databases. But stitching it all together is the hard part. It is hard to compile four classes of knowledge products, much harder than compiling software. And the open source metaphor again builds the expectation that if we “make it open” we’ll get a magic network effect, that a Wikipedia will emerge for science.

But the infrastructure built for software isn’t strong enough to stitch together science knowledge. Most science knowledge is locked up in PDF and Word formats or in standalone databases, and it lacks hyperlinks. It’s not “modular” in the sense that software is, even though it’s just as socially constructed as software in its own way. We’ve designed science knowledge for a human operating system, not a computerized one.

This is why I’m semi-obsessed with building linked data infrastructure, semantics, ontologies, and so on. It is going to be essential to realizing the intentions behind the open source metaphor – that knowledge connected becomes more valuable than the sum of its parts, that many of us can work separately on the same task and create a common good. We’re in the pre-internet world of science, metaphorically, and we need to build the networks and the protocols first, we need the machines to get cheaper and ubiquitous, we need common languages for data and concepts – then we can start talking about a free software metaphor being accurate.

I’m not beating up on the metaphor because I hate the idea. And if I can find evidence that people are using the metaphor in full understanding of the realities between here in science, and there in open source science, I’ll dial it back.

But so far I haven’t found that evidence. And I think propagating the open source metaphor – without a hard-eyed examination of the barn raising we have to do before we get anything as transformative for science as GNU/Linux has been for software – risks hiding the hard stuff and creating unrealistic expectations that could boomerang on us all.

Comments

  1. #1 Dorothea Salo
    December 3, 2009

    Er, you forgot to put that last paragraph in 100-pixel blinking bold bright-red type, John. ;) I’m sure it was only a momentary oversight.

  2. #2 D. C. Sessions
    December 3, 2009

    And if I can find evidence that people are using the metaphor in full understanding of the realities between here in science, and there in open source science, I’ll dial it back.

    Bear in mind that the “software libre as science” metaphor appeals to science as taught in history books, and to some extent as practiced in earlier times, not institutional science post-Bayh-Dole.

    One might argue that at the same time software and some other creative work is moving towards a model familiar to Benjamin Franklin, the physical sciences are moving closer to a model familiar to Bill Gates.

    NB: I gather from $DAUGHTER that the social sciences have not yet become so commercialized as the physical sciences, possibly because they haven’t found a way to patent historical or sociological work.

  3. #3 john wilbanks
    December 3, 2009

    If I could applaud a comment furiously, it would be comment #2. It took me far too many bits and electrons to make an argument summarized in two sentences.

  4. #4 D. C. Sessions
    December 3, 2009

    It took me far too many bits and electrons to make an argument summarized in two sentences.

    Those two sentences only make sense as commentary on the longer essay. On their own they’re incomprehensible.

    If, on the other hand, you want to make them your own, go for it.

  5. #5 foobar
    December 3, 2009

    Generally, I find this discussion rather pointless. From my position in this whole research process, the most sensible approach is “just do it”.
    That being said, I want to remark on one of your arguments: It is not true that giving away knowledge that would be eligible for patenting costs a load of money. If an invention is released to the public without being patented, it goes into the public domain. No one else will ever be able to patent it, since there’s prior art for it. So releasing that kind of knowledge into the public domain is a simple matter of implementing/documenting it somewhere for everyone to see.
    I do see how this could clash with institutions’ IP policies. But that, I think, is exactly one of the points the advocates of Open Source Science are trying to make: getting institutions to regard the release of knowledge into the public domain as something good and desirable.

  6. #6 john wilbanks
    December 3, 2009

    I think you’re missing the whole point here. Using the phrase open source implies property licensing, not public domain. Open science? Distributed science? Perfect. They’re big enough terms to encompass what we’re talking about.

    As for the public domain patent thing…well, yeah, I’m pretty aware of the prior art aspect of giving knowledge away. It’s come up once or twice :-)

    I am not, however, saying in this post that it costs money to give away inventions that you don’t patent. I am instead making an utterly different argument, which is that copyrights are free (cash-wise) to acquire, so licensing them for open purposes doesn’t mean giving away something you paid for. If you actually want a patent, it is going to cost you $50,000 and up, which makes you less likely to then turn around and give it away.

    This is why most of the “open patent” initiatives have had so much trouble achieving scale. Companies don’t spend $50K per patent and up in order to donate them.

    If you opt out of the property system, that’s obviously not a cost you bear. It’s also unrelated to the argument I was making. If we lived in a culture of free revealing of scientific knowledge, we wouldn’t be making the arguments here. It’s much closer to the political ideals of Free Software actually.

    Opting out of the property system is absolutely open, and to be encouraged in academic science. But it’s not “open source” – and I’d feel a lot better if advocates of open source science actually understood that point.

  7. #7 Jonathan Cline
    December 4, 2009

    Would the four freedoms and the GNU GPL have been sufficient to create an explosion of free and open source software in the mid-60s? Pre-internet, pre-web, in the days of mainframes and timesharing and tiny memory and machine code? I would propose the answer is no.

    It would be better if you drop the term open source from your discussion. PUBLIC DOMAIN software and science and research is the foundation of collaboration and Long Tail work. The new name for PUBLIC DOMAIN is “Open Source” with a lot of additional fluff added.

    This is the complete definition of Open Source:

    MIT License for Software (circa 1992?)

    Copyright (c) [year] [copyright holders]

    Permission is hereby granted, free of charge, to any person
    obtaining a copy of this software and associated documentation
    files (the “Software”), to deal in the Software without
    restriction, including without limitation the rights to use,
    copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the
    Software is furnished to do so, subject to the following
    conditions:

    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
    OTHER DEALINGS IN THE SOFTWARE.

    There are many engineers and scientists throughout history who have openly (Public Domain) shared their work to accelerate collaboration. Discussion of “the technology field wouldn’t be as successful with/without GPL” or “the technology field wouldn’t be as successful with/without Moore’s/Seagate’s Law” is a completely separate issue. Public Domain collaboration works. It allows everyone to contribute, to improve, to keep the ball rolling. Whereas proprietary collaboration causes vendor lock-in, obfuscation, mothballing, excessive lawyer fees due to patent issues, and secretive “advances” (which may or may not be actual technology advances).

    The point you reiterate, “We need open source [insert variable] to do [insert task currently performed by big evil company]”, is the naive argument. It is better left ignored. In truth, there are valid places for Public Domain (i.e. open source) technology and for proprietary technology, as well as hybrid solutions like dual-licensing. There can also be a valid case made against viral licenses like the GNU GPL, because the viral terms prevent commercial improvements (commercial entities have a difficult time swallowing the poison pill of the GNU GPL, which forces all modifications to be released back to the public).

    Most of this is covered in Eric S. Raymond’s books. The benefits of open collaboration in any field are incredible.

    There has been enormous growth in software and technology due to open collaboration without, and before, GNU GPL. The GNU GPL came “later” to many. More software is released under BSD terms (completely open to use, modify, sell; more like Public Domain) than any other software.

    My conclusion would be this: your perspectives may have been clouded by imprecise definitions of what “open source” is, and how important GNU GPL is, perhaps by those same types of people who state “We need open source [insert variable] to do [insert task currently performed by big evil company]”. Look at the broader field of software and technology for a better picture.

  8. #8 John Wilbanks
    December 4, 2009

    I’ve read Raymond’s books, and studied open source as well as free software extensively for more than 10 years. I’m speaking in these posts to people who haven’t – who are misapplying the metaphor as a proxy for a vague desire for distributed, open development.

    I’m not attacking the idea of distributed, open development. I’m attacking the vague mishmash as a way of thinking, and in particular, I’m attacking the very real problem of unmet expectations that the open source metaphor brings with it.

    If you look at the work I do in my day job, you’ll see a distinct propensity to advocate for the public domain over licensing options. It gets me in trouble with people who think the PD is the same as the BSD. You’ll also see that I don’t buy into “viral” licensing as a cureall.

    But I have run into a number of people promoting “open source science” who think it’s a matter of dropping some data online, or putting a chem structure into the public domain, or getting X number of registered users (as if this were Second Life) and then wham! Free drugs for everyone!

    I think this obscures the infrastructure development on which our energies should be focused, and risks harming the overall cause that I share with those who promote open source science: making science work better and faster through distribution of tasks, separation of concerns, modularity, and free revealing.

    Calling this “open source” hides the complexity behind it, and obscures the very, very, very real barriers to distribution, separation, modularity, and revealing that are present in science and not in software.

  9. #9 Josh Perfetto
    December 4, 2009

    I think you made a lot of valid points about the difficulties to be overcome before the successes of the open source software movement can be realized in the life sciences. But I also think this discussion is turning into one mainly about terminology.

    When someone says “open source software”, you read the tea leaves for everything they might mean, but when someone says “open source science”, you take the term quite literally, and say that they should have said “distributed science”, “open science”, “user driven innovation”, or something like that.

    The real problem, as you mentioned, is that the right property rights regime and infrastructure/stack that enabled the success of open source software don’t yet exist for science. The people talking about open source science are trying to create that and replicate the spirit of the open source software movement. Their use of the term doesn’t mean that “open source science” is as mature as “open source software”.

    If one were developing a new term for this, given the nature of the chemical and biological sciences, one would probably not use the word “source”. But given the history of open source software, “open source” is a convenient way to refer to a concept. Either the term will stick, or a more precise term will evolve as the practical issues get straightened out. Doesn’t really matter, what matters is getting there. And yes, I agree with you that it will not just magically occur :)
