Distributed Science, Part 2

By jwilbanks on November 5, 2009.

I got a lot of feedback on my last post, in which I argued that open source is the wrong metaphor for science because it ties us too closely to the artifact that is open source software. The core of my argument remains the same: science is not software, and we shouldn't treat it the way we treat software. But I got a few comments, here on the blog and in email, that are worth looking at.

Here's comment #1.

You cite openwetware and the biobricks registry, but if you look closer, openwetware is a wiki, not a website about open source wetware tech. To my knowledge, other than the people over at diybio, there have been no signs of anyone with an understanding of free and open source software infrastructure (not the legalese, the toolchains) applying the concepts to the world of open source science.

This comment illustrates my point by missing it, which is that we should not be applying the understanding of software to science. In software, we the humans are in charge. We write the code. We compile it. Everything exists inside a system that we built, that is at least somewhat intelligently designed. Bringing this "understanding" to science means we shove a science peg into a software slot. The idea that "open source science" should be a site about wetware tech betrays a focus on the construction of tech, which is indeed the point of software.

But science isn't like software. Science is about extending the boundaries of our ignorance, not making technology. The difference between making technology (which is the point of software) and making discoveries (the point of science) is the root of the failure of the "open source science" metaphor. Science is about creating knowledge that doesn't exist and exposing ignorance that does exist, not about writing source code that we control.

In honor of his recent passing, here's Claude Lévi-Strauss: "The scientist is not a person who gives the right answers, he's one who asks the right questions." (from Le Cru et le cuit, 1964)

This is precisely why I want to take us up a layer in the ontology. Open source software is an example of distributed innovation, and as an inspiration to make distributed innovation happen in science, it's lovely. But it's an inspiration, not a map.

We should absolutely have distributed innovation in science. Open WetWare (which I am well aware is a wiki) contains many protocols, crafts, and techniques that are shared openly. This is a locally relevant form of distribution, even if it doesn't fit into an open source software box, and so is the Registry of Standard Biological Parts. Control over protocols and craft is at the core of competitive withholding, one of the biggest resistors to distribution in science. These are resources and toolchains that absolutely support distribution of capability and increase capacity, which are fundamental to early-stage distributed innovation.

They're just not what we expect when we wear open source glasses.

Here's comment #2:

The "Open Gel Box" project is an initiative to bring biotech equipment into the 21st century. We need innovation in "established" tools to make them intuitive and accessible for anyone who wants to work with DNA. To that end, a group of users from the DIYbio list got together and designed a better, faster gel system than what exists today.

Pearl Biotech is now manufacturing a complete gel electrophoresis system according to the Open Gel Box design. The Pearl Gel Box is available for $199 at http://www.pearlbiotech.com. We're advocating for better equipment on all fronts, such as an Open Thermal Cycler.

I think this is awesome. It's not "open source," though. It's not even what I'd call "distributed innovation"; the innovation theorists call this kind of thing User-Driven Innovation. This is about as clear a case of UDI as I know, right down to the fact that it's designed by the DIY folks and then made pretty and sold by a company. This again gets to the paucity of the open source software example. It simply isn't big enough to hold science.

Distributed science, user-driven science, open innovation science: we need ALL of them, not a narrow idea that comes from software. It's about hardware for science. It's about data for science. It's about laboratories for science. It's about research departments and funders and promotion and tenure. It's about paradigms, and paradigm shifts.

It's not software.

We control software. We don't control science. DIY Biology is one of the absolute leading examples of how, when we have a critical mass of open craft and protocols, users can lead the way. But it's not something that's enabled by an open source license, a code version repository, and other hallmarks of open source software. It's users saying, "screw this, I can do better" - and doing it. It's users who know the problem best and design the best solutions.

The business school folks call this "stickiness." The knowledge of how to make the solution is localized - sticks - to the user. The dumb firms in the sector only make products their marketing departments tell them about, and the smart ones find ways to take user inventions and turn them into their product lines. Like Pearl.

Comment #3:

(from my post: Stem cells, mice, vectors, plasmids, and more will need to be available outside the old boys' club that dominates modern life sciences.)

This is simply never gonna happen, because of the huge irreducible expense of maintaining and manipulating these reagents.

See: Personal Genome Project, Coriell Cell Culture Repository, Jackson Laboratories, StrainInfo. I could link a dozen more. The nodes are emerging. What's missing is the network that connects them. What's missing is an impact factor for materials.

We're headed straight towards a future where scientists will need to publish their tools, data, and narratives, instead of compressing everything into a "paper" that is constrained by the cost of printing and mailing. I for one can't wait. It's going to be key to democratizing access to tools, which is fundamental for both distributed innovation *and* user-driven innovation.

Comment #4:

I believe your historical facts are a little skewed. Open Biology perhaps began on the internet back with BIONET, which functioned well through the late '80s and early '90s, until the network apparently failed to grab sufficient interest for funding. [...] There have been efforts to create biology software repositories (similar to sourceforge.net except for Biology software) and these have largely failed to attract a majority of Bio-scientists too.

This comment's talking about software. I'm not. It again illustrates the way that the open source metaphor comes with code-centric blinders.

It would be great to accelerate this process even further, for example by expanding PLoS, encouraging all scientists to publish their working software (for example, MATLAB scripts) into open source repositories

Now this is talking about the foundations for distributed science. When there is software in science, it should be published. Just like stem cells. Into repositories. Couldn't agree more.

encouraging the people-in-the-middle (hobbyists, engineers) to publish in an intermediate form which isn't as strict as a scientific journal yet maintains some level of technological standard and legitimacy -- similar to the Internet RFCs, which started as simple technical memos.

Now here's where the comment truly shines, IMO. This is thinking broadly about breaking open the central metaphor of knowledge governance in science. This is not about "open source" - the internet RFCs aren't "open source software" - they are protocols, distributed for implementation and comment. Sort of like that stuff on the Open WetWare wiki, huh?

Coming back to my point.

Let's take off the open source glasses. Making science isn't like making software. Engineering foundations for distribution, for user hacking, for bringing more people into the system, these are the things that allowed open source to emerge in software. Good design choices, like separation of concerns, led us to the world of open source software. Let's learn from those lessons and build the foundations first, and let the science surprise us with the way it localizes distributed and user driven innovation.


Under the open access philosophy, Redalyc looks forward to contributing to the scientific editorial work produced in and about Ibero-America, making the content of more than 550 journals from different knowledge areas available to students and researchers.

I realize that I should have held my comment for this post, but if you're interested, it's #5 on the previous post.

The foundations are so essential to engineer and create; I wonder how many researchers are willing to participate in the process of doing so. So few people see the benefit of establishing universal ontologies when their grad students and postdocs are churning out data for what "might" be the next big breakthrough in cancer biology, etc. So few have time to work on bringing the knowledge into a public forum where the masses can more rapidly participate. I use the phrase "the masses," but programming, like science, requires a specialized skill set, and therefore the crowd will be relatively small.

Science Commons is an excellent start, and we need more participation to create the kind of quorum necessary for really pushing the envelope in this area.

Rick Smith
Twitter: @h2oindio

PS - Are there any agencies funding experimental distributed science? I am trying to convince some PIs in my department to get in on a project that is inherently distributed, but there is resistance w/o funding. :)

These are great points. Science is not software, though some science may be enabled by software made broadly available. It's like trying to extend how car keys are distributed to how to drive a car. Keys are a typical aspect of driving, but not particularly a useful guide to driving.

so what makes you think open source software is only about software?

I hear your message, and I think it's an important one: what worked for Linux and Apache and Python and Wordpress might not work for plasmids and PCR and genotyping. However, and here I could offer to help by giving you a copy of my book.... oh wait...;) ... open source/free software is not *only* about software. It's about: 1) making sharing into an easy to follow practice 2) fighting to define what counts as an open infrastructure 3) writing creative legal tools that promote freedom 4) inventing clever new forms of governance that take advantage of the internet and distributed communication and 5) fomenting a movement in which one discusses 1-4 in detail.

I think science commons (and creative commons before that) *started* from the recognition that open source wasn't just about software. I'm not so sure you could convince me that these practices are somehow intimately tied to the essential nature of software, even if you could convince me that "software" is one thing.

But I also have a related objection: since when is science about discovery and software about control? Synthetic Biology (and Biobricks in particular) is explicitly about control--their whole raison d'etre is to create a bioengineering infrastructure that allows for scalable control of parts and components made from living things. If you buy this, then your argument suggests it isn't science, it's software. But that's silly, of course it's science--and it's especially science when the *inability* to control something *leads to an insight about biological systems*. Ditto nanotechnology.

Conversely, I can't imagine many computer scientists would be happy to hear you suggest that their work is not science since it takes the form of software(-as-proof). Of course they do science--all kinds of things in computer science are about creating new knowledge and exposing things we don't know... and we do that by exploring the limits of logic, software and hardware.

Perhaps that's nitpicking, but the philosopher of science in me suggests that your distinction will get in the way of your msg. Perhaps it's better to specify more clearly the kinds of science that are not amenable to things like modularization, granularization and distributed cognition, as opposed to suggesting that science v. software is the relevant distinction. I've found, for instance, that there are some cultural differences between chemists and computer scientists that make a difference (chemists are happy writing in Microsoft Word, computer scientists insist on LaTeX...a lot of consequences follow) and might translate into other practices as well.

At one level that's just a plea to be more careful about these distinctions. But at a deeper level, there is a question about what really makes science tick today, and whether that is different than what made it tick 40 years ago. The nature of science is not eternal...

(you should know that when you quote Lévi-Strauss, the anthropologists come out of the woodwork :)

Beautiful comment, and I won't dive too far into it because it's on target.

My two quick responses, though. First, I'm not talking about computer science as a field, or hardware design. I'm specifically talking about open source software, which is largely built outside the field of computer science. It might be used in computer science, but that's a different thing. It depends on modularity, crowds, and incremental improvements. This isn't computer science, at least not the computer science I see practiced at MIT and elsewhere.

My second point is about BioBricks: it's absolutely more similar to software. This is precisely why I pulled it out as an example of distributed innovation emerging. It's of course also not software, which is why my plea is to call it distributed science.

Great comment though!

jtw

Illuminating posts (this and the previous one) John, thank you.

At the Genomics Law Report I've attempted to heed your message and examine yesterday's Complete Genomics announcement through the distributed science lens: Completing the Personal Genomics Toolkit.

Feedback welcome. Thanks.

- Dan

OK, but: TeX, httpd, the internet protocols, UNIX, lex, yacc, basically all the programming languages, even emacs at some point in its career (one could name many more) began life as computer science projects--proof of concept innovations that indeed proved themselves. One might quibble that these are computer engineering rather than computer science, but that's a pretty scholastic distinction these days. What's even more interesting is the way some non-institutional (i.e. hacker-created) programs have generated new research fields, the most obvious being peer-to-peer overlays, which were not created in computer science departments but are now exuberantly explored in computer science (and result in new open source tools for doing so).

I think I'd rather see you theorize a threshold with totally non-software driven science on one far side, totally use-driven software development on the other far side, and a big gradient of science/software/theory in the middle.

SC's work in mining the literature of neuroscience, for instance, is in that gradient: some neuroscientists will keep on working with rat brains, some will work with MRIs, but some will start combing that densely interconnected literature (if we succeed in making it talk) for new knowledge...

don't go on a rampage against "the software metaphor"-- at least make it more specific than that... identify the problem more precisely.

See: Personal Genome Project, Coriell Cell Culture Repository, Jackson Laboratories, StrainInfo. I could link a dozen more.

And if any of them were remotely useful to anyone who doesn't also have an old boys' club pass, you'd have made your point. But they aren't, so I'm with Physioprof on this one. See also.

Obviously, I disagree regarding the old boys' club.

By definition, an old boys' club is one in which one's social network is the source of control. By putting the materials into repositories where they are available for the cost of manufacture and distribution to any scientist at a university who's been willing to sign a simple agreement with the repository, the old boys' club is beaten.

Compare that to the existing system in which one has to email or call the lab and beg. We don't get from that to eBay in one step, not in reality. These places make materials available in essentially one click to qualified institutions, and they don't put obnoxious terms on the institutions.

The stem cells might not be available to you and me as individuals for garage biology (yet) but I don't think that intermediate step of institutional affiliation is that terrible. If a scientist at East Tennessee State - or FIOCRUZ in Brazil, or a Yoruba-speaking scientist in West Africa - has the same terms of access as a scientist at Harvard, for me, that's beating back the old boys' club.

Ah -- in fact we agree more than disagree, the problem is sloppy phrasing on my part. I should distinguish two groups, the old boys' club as you mean it (where a self-styled elite retains power by gatekeeping), and the larger club of professional scientists -- most of whom don't happen to have any Harvard buddies.

I had thought that what you meant by old boys' club was institutional affiliation, and that you were pushing for garage bio. I am not in favor of any kind of garage biology, as per my link above. But I am strongly in favor of what you call the "intermediate step", where all professional scientists have equal access.

What separates the Open Gel Box project from Open Source? The specification and schematics are publicly available, released under an open license.

What separates Open Gel Box from open source software is that it is Not Software.

While one is hardware and the other software, what are the differences in the metaphor? Perhaps the subjects themselves may differ, while the principles of approach remain constant.

I think you have a very narrow view of what "software" means, and that you have failed to adequately define it.

One very broad definition of software is anything that could be encoded and stored on general purpose data storage media. By that definition wikis are software, pdfs of peer reviewed papers are software, and the Open Gel Box specifications and schematics are software. When you print an article or build a gel box you generate a hardware instantiation of that software. (One could argue that ink on paper is a general purpose medium and that a copy of an article on paper media is similar to a copy on a magnetic drive; it is a symbolic encoding of language and ideas. But I digress)

I think the definition you are using is something more like 'code in a programming language intended to be used as instructions for a computer'. I'm not entirely certain if that's what you meant exactly, but it is a pretty arbitrary and non-standard definition.

I know of instances of version control systems being used for text documents. If I keep external text files of code documentation in my VCS, is that software? What if I use a VCS to collaboratively write a science fiction novel, is the text of the novel software? If I use a VCS as the storage back end for a web based wiki, is that software? If I then export the contents into a more traditional wiki is that software? I assume not since you seem to believe wiki contents aren't software. But does that mean that text in a VCS isn't software if it gets displayed in a wiki? If a wiki has code snippets and bits of explanation as a teaching resource is that software? If I make all of the explanatory bits comments in a source code listing and keep the whole thing in VCS as self-documenting code is that software? What about the many web frontends to code repositories? Is it software if it is presented in a browser for a human to read?

My point is that you can make a smooth transition from something hosted on sourceforge or github through code tutorials to CS textbooks to fictional novels to personal letters. And for any distinction you care to name (backend storage technology, machine-readable versus human-readable, purpose on an entertainment to application spectrum, etc) one could find an example to confound the definition.

I'd also note that comment #11 asked about "open source" and you responded to a different question about "open source software". Why does "open source" have to be about software in your view? And if the specifications were developed in an open source fashion would it not make sense to call hardware built from open source specs "open source hardware"? The specs needed to be sent off for manufacture. So what? When you print a pdf you have to send the bits off to the printer for manufacture. What is the difference? If manufacturing becomes more automated would Open Gel Box then qualify for the name "open source"? What if I were able to build one with a CNC water jet and a PCB mill controlled by my desktop PC the same way I print a pdf. Would that be open source? Would it be software?

