Lions, Tigers, and Crowds

I gave a talk at eTech two weeks ago. It was a busy time - I was in the middle of my wedding, which was in Brazil, and I actually had to leave Brazil and fly to San Jose to give the talk, have a couple of meetings, and fly right back so that I could rejoin the wedding festivities. We were also announcing a collaboration with Microsoft (which has garnered its own attention and criticism, and deserves its own blog post here, which it will get).

I'm also trying out some new themes for talks. I gave over 70 talks last year and although I loved the three core talks I gave much of the year, when you talk that much you want to say something new. It was also one of my first talks to a truly general web 2.0-ish crowd as opposed to my normal audiences of scientists and policy wonks. I felt a pressure to be both accessible and challenging.

Well, one of the neat things about the Web is that you find out how successful you are in your goals. Cameron Neylon is a blogger I follow pretty closely - he's smart *and* funny, a rare combination - and he's posted a deep and thoughtful meditation on what was to me a minor chord progression in my talk. But apparently it resonated and sparked some conversation out in the luminiferous aether.

On Slide 17 I state: "there is no crowd." Cameron has a nice post on what crowds mean, two different kinds of crowdsourcing, and the Polymath project that Michael Nielsen's written about.

I wasn't talking about any of that stuff. I was simply trying to point out to an audience used to a potential market of billions that, comparatively speaking, there just aren't a lot of scientists, and that has to be taken into account in our strategies for creating open science.

The low total numbers of scientists, and the high barriers to becoming a scientist, represent a design challenge for open science. I tend to think the urge to share is distributed across people at different levels. Certainly, some of us want to share more than others. Maybe it's Gaussian, maybe it's not. But if x% of people like to share generally, then it's likely that some y% of scientists like to share, and it's probably within the same order of magnitude.

The difference is that .005% of all web users gets us Wikipedia. .005% of geneticists gets us a table at T.G.I. Friday's. My point was that the math breaks down for crowds and science.
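To make the arithmetic concrete, here is a back-of-the-envelope sketch. Only the .005% rate comes from the paragraph above; the two population sizes are round numbers I am assuming purely for illustration, not measured figures, and the function name is made up for this example.

```python
# Sharing rate from the text: .005%, i.e. 5 people per 100,000.
SHARERS_PER_100K = 5

# ASSUMPTION: hypothetical, round population sizes for illustration only.
POPULATIONS = {
    "web users": 1_500_000_000,
    "geneticists": 100_000,
}


def expected_contributors(population: int, per_100k: int = SHARERS_PER_100K) -> int:
    """Return how many people share, using exact integer arithmetic."""
    return population * per_100k // 100_000


for name, size in POPULATIONS.items():
    print(f"{name}: {expected_contributors(size):,} contributors")
```

Under those assumed sizes, the same rate yields tens of thousands of contributors in one case and a single restaurant table's worth in the other. Change the assumptions and the exact numbers move, but the gap of several orders of magnitude is the point.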

I don't make this point to discourage us from open science. I make it because I believe it creates a series of what Dennett would call "forced moves" for our strategies to achieve open science. And we have some guidance from how we arrived at today's open web and programming and cultural worlds.

First, we have to encode enough knowledge into abstract, reusable forms that science becomes easier. This gets sniffed at all the time - "science is supposed to be hard" bellyaching, and "I don't want normal people to understand genetics, because it's complicated" and so forth.

In the immortal words of Socrates, f*ck that.

Programming used to be really hard. In the 1970s, it *was* really hard. In the 80s it was easier. In the 90s it was easy enough that I could do it, and enough people could do it together that we got free and open source software. Today, my aunt Fran can do things like add Facebook apps that cover up all sorts of complexity and allow her to do things that used to be the exclusive domain of programmers.

It is coming to the sciences. If you don't believe me, I would encourage you to attend an iGEM conference and have your mind blown by teenagers programming bacteria, or to do an eBay search for gene sequencers, or to order yourself a copy of a scratch-built gene sequence from Mr Gene...and that's just biology.

This change has got to happen in science. First, it makes the scientists themselves more powerful (as OOP made programmers more powerful). Then it lets us bring in the smart people who haven't been through the guild training sessions. Then it's going to let in crowds. This change requires an object-oriented approach to knowledge, which is why I think we need ontologies, and which is why I am a convert to the Semantic Web.

The second design constraint is access. If we only have a few people, we have to make damn sure that each of them is as powerful as possible. That's part of point one. But remember that we're talking about knowledge here, and there's a lot of it out there already. Unfortunately it's not object-oriented.

Hell, it doesn't even have hyperlinks.

But we have to convert it to the new formats that let us empower scientists, who will in turn empower the emergence of crowds. That means we have to have access to it, and the rights to make the transformative changes to formats. It's unlikely we have got the systems correct right now. Remember there were dozens upon dozens of hypertext systems developed before the right one evolved at the right time. We need to let a lot of experiments happen and let evolutionary selection work its magic. We can't count on one company, one school, one publisher to get it right.

Access and knowledge encoding. They're part of the same continuum, part of what lets us move forward culturally *and* technologically. They turn the lack of a crowd from a problem into a solution, because they let us increase the power of the crowd we have, and over time, dissolve the participation barriers that keep the crowd artificially small.

I repudiate the idea that science is special and hard because of science. I think it's special and hard because we have failed to imagine the world in which, 25 years from now, science is part of our lives just like the web is part of our lives. When I was a kid, the idea that we'd use computers in literally everything was fiction. But a group of people made it happen, by creating a world where the tools of programmers diffused out to a wider world, and by encoding principles of access into networks and documents.

For me, open science is about enabling precisely the same transformation. We have a model. We just have to have the will and the wits to make it work again.

Your point about hyperlinks is important. I have been involved in the conversation about your Microsoft collaboration and the lack of hyperlinks was quite apparent there. A commenter at my site used an example from the human diseases ontology - I wanted to refer to a particular disease but could not find a web endpoint, only an entire ontology to download in some format I know nothing about. One of the things Microsoft could help us with would be to put all the ontologies used by the plugin online so anyone can reference a node.

See my post asking why a word processor ontology plugin should not use hyperlinks to embed semantics into a document so that everyone can play.