Bora (coturnix) over at Science and Politics has generated a lot of conversation via his taxonomy of science blog posts, mostly relating to the call for people to start publishing data and hypotheses on blogs. Much of the discussion that I’ve seen centers on the question of “scooping” (see, for example, here and here), but there’s a wide range of reaction linked from the end of the original post.
Bora seems to regard it as a Bad Thing that people don’t post data (though I should note that I did post some data during the Week in the Lab– calibration data only, granted, but it’s data…). I don’t really share that opinion– I think it’s just a Thing, and as with most such issues, the difference of opinion is the result of some unexamined assumptions.
I keep putting off a response to this, thinking that I’ll eventually have time to do it justice, but I’m just going to have to accept that I won’t have time to be brilliant, and post a half-assed version below the fold:
The first and most important unexamined assumption in this has to do with the nature of science. That is, Bora and other advocates of posting data and hypotheses are assuming that this would be a useful thing to do. That’s not clear to me.
In physics, this sort of model is essentially what goes on in the high-energy theory community, only via the arXiv rather than on blogs. There’s a strong tendency to conduct scientific business primarily through openly distributed pre-prints, rather than through traditional peer-reviewed journals. Many things posted to the “pre-print” server never see print in the conventional sense, and serve to generate further discussion and more “pre-print” articles that are essentially more careful blog posts.
(I think it was Jacques Distler who apologised for not doing more science posts by saying that if he was going to put in enough effort to make a good blog post on a new topic, he might as well just put in a little extra work, and post it to the arXiv….)
That’s a reasonable model for high-energy theoretical physics, where work tends to be done either individually or in small groups. If you’re the sole author, there’s no issue in posting something quickly– you’re the only one who suffers if it turns out to be crap– and as long as the group of authors is relatively small, you can get approval from all of them relatively easily. These papers also have the great advantage of not needing to be checked against anything other than pure math– if you throw hypotheses out there, anybody with the right math background can look it over and see if it makes sense.
This is an absolutely horrible model for a lot of experimental physics, though. For one thing, lots of papers are written by huge collaborations, and the logistical issues involved in getting the approval of all the people who would need to approve posting data are probably an insurmountable obstacle to posting real data (as opposed to occasional locally-generated graphs and whatnot) in anything resembling a blog-like manner.
The bigger problem, though, has to do with cross-checks and replication. Even the sort of small-scale experimental physics I do is extremely expensive, and requires a lot of apparatus. If I were to post real data, lots of people could check it for math errors, but nobody else could check it for experimental errors, because even the people with the right apparatus in-house would need a good deal of time to set up the experiment.
And the plain fact is that much of what we get as preliminary data and hypotheses is just wrong, usually for technical experimental reasons. Those mistakes do get caught (most of the time), but it takes time, and careful work, and repeated measurements and re-analyses of the data. That’s not something that is sped up in any way by an “open source” model.
In fact, any experimental scientist can tell you stories about problems caused by the too-quick release of preliminary data. I personally have sent at least half a dozen theorists down blind alleys by telling them things about the progress of various experiments that turned out not to be true after some additional checking. (I also had to reject a paper from a journal once that came about entirely because of a non-standard sign definition in one of my articles.) And I’m doing simple proof-of-principle stuff– people doing real precision measurements could start four kinds of panic by posting some of what they get as preliminary data.
Finally, there’s a hidden assumption that people have time for this sort of thing, which is not how experimental physics works in my experience. On most experiments, there are long stretches of no data at all, during which you could freely blog just about anything, and then there are stretches of data overload, during which you’re taking and analyzing preliminary data as fast as you possibly can, which leaves no free time for blogging about anything, let alone blogging about the implications of your preliminary data.
Realistically, I’m not posting data to this blog because I don’t have data to post at the moment. But even when I do have stuff to post, I’m unlikely to make it a big component of the blog, just because the nature of the way my field operates means it’s not a particularly useful thing to do. It has nothing to do with fear of being scooped (hah!), it’s just that there’s no value added for the effort of posting data here.
There’s another assumption here, about the purpose of science blogging, but I’ve babbled enough for now, and really need to get some work done today. (Specifically, I need to go shower– I’m writing this at 8am, but will schedule posting for about 1:30 this afternoon, when I’ll be grading labs…)