Science Blogging or Blogging Science?

Bora/coturnix over at Science and Politics has generated a lot of conversation via his taxonomy of science blog posts, mostly relating to the call for people to start publishing data and hypotheses on blogs. Much of the discussion that I've seen centers on the question of "scooping" (see, for example, here and here), but there's a wide range of reaction linked from the end of the original post.

Bora seems to regard it as a Bad Thing that people don't post data (though I should note that I did post some data during the Week in the Lab-- calibration data only, granted, but it's data...). I don't really share that opinion-- I think it's just a Thing, and as with most such issues, the difference of opinion is the result of some unexamined assumptions.

I keep putting off a response to this, thinking that I'll eventually have time to do it justice, but I'm just going to have to accept that I won't have time to be brilliant, and post a half-assed version below the fold:

The first and most important unexamined assumption in this has to do with the nature of science. That is, Bora and other advocates of posting data and hypotheses are assuming that this would be a useful thing to do. That's not clear to me.

In physics, this sort of model is essentially what goes on in the high energy theory community, only via the ArXiV rather than on blogs. There's a strong tendency to conduct scientific business primarily by openly distributed pre-prints, rather than via traditional peer-reviewed journals. Many things posted to the "pre-print" server never see print in the conventional sense, and serve to generate further discussion and more "pre-print" articles that are essentially more careful blog posts.

(I think it was Jacques Distler who apologised for not doing more science posts by saying that if he was going to put in enough effort to make a good blog post on a new topic, he might as well just put in a little extra work, and post it to the ArXiV....)

That's a reasonable model for high-energy theoretical physics, where work tends to be done either individually or in small groups. If you're the sole author, there's no issue in posting something quickly-- you're the only one who suffers if it turns out to be crap-- and as long as the group of authors is relatively small, you can get approval from all of them relatively easily. These papers also have the great advantage of not needing to be checked against anything other than pure math-- if you throw hypotheses out there, anybody with the right math background can look it over and see if it makes sense.

This is an absolutely horrible model for a lot of experimental physics, though. For one thing, lots of papers are written by huge collaborations, and the logistical issues involved in getting the approval of all the people who would need to approve posting data are probably an insurmountable obstacle to posting real data (as opposed to occasional locally-generated graphs and whatnot) in anything resembling a blog-like manner.

The bigger problem, though, has to do with cross-checks and replication. Even the sort of small-scale experimental physics I do is extremely expensive, and requires a lot of apparatus. If I were to post real data, lots of people could check it for math errors, but nobody else could check it for experimental errors, because even the people with the right apparatus in-house would need a good deal of time to set up the experiment.

And the plain fact is that much of what we get as preliminary data and hypotheses is just wrong, usually for technical experimental reasons. Those mistakes do get caught (most of the time), but it takes time, and careful work, and repeated measurements and re-analyses of the data. That's not something that is sped up in any way by an "open source" model.

In fact, any experimental scientist can tell you stories about problems caused by the too-quick release of preliminary data. I personally have sent at least half a dozen theorists down blind alleys by telling them things about the progress of various experiments that turned out not to be true after some additional checking. (I also once had to reject a paper submitted to a journal that came about entirely because of a non-standard sign definition in one of my articles.) And I'm doing simple proof-of-principle stuff-- people doing real precision measurements could start four kinds of panic by posting some of what they get as preliminary data.

Finally, there's a hidden assumption that people have time for this sort of thing, which is not how experimental physics works in my experience. On most experiments, there are long stretches of no data at all, during which you could freely blog just about anything, and then there are stretches of data overload, during which you're taking and analyzing preliminary data as fast as you possibly can, which leaves no free time for blogging about anything, let alone blogging about the implications of your preliminary data.

Realistically, I'm not posting data to this blog because I don't have data to post at the moment. But even when I do have stuff to post, I'm unlikely to make it a big component of the blog, just because the nature of the way my field operates means it's not a particularly useful thing to do. It has nothing to do with fear of being scooped (hah!), it's just that there's no value added for the effort of posting data here.

There's another assumption here, about the purpose of science blogging, but I've babbled enough for now, and really need to get some work done today. (Specifically, I need to go shower-- I'm writing this at 8am, but will schedule posting for about 1:30 this afternoon, when I'll be grading labs...)

I agree, blogs simply aren't the right medium to publish new results. That's what the arXiv is for (which is increasingly used not only by high-energy physicists, but also by people from other disciplines). One reason that everyone should be able to understand is that it would be incredibly tedious having to surf to lots and lots of different blogs to look for and at new results (and how would you know which ones to look at? you might miss out on important new developments just by not reading the right blogs); the arXiv provides a single location for that, saving a lot of time and effort.

Where blogs are useful in science is in providing a board on which new results (e.g. from the arXiv) can be discussed by interested parties, in offering a cheap means of publicity both for science and for individual scientists, in popularizing science to the internet crowd, and even as a forum for online "lectures" and "seminars".

Yep, that last word kind of summarizes it: blogs are somewhat like seminar series, only in writing; they are not a replacement for preprints and publications (the quality standards for the latter ought to be significantly higher than for blog posts).

For what it's worth, astrophysics works exactly the same way with regard to arXiv, likely because the working arrangements are just as you describe. By the time you get done explaining how to make sense of data, you've basically written the first 2/3 of a paper.

Rather than data, blogs might be more useful for distributing computer codes, whether they be links to large numerical libraries or just little analysis routines for plotting packages like gnuplot, supermongo, et al.

This is a very interesting perspective from a different discipline. I think this is going to be a very slow and gradual process with some small and tentative steps at first. There will always be research - probably most of the research in any field - that will only be suitable for regular peer-reviewed journals, though such journals should be expected to move online and become open-source more and more.

Still, I see blogs as one of the parts of the big picture, mainly for publishing negative, unpublishable or smaller-than-LPU work, the kind that Alex has posted this morning, or the kind of stuff I posted on Circadiana. In other words, research that otherwise would not get published at all, yet may be useful for others to see (at least to know what not to do because it does not work).

Also, in some fields, discussing ideas and hypotheses on blogs will be more appropriate than in other fields. In some of the follow-ups on the original post, I link to some discussions (as well as emerging software) for ensuring that blog posts are properly time/date-stamped, saved forever, and properly referenced.

In other words, I do not see blogs replacing traditional publishing, but adding to the conversation, in a way conferences add to the conversation, except you don't need to pay airfare (which is a big deal if you are a scientist in the Third World).

hey, you should stop claiming to post "half-assed" responses, because they turn out to be thorough and interesting responses (or thorough enough for a blog post.)