Can You Have Open Science in the Dark?

By pontiff on July 8, 2009.

The arXiv is a game changer for how large portions of physics (and increasingly other fields) are done. Paul Ginsparg won a MacArthur award for his vision and stewardship of the arXiv (something other institutions might want to note when they decide that someone trying to change how science is done isn't really doing work that will impact them). So... Given: the arXiv is great. But there is something that's always bothered me a bit about the arXiv: transparency.

(Note: those of you who wish to complain about the fact that you can't get endorsed on the arXiv, this article is not for you. Here is a place where that discussion will probably flourish)

Now probably I'm sticking my foot where it is most likely to just get my toes broken, because I must admit, I really don't know how the arXiv is run! I have, at least, looked a little bit, but really I've found out very little. I do know that there is an advisory board (including a few "quantum types" and a guy who likes to misspell "qubits" :) ). And apparently there are advisory boards for different major meta-categories. But why these people are on these boards, and what they actually do, is completely opaque to me. What does the arXiv advisory board do? And why are these people the advisers? From my vantage point the arXiv appears to be a strict oligarchy. Of course this might be the way things should be run, but I find it a bit jarring that a shining example of open access is itself, apparently, closed.

Running the arXiv must be a hard task, and I have nothing but words of high praise to describe the staff who must be behind the scenes keeping the gears running. But today, for example, the arXiv was unavailable for more than a few hours. Do any of us know what happened? No. (Update: here is news. Foot meet mouth.) Is it likely that we'll ever find out? I wonder. Of course you can ask: why does this matter? In the case of today's outage, it probably doesn't matter much. But do any of us really know, for example, that proper redundancy has been established to keep the arXiv dataset safe from the problems that invariably creep up with that much data? Okay, some of us must know, because some of us are among the oligarchs :)

A further point along these lines: increasingly the information that is in the arXiv is being used in ways that go beyond just the archiving of preprints. As we move to a science where online tools are more important, where open access is a legislated requirement, and where "the data" in computer-readable form is a major component of how science progresses, it seems that we, as scientists, should have some say in how the arXiv adapts to these coming changes. Reliability is an issue for those of us who have tried to use the arXiv in interesting or crazy ways. Of course it may be that the way things are working now is fine (I have few complaints), but can we be sure that this will continue?

As you can see, this post is simply full of questions. That's because I really don't know the answers to these questions. But I do think this is a discussion that hasn't really been had, at least that I know about. (Of course, "foot meet mouth" sounds a lot like "quantum pontiff" in dufuseaze.) So: is the arXiv too opaque, or just the right shade of transparency, like, you know, the kind you see through a nice cold glass of beer?


Is it possible to know how many people downloaded a particular paper and where they are from?

Openness is not free. It takes a commitment and costs something. Look at your own endeavors, and think how much harder it would be to make scirate and Arxivview fully open to the community. Or, all sorts of other things. Is QIP open? Some of it is, but more goes on behind the scenes than is talked about in the one-hour annual business meeting. Of course the arXiv is more important than any one conference, and even, dare I say it, more important than arxivview. There is also often a huge benefit to opening things up further, and "science 2.0" is going to have to move much more in that direction to have an impact. The science has to be open, but so do the tools behind the science.

> Do any of us know what happened?

from http://arxiv.org/new/ :

"No new papers or announcements for 9 Jul 2009. arXiv administrators spent much of 8 Jul recovering from a problem with the main server (a corrupted database caused by human error on the evening of 7 Jul). During this time users were redirected to mirrors and submissions were not possible. We have thus deferred announcements of new submissions according the to following schedule:

* Articles received from 16:00 EDT Tue 7 Jul 2009 through 16:00 EDT Thu 9 Jul 2009 will be announced at 20:00 EDT Thu 9 Jul 2009."

Geordie,

CiteBase attempts to track exactly that but self-admittedly gives noisy results. Personally, I think it might be nice if someone gave some money to the CiteBase folks to take it beyond this seemingly never-ending beta phase.

Ian and Geordie,

To the best of my knowledge, CiteBase only uses statistics from the UK mirror, so it tends to miss the vast majority of downloads.

Also, if you send email to the arxiv administration you will not get a reply from an identifiable person. You will get emails from 'arxiv administration'. If you think that someone in 'arxiv administration' has done something wrong, there is no clear way to appeal against it or get a second opinion.

> Is it possible to know how many people downloaded a particular paper and where they are from?

I sincerely hope they never provide this information. There's little need for it other than vanity, because it cannot have any useful applications. If statistics for the main servers were easily available, I'm sure people would (mis)use it for evaluation, for example to check out how widely read someone's papers are when reading grant applications or tenure cases. As soon as it is used for evaluation, the data will be manipulated, for example by people encouraging their friends to be sure to download their papers even if they have no intention of reading them. The net effect is that the data will not be reliable for evaluation purposes, but some people will be clueless enough to use it anyway. Then even honest people will feel pressure to manipulate the statistics themselves, to keep from falling behind the others who do.
