The Book of Trogool

Dividing up the pie

Another thing I meant to call out in the context of the Jupiter-goes-boom event was the nod to data gathered by people who aren’t connected to the formal research enterprise save tangentially.

This event was first noted by someone not an astronomer by profession, and the article notes that this is hardly the first time astronomers have been scooped. My husband, who is an extremely amateur skygazer and likes to hang out on online astronomy bulletin boards, says that his impression is that astronomers mingle with enthusiasts fairly freely, all things considered, and both sides appear to benefit.

Astronomy isn’t the only field where this happens, of course. The Center for History and New Media projects I mentioned in my previous post are essentially crowdsourced news-gathering turned into history. When I was a graduate student in linguistics back in the day, I had occasion to look at Mayan, which amateurs have been instrumental in deciphering. Birdwatchers no more skilled than I are of material help to ornithologists in providing localized bird counts and similar observations. I am also seeing some renewed excitement about “crowdsourcing” various scientific tasks that can’t be done by computers but are too laborious and time-consuming to assign to researchers.

So my question about all this is: who’s looking after their data? Do data have to come from an accredited scientist affiliated with an institution before they are worth preserving?

Sometimes these questions have answers. Sometimes, not so much.

This points to a larger question, an elephant-in-the-room question. Whose responsibility is all this data gathering and preservation, anyway? “Individual researchers” is an inadequate cop-out, let’s just get that on the table right now; without sustainable support, data die when grants fade or retirements happen.

This leaves a few possibilities: funders (notably government), disciplines, and institutions. None of them is unproblematic; in fact, I would go so far as to say that none of them can solve this problem unaided.

Relying on funders assumes that funders will take a long-term perspective on sustainability. Funders can be fickle about this, even government funders; witness the troubled trajectories of the ERIC education database in the US and the Arts and Humanities Data Service in the UK. Worse, outside government, vanishingly few funders have resources and infrastructure to throw at this problem; the most they can do is throw money at it in the form of grants, which is not a sustainable funding model by any means.

The line between disciplines and institutions is often a fuzzy one, honestly. The arXiv is the paradigmatic disciplinary preprint repository, but it is sustained by the Cornell University Libraries. Things were not always thus, but such a handoff isn’t exactly unusual.

However. When you ask a researcher about her “discipline,” she’ll probably start talking about her favorite scholarly society. Where are the scholarly societies in all this ferment about data? Gosh, wish I knew. We’ll just pass by the American Chemical Society in silence, shall we? They’re an outlier and we should all be glad of that, but where’s everybody else? Looking for services that members need? Materials that keep members coming back to the society? Why aren’t scholarly societies in the data business? I wonder.

Institutions. Institutions have a built-in challenge dealing with data: they have to deal with it over a wide swathe of disciplines. I can’t emphasize enough how hard that is! Different formats, different metadata standards (where there are any at all), different ontologies, different patterns of thought, different workflows; there’s just no end to the differences.

In these early days, I see a few different institutional approaches to this problem. One is “follow the money.” If you’ve got million-dollar grants, you’ll get red-carpet treatment. No grants? No service. When this model is accused of inequity, it throws its hands up and says “since when was life fair?” Another approach is what I call “help the First Son.” In the Pesach parable, the first son is the one who approaches his father asking detailed and intelligent questions about Pesach observance, and receives detailed and intelligent answers.

I don’t know about you, but I don’t know many First Sons among researchers. A few, yes, but not many. A lot of the researchers I know are Third Sons. “What is this?” they say. And a lot are Fourth Sons, who do not even know how to ask. A First-Son approach leaves our Third and Fourth Sons with no answers.

So what we’re left with, when we ask who’s responsible for data, is a big muddle. Some disciplines have this pretty much sorted. For them, institutional support may be redundant. Other disciplines are under the funder gun; it’s still unclear what the institutional role will be there. Many researchers fall into neither group; either their institution helps them or they get no help.

My worry is that as the pie is currently divided, a lot of researchers aren’t getting any.