The Book of Trogool

On NSF data plans

Word on the street is that the NSF is planning to ask all grant applicants to submit data-management plans, possibly (though not certainly) starting this fall.

Fellow SciBlings the Reveres believe this heralds a new era of open data. I’m not so sanguine, at least not yet. Open data may be the eventual goal; I certainly hope it is. At this juncture, though, the NSF would be stupid to issue a blanket demand for it, and I rather suspect the NSF is not stupid.

Part of the problem, of course, is that many disciplinary cultures are simply not ready for even the idea of open data. If the NSF were to mandate it, these disciplines would revolt openly, tossing lots of “government interference in science” rhetoric around. Moreover, disciplines that are hand-in-glove with industry would lead the charge, with industry’s big bucks to back them up. I hear quite a lot about industry strongarming academic scientists into considering nearly everything, emphatically including data, a “trade secret.”

(Lest anyone think this type of reaction is limited to the sciences, I ask you to recall the kerfuffle at Iowa over electronic theses, spearheaded by the creative-writing department.)

Another part of the problem is that many, perhaps most, scientists who are ready for the idea of open data are emphatically unprepared for its praxis. It’s beyond doubt that data management will be extra work for most of these people, given how sloppy and ad-hoc many data practices are; as the NIH Public Access Policy demonstrates, adding to a researcher’s workload must be done with extreme circumspection.

The NSF can’t hand down guidelines from on high. Blanket “here is how you deal with your data” demands will not work, given the quantity, variability, and variable sensitivity of data across the scientific enterprise. Data standards? Data standards don’t exist for the entirety of science (never mind metadata standards), and not even the NSF can wave a magic wand to call them into existence. Rather cleverly, then, the NSF is planning to say “We don’t necessarily know how to deal with your data, but we expect you to think about it and do the right thing.”

So if you think this rule, should it come to pass, might affect you, what should you do? Here’s what I think.

  • Do not try to revamp every single process and procedure you have. Do not try to “rescue” all your old data all at once. You will swamp yourself and get discouraged. Seriously, don’t. Panic won’t help you here.
  • Instead, look back at your last funded project, since it will be freshest in your mind. What data did it produce?
  • What happened to that dataset in the course of your research? Did you run programs against it? Be prepared to archive and document that code.
  • Who handled your data? Did they document it? Where? If there is any part of the process you’re fuzzy on, be aware that this fuzziness will need to go away for your next project.
  • Ask yourself the famous ten questions (PDF) about your data. The answers will inform your data-management plan.
  • What can’t you do for your data that you think should be done? Need partners? Go find them now. Depending on your needs, the right partners may be in your campus library or IT organization, or they may exist at your funder or in a research center near you.
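To make the inventory step above concrete, here is a minimal sketch of one way to start documenting a dataset: walk a project directory, record each data file’s size and a checksum, and write the result out as a simple manifest. This is purely illustrative — the directory layout, field names, and CSV format are my own assumptions, not anything the NSF prescribes — but even a manifest this crude answers “what data did this project produce?” and catches silent file corruption later.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: str, manifest_path: str) -> int:
    """Inventory every file under data_dir into a CSV manifest.

    Each row records the file's path (relative to data_dir), its size
    in bytes, and its SHA-256 checksum. Returns the number of files.
    """
    rows = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            rows.append({
                "file": str(path.relative_to(data_dir)),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    with open(manifest_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["file", "bytes", "sha256"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A manifest like this is no substitute for real metadata or disciplinary standards, but it is the kind of small, repeatable habit a data-management plan can honestly promise.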

That should keep you out of trouble for a while! It will also mean that you are prepared come the next funding cycle, when many would-be grantees won’t be. In today’s cutthroat funding environment, that can only help.