One of the problems practically every nascent data-curation effort will have to deal with is what serials librarians call the backfile, though the rest of us use the blunter word backlog.
There’s a lot of digital data (let’s not even think about the analog for now) from old projects hanging around institutions. My institution. Your institution. Any institution. There may be wonderful data in there, but chances are they’re in terrible condition: disorganized, poorly described if described at all, on perishable (and very possibly perished) physical media. This pile of mostly-undifferentiated stuff is what all the digital-heat-death-of-the-universe people are on about.
What to do about it? Make no mistake, it takes considerably more human ingenuity and effort to rescue data than to treat it right at the outset. If a small data-curation team just out of the starting gate tries seriously to come to grips with the backlog problem, it will almost certainly swamp itself, to the point that it won’t be able to get in on the ground floor of new data-generating projects, which of course only perpetuates the problem.
I hate to say this, but… I believe we’ll have to let a lot of those data lie. We can use some of the backlog to learn on; I would be inclined to start with data relating to a revered institutional priority such as theses and dissertations. We can possibly also pick up a few horses in midstream, researcher workflow permitting.
Grant agencies should look seriously at data-rescue projects, in my opinion. Grant funding is lousy for sustainability, but for rescue projects where the main effort is a one-time licking into shape and the sustainability is a given, grant funding makes a lot of sense. There’s certainly no lack of data to rescue!
Still, I strongly believe that the principal priority of a new data-curation team should be new data, new workflows, and new research projects. Perpetually playing catch-up is not a good space to be in. Also, faculty aren’t nearly as engaged with their old projects as their current ones, so for good word of mouth and campus visibility, working with current projects is the way to go.
Thanks to Chris Rusbridge for making me think about this. The answer I arrived at wasn’t the one I expected.
A short reminder: I’m at Access 2009 the rest of this week. Blogging is liable to be nonexistent.