The latest issue of the International Journal of Digital Curation is out; if you’re in this space and not at least watching the RSS feed for this journal, you should be.
I was scanning this article on Georgia Tech’s libraries’ development of a data-curation program when I ran across a real jaw-dropper:
One of the bioscientists asked the data storage firm used by one of the labs recently about the costs associated with accessing data from studies conducted a few years ago. The company replied, “you wouldn’t want to pay us to do that. It would be less expensive to re-run your experiments.” (p. 88)
Ouch. The immediate question springing to mind is “why is this lab paying these people to store data if the stored data then become unretrievable by the original depositors?” Roach motel: data goes in, but it doesn’t come out!
It seems to me that the lesson here is making even seemingly-obvious requirements explicit when expensive service provision is in play. You would think that retrieval is an automatic concomitant of storage. I sure would. Apparently not!
I’ve run into similar problems before, but in the best story I have on the subject, the problem was format-related. I retell the story in order to warn people to be wary of hotshot black-box content-management systems.
I once worked for a scholarly-publishing service bureau. The company did editorial work, typesetting, art, design, and SGML/XML-based workflows, which is the division I was in. So one of our SGML clients was shopping for a content-management system to manage and archive all their publishing material. Sensible enough. They specifically asked each vendor whether they could retrieve the same SGML from this system that they’d put into it.
The vendor they eventually contracted with assured them that they could. Not to put too fine a point on it, the vendor was telling untruths. The SGML was munged on ingest into whatever unholy lossy proprietary mess the CMS used natively, and could not be retrieved intact therefrom. Our client didn’t find this out until after the purchase, of course. There was talk of lawsuits; I don’t recall where that went.
Slight happy ending: our shop had its own project-archiving procedures, so the client didn’t lose any SGML that we had provided them with.
Don’t let any of this happen to you! Ask questions that seem stupid, and make your counterpart commit to the answers you want.