One phenomenon that will be (indeed, already is) utterly unavoidable in the data-curation space is the creation of standards. I once heard Andrew Pace say that standards are like toothbrushes: everybody thinks they’re great, but nobody wants to use anybody else’s.
Be that as it may, standards development and compliance are one way to make everybody’s data play nicely with everybody else’s data. They’re not the only way, to be sure; one very important way that I’m sure we’ll also see more of is Being The Only Game In Town. ICPSR manages this quite successfully, and so does the Digital Sky Survey. If you want to be important in the data spaces dominated by either of these large players, you play by their rules. It’s just that simple.
When there’s no big player to lay down the law, though, standards development becomes more attractive. How do you make a standard, then? More to the point, how do you make a good standard, a standard that works, a usable standard, a standard that will last?
I liked this blog post by Adam Bosworth about standards development very much. I think it captures much of the excellence that goes into successful standards as well as the dysfunction attending failed ones. I do want to add a fillip of my own, though, based on my own experience helping to build standards and trying to use standards built by other people.
When you’re in a roomful of people tasked with building a standard, make sure the room contains representation from every group of people who will be asked or required to use it. That emphatically includes the non-technical and the non-specialist. It goes double or triple if the standard will affect existing technology installations: you must have someone in that standards room who uses the existing technology! No, a developer of the existing technology does not fulfill this requirement, because the distance between developers’ understanding and users’ understanding is often vast.
If the non-technical, non-specialist representative in the room can’t understand the standard, it will fail. If that representative can’t produce data that fit the standard, likewise. I agree with Bosworth’s reservations about RDF; I myself have trouble understanding it and putting it to use, despite a decade’s experience with markup, and I believe the tribulations such folk as I face when trying to deal with it have retarded its adoption significantly.
When this rule about representation is flouted but standards are published anyway, the result is standards that fall apart under real-world use. I will adduce OAI-PMH as an example. It follows quite a few of Bosworth’s recommendations: it’s simple (I have explained it in twenty minutes to library-school students), largely human-readable, focused, precise about encodings, in possession of real implementations, and free on the web.
It is also flawed. Huge projects built on it have found its flaws impossible to bypass and expensive to work around (see Lagoze et al. 2006 for how NSDL ran aground on OAI-PMH’s inadequacies).
The major flaw, to my mind, isn’t difficult to explain or to understand: OAI-PMH has no way of reporting metadata errors built in. Its error codes cover malformed requests, not malformed records. In a protocol standard built for communication of and about metadata, nobody in the standards-design process ever seems to have asked the (to me) simple and obvious question, “What happens if the metadata is malformed or otherwise wrong?”
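To make the gap concrete, here is a minimal Python sketch of the kind of checking a harvester ends up doing on its own. Everything specific in it is an assumption for illustration: the sample response, the `oai:example.org` identifiers, the function name `find_metadata_problems`, and the two local rules (a record needs a non-empty `dc:title`, and a `dc:date` should look like `YYYY`, `YYYY-MM`, or `YYYY-MM-DD`). Only the three namespace URIs come from the OAI-PMH and Dublin Core specifications.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumed local rule, not part of the protocol: dates should be
# YYYY, YYYY-MM, or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")

# Hypothetical ListRecords response, trimmed to the parts the sketch needs.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A well-described item</dc:title>
          <dc:date>2008-07-01</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>   </dc:title>
          <dc:date>sometime last summer</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def find_metadata_problems(response_xml):
    """Return (record identifier, problem description) pairs."""
    problems = []
    root = ET.fromstring(response_xml)
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier",
            default="(no identifier)",
            namespaces=NS,
        )
        # Rule 1: at least one dc:title with real text in it.
        titles = [t.text or "" for t in record.iterfind(".//dc:title", NS)]
        if not any(t.strip() for t in titles):
            problems.append((identifier, "missing or empty dc:title"))
        # Rule 2: every dc:date matches the assumed date pattern.
        for date in record.iterfind(".//dc:date", NS):
            value = (date.text or "").strip()
            if not DATE_PATTERN.fullmatch(value):
                problems.append((identifier, f"unparseable dc:date: {value!r}"))
    return problems


if __name__ == "__main__":
    for identifier, problem in find_metadata_problems(SAMPLE_RESPONSE):
        print(f"{identifier}: {problem}")
```

The list of problems this produces has nowhere to go within the protocol. The harvester can log it, or someone can email the repository manager, but there is no response or request defined for sending it back to the data provider.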
Anyone who’s worked on the ground with repositories of any stripe knows that metadata problems, sometimes gross problems, are par for the course. For that matter, any librarian can explain the pitfalls of metadata and citation creation at great length. I honestly can’t tell you why OAI doesn’t seem to have on-the-ground repository managers and other librarians capable of raising such practical issues working on its standards bodies.
I can, however, tell you that they should. The latest OAI development, OAI-ORE, contains exactly the same no-error-reporting weakness I just pointed out in OAI-PMH. Yes, some of the underlying technologies that OAI-ORE is built on contain certain kinds of error reporting, but taken together, the errors they can report are only a subset of the errors that I believe will crop up.
To make standards that work, include people on the standard-design team who work with the processes underlying the standard. Now that you know this, go forth and standardize!