Data longa, tractatus brevis

Dan Cohen has an extraordinarily worthwhile post recounting his talk at the Shape of Things to Come conference at Virginia (which I kept my eye on via Twitter; it looked like a good 'un).

I see no point in rehashing his post; Dan knows whereof he speaks and expresses himself with a lucidity I can't match. I did want to pick up on one piece toward the end, because it has implications for library and archival systems design:

Christine Madsen has made this weekend the important point that the separation of interface and data makes sustainability models easier to imagine (and suggests a new role for libraries). If art is long and life is short, data is longish and user interfaces are fleeting. Just look at how many digital humanities projects that rely on Flash are about to become useless on millions of iPads.

As I've had occasion to mention, scholars generally and humanists in particular have a terrible habit of chasing the shiny. If Dan's post helps lead to an ethic of "sustainable first, shiny later," I will be a very, very happy camper. (I note that Dan's shop has firsthand experience with losing older projects to the shiny: non-standardized JavaScript, if I recall correctly. Dan speaks from a position of hard-earned wisdom!)

The answer to this conundrum is not, however, "avoid the shiny at all costs!" It can't be. That will only turn scholars away from archiving and archivists. To my mind, the answer is that our systems have to take in the data and make it as easy as possible for scholars to build shiny things on top of it. When the shiny tarnishes, as it inevitably will, the data will still be there for someone else to build something perhaps even shinier.

Mark me well, incidentally: it is unreasonable and unsustainable to expect data archivists to build a whole lot of project-specific shiny stuff. You don't want your data archivists spending their precious development cycles doing that! You want your archivists bothering about machine replacement cycles, geographically-dispersed backups, standards, metadata, access rights, file formats, auditing and repair, and all that good work.

So this implies a fairly sharp separation between the data-management applications under the control of the data archivists, and the shiny userspace applications under the control of the scholars. How many of our systems have, or even imply, such separation?

DSpace doesn't, to my everlasting annoyance. (Try building a userspace application on top of materials in DSpace but wholly outside it. Just try.) Omeka doesn't either; sorry, Dan. Not Greenstone, not EPrints, not CONTENTdm, not any of the EAD systems out there, not DLXS. All of these are built as silos, with APIs ranging from somewhat to appallingly limited. I'm here to say, the data silo needs to die, and the sooner the better.

Fedora Commons has this right. I say again: for all its faults, and it has them, Fedora Commons has this piece right. I also like what I see coming out of places like the Library of Congress, the California Digital Library, and the University of North Texas.

But let's stick with Fedora, because it's what I know best. Fedora isn't even trying to be the whole silo; it punts on the userspace problem entirely. It doesn't have a web user interface that anyone other than a command-line addict would recognize. What it has is a reasonably comprehensive (and improving) API on which any number of interfaces can be built.

Since "any number" is exactly how many interfaces will need to be built (and coexist) over wildly varying data… you can see why I think this is the right approach. If you want to see this approach in action, you need seek no further than Islandora and its Virtual Research Environments.
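
To make that separation concrete, here is a minimal Python sketch. It is not Fedora's actual API: a hypothetical Repository class stands in for the archivist-controlled data layer, exposing only a small read API over objects with named datastreams, and two hypothetical interfaces (TitleList, HtmlGallery) render the same data without ever touching storage. Every class name, the "demo:" identifiers, and the "DC_TITLE" datastream ID are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class DigitalObject:
    """A stored item: an identifier plus named datastreams (hypothetical model)."""
    pid: str
    datastreams: Dict[str, bytes] = field(default_factory=dict)


class Repository:
    """Data layer owned by the archivists: storage plus a small read API, no UI."""

    def __init__(self) -> None:
        self._objects: Dict[str, DigitalObject] = {}

    def ingest(self, obj: DigitalObject) -> None:
        self._objects[obj.pid] = obj

    def list_pids(self) -> List[str]:
        # Sorted so every interface sees the same stable ordering.
        return sorted(self._objects)

    def get_datastream(self, pid: str, dsid: str) -> bytes:
        return self._objects[pid].datastreams[dsid]


class Interface(Protocol):
    """Anything a scholar builds on top; it may use only the read API."""

    def render(self, repo: Repository) -> str: ...


class TitleList:
    """One 'shiny' interface: a plain-text listing of titles."""

    def render(self, repo: Repository) -> str:
        return "\n".join(
            f"{pid}: {repo.get_datastream(pid, 'DC_TITLE').decode()}"
            for pid in repo.list_pids()
        )


class HtmlGallery:
    """Another interface over the very same data: an HTML list."""

    def render(self, repo: Repository) -> str:
        items = "".join(
            f"<li>{repo.get_datastream(pid, 'DC_TITLE').decode()}</li>"
            for pid in repo.list_pids()
        )
        return f"<ul>{items}</ul>"


# Usage: one repository, two coexisting interfaces.
repo = Repository()
repo.ingest(DigitalObject("demo:1", {"DC_TITLE": b"Field notes"}))
repo.ingest(DigitalObject("demo:2", {"DC_TITLE": b"Letters"}))
interfaces: List[Interface] = [TitleList(), HtmlGallery()]
for ui in interfaces:
    print(ui.render(repo))
```

Retiring an interface in this sketch means deleting a class; the repository and the data it holds are untouched, which is exactly the property that lets the shiny come and go.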

Here's the fun bit: it doesn't take the University of Prince Edward Island's developers to create a new VRE. Any Drupal dev willing to learn about Fedora's view of the universe and reverse-engineer some of UPEI's code can do it. That's a fair few devs.

And that's the way the world will have to be. Data longa, tractatus brevis.
