Just a quickie post today.
In answer to my post about intertwingularity, commenter Andy Arenson suggested that the way to rescue an Excel spreadsheet whose functions or other behaviors depended on a particular version of Excel was to keep that specific version of Excel runnable indefinitely.
This is called “emulation,” and it assuredly has its place in the digital-preservation pantheon. Some digital cultural artifacts are practically all behavior (games, for instance), and just hanging onto the source code honestly doesn’t do very much good. The artifact is what happens when that code is run, which means preserving it means keeping that code runnable, which in turn means preserving its runtime environment as best we can.
No mere bagatelle, this. If you turn up your nose at games (which you really, really shouldn’t), consider the humble HyperCard stack from the 1990s. A good many enterprising artists and designers built rather remarkable things on it, as well as on other bits of the early Macintosh systems environment, and all those things are right this minute in danger of disappearing forever because we can’t emulate that environment sufficiently well to rescue them.
For most data, though, I honestly prefer a “migration” strategy, in which format obsolescence is fought by modifying files to keep them usable in modern hardware and software environments. Hardcore emulationists disagree with me; I’ve seen articles boasting that any environment in the history of computing is trivial to emulate, so why even bother with migration? Frankly, I don’t believe a word of it. If it were that trivial, it would have been done already. It hasn’t.
I prefer migration because emulation feels like putting the data in a museum: look all you want, but don’t touch. Data should be touchable, rearrangeable, mashup-able; a good migration will keep them so. Also, in general migration is much less of a reach for memory organizations than emulation. Take me, for instance. I’m a tolerably talented data migrator. I can’t do anything with emulation.
Migration itself is not always trivial and can be lossy. My friend Tim Donohue developed (and won a conference prize with) a DSpace hack that sends Microsoft Office files through a copy of OpenOffice.org running on the DSpace server, saving ODF versions of the files to DSpace along with the Office versions. Worked like a charm, as far as it went. What was the problem? FONTS. Because the server had a minimal font complement at best, the ODF files came out looking unusably horrible.
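For the curious, here is a minimal sketch of what that kind of migration step can look like. To be clear, this is not Tim's DSpace code; it assumes a LibreOffice or OpenOffice install that exposes the `soffice` command with its headless `--convert-to` option, and the extension-to-format table and function names are made up for illustration. Note that nothing here addresses the font problem: the conversion will happily run on a server with almost no fonts and hand back an ugly file.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical mapping from Office extensions to ODF target formats.
ODF_TARGETS = {
    ".doc": "odt", ".docx": "odt",
    ".xls": "ods", ".xlsx": "ods",
    ".ppt": "odp", ".pptx": "odp",
}


def build_convert_command(source, outdir, soffice="soffice"):
    """Build the headless conversion command for one Office file."""
    source = Path(source)
    target = ODF_TARGETS.get(source.suffix.lower())
    if target is None:
        raise ValueError(f"no ODF target known for {source.suffix!r}")
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", str(outdir), str(source)]


def migrate(source, outdir):
    """Write an ODF copy next to the original, which is left untouched.

    Returns the expected path of the ODF copy, or None when no
    soffice binary is on the PATH (assumption: it is named 'soffice').
    """
    if shutil.which("soffice") is None:
        return None
    subprocess.run(build_convert_command(source, outdir), check=True)
    source = Path(source)
    return Path(outdir) / f"{source.stem}.{ODF_TARGETS[source.suffix.lower()]}"
```

Keeping the original alongside the migrated copy, as the DSpace hack did, means a bad conversion costs you nothing but disk space.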
Migration is sometimes impossible, if the origin format is proprietary, opaque, or otherwise not reverse-engineerable. Unfortunately, emulation has limited if any success in this situation as well; if the file format is obfuscated, so is the software environment, generally!
Of course, the gold standard is a research workflow that respects data enough to put thought and care into describing it and using future-friendly formats right from the beginning. We don’t live in that world, and we may never live in that world, so the migration-versus-emulation wars are only beginning.