Praxis
One of the problems practically every nascent data-curation effort will have to deal with is what serials librarians call the backfile, though the rest of us use the blunter word backlog.
There's a lot of digital data (let's not even think about the analog for now) from old projects hanging around institutions. My institution. Your institution. Any institution. There may be wonderful data in there, but chances are they're in terrible condition: disorganized, poorly described if described at all, on perishable (and very possibly perished) physical media. This pile of mostly-undifferentiated…
Sometimes it's worthwhile to let my "toblog" folder on del.icio.us marinate a bit. Posts I recently ran across on two different blogs illuminate the same point so well that they deserve their own post here!
Off the Map offers Huffman's Three Principles for Data Sharing, which are really principles for data-collection and -display applications:
Create immediate value for anyone contributing data.
Make contributors' data available back to them with improvements. (emphasis mine)
[Urge users to] share derivative works back with the data-sharing community.
Absolutely. These three principles boil…
When I was but grasshopper-knee tall, my father the anthropologist took me to his university's library to help him locate and photocopy articles in his area of study for his files. He had two or three file cabinets full of such copies. (He may still.)
I have similar file cabinets, two of them: my del.icio.us account and my Zotero library. The del.icio.us account consists merely of links. The Zotero library, on the other hand, includes the actual digital object(s) as often as I can manage it (even at a major research university like MPOW, I cannot always lay eyes on everything I want to read…
Just a quickie post today—
In answer to my post about intertwingularity, commenter Andy Arenson suggested that the way to rescue an Excel spreadsheet whose functions or other behaviors depended on a particular version of Excel was to keep that specific version of Excel runnable indefinitely.
This is called "emulation," and it assuredly has its place in the digital-preservation pantheon. Some digital cultural artifacts are practically all behavior—games, for instance—and just hanging onto the source code honestly doesn't do very much good. The artifact is what happens when that code is run,…
A common problem adduced in e-research (not just e-research, but it does come up quite a bit here) is expertise location, both local and global.
You need a statistician. Or (ahem) a metadata or digital-preservation expert. Or a researcher in an allied area. Or a researcher in a completely different area. Or a copyright expert (you poor thing). Very possibly the person you want works right down the hall, or in the building next door, or in the library, or somewhere on campus. But how on earth do you know?
You could call around to the offices or departments most likely to contain the expertise…
When I was but a young digital preservationist, I was presented with an archival problem I couldn't solve.
This should not sound unusual. It happens a lot, for all sorts of reasons. If I can keep a few people from falling into traps that make digital preservationists throw up their hands in despair, I'm happy.
Anyway, the problem was a website with some interactions coded in JavaScript. If those interactions didn't work, the site made significantly less sense. (It could have been worse; even without the JavaScript, the materials on the site were still reachable.)
The JavaScript had been coded…
Many people, first confronted with the idea of data curation, think it's a storage problem. A commonly expressed notion is "give them enough disk and they'll be fine." Terabyte drives are cheap. Put one on the desk of every researcher, network it, and the problem evaporates, right?
Right?
Let me just ask a few questions about this approach.
What happens when a drive on somebody's desk fails?
What do we do about the astronomers, physicists, and climatologists, who can eat a whole terabyte before breakfast and hardly notice?
What do we do about the social scientists, medical researchers, and…
Five years ago (really? goodness, it hardly seems possible) I gave a preconference session at the Extreme Markup Languages conference (which is now Balisage) entitled "Classification, Cataloguing, and Categorization Systems: Past, Present, and Future."
I have learned to write better talk titles since then. However. The talk was actually a runthrough of library standards and practices for an audience of markup wonks. Like any field, librarianship has its share of jargon and history that legitimately seems impenetrable to outsiders.
I'm going to try to reprise some of that talk here in blog…
I see a lot of metadata out there in the wild woolly world of repositories. Seriously, a lot. Thesis metadata, article metadata, learning-object metadata, image metadata, metadata about research data, lots of metadata.
And a lot of it is horrible. I'm sorry, it just is—and amateur metadata is, on the whole, worse than most. I clean up the metadata I have cleaning rights to as best I am able, but I am one person and the metadata ocean is frighteningly huge even in my tiny corner of the metadata universe.
So here's a bit of advice that would save me a lot of frustration and effort, and is…
FriendFeed, now due to be absorbed into the Borg, er, the Facebook empire, allowed me to lurk on the fringes of the scientific community Cameron Neylon mentions in his post on the takeover.
Insert all the usual clichés here: it was enormously valuable, I learned a lot, and I wouldn't have missed it for the world. My humanities training wouldn't normally gain me entrée into such a circle, and neither would my professional identity. Insofar as I have professional ambitions in scientific data management, every bit of acculturation I can come by is priceless.
That community wasn't the only one I…
I hear talk about "the cloud" as the solution to research data curation. Data will waft softly up into "the cloud," and "the cloud" will look after it and give it back on demand, and there will be unicorns and rainbows and rainbow-colored unicorns, and—well, you get the idea.
I think this is bosh. Balderdash. Bunkum. But I also think it's worth unpacking why this is a popular and recurring idea, because there's the germ of a service design in there.
"The cloud" means a lot of things to a lot of people, but for the sake of argument, let's call it "third-party data-storage services" such as…
Monado of Science Notes commented on my irreplaceable-data post thusly:
It sounds as if the best thing to do in the short term is not throw away the old equipment. And to use the old equipment to copy digital media to newer forms... for which no one ever gets a budget, right?
It's such a great comment that I want to unpack it a bit. As we work out our data praxis, this kind of question is exactly what we have to confront.
My first question is simple: What equipment are we talking about here? Using what media?
Libraries are wearily familiar with this question in (mostly) analog terms. We have…
Yesterday the city of Louisville suffered a freak thunderstorm that dumped half a foot of rain in an hour and a quarter. Their library has been devastated, to the tune of a million-plus dollars in damage.
As a proud member of The Library Society of the World (and I have the Cod of Ethics to prove it!), I ask anyone who is able to throw a few bucks their way. I trust Steve Lawson to do as he says he'll do.
The library's data center and systems office were on its ground floor. If you watch Greg Schwartz's Twitterstream you can keep up with the recovery efforts. For my purposes, though, I want…