Many of my readers will already have seen the Nature special issue on data, data curation, and data sharing. If you haven't, go now and read; it's impossible to overestimate the importance of this issue turning up in such a widely-read venue. I read the opening of "Data sharing: Empty archives" with a certain amount of bemusement, as one who has been running institutional repositories in libraries for four years. I think Bryn Nelson has confusingly conflated different notions of "data" in his discussion of the University of Rochester's IR. By the definition Nelson appears to be thinking about…
Now that we've looked at how back-of-book indexes endeavor to organize and present the information found in a book, we can consider organizing books themselves. It's quite astonishing, how many people go to libraries and bookstores who never seem to stop to think about how books end up on particular shelves in particular areas. There is no magic Book Placement Fairy! Let's consider the problems we're trying to solve for a moment. A library has a lot of books, on which ordinary inventory-control processes must operate. So librarians as well as patrons must be able to locate the specific book…
I see this confusion so often it seems worth addressing. If you scan a page of text, what you have is a picture. A computer sees it not as letters, numbers, and punctuation but as pixels, bits of light and shade and color, just like the pixels in your favorite family photo on Flickr. You can't search for, extract, highlight, or cut and paste such "text." It doesn't matter whether you embed the picture in a PDF; you still can't search it. Ceci n'est pas un texte! Compare this to creating a PDF from a word-processing or page-layout document. The computer already thinks of the text in these…
Happy Labor Day, US readers. Time to clean out the "toblog" tag on del.icio.us again: Everyone else has already linked to this Wall Street Journal article on data curation, so who am I to go against the tide? My chief takeaway is the trenchant observation that judging the value of data is not straightforward. One scientist's noise is another's signal, and everything is grist for the history-of-science mill. My friend from ebook days Gene Golovchinsky is learning by experience some hard truths about migration versus emulation. Welcome to the fold, Gene! Let's all play with supercomputers! Fine…
I found out from a few different sources (thanks, all!) that my post about back-of-book indexes made it into American Libraries Direct yesterday. Welcome to any and all new readers! I hope you stick around. I'm going to tackle classification next…
Just a quickie post today— In answer to my post about intertwingularity, commenter Andy Arenson suggested that the way to rescue an Excel spreadsheet whose functions or other behaviors depended on a particular version of Excel was to keep that specific version of Excel runnable indefinitely. This is called "emulation," and it assuredly has its place in the digital-preservation pantheon. Some digital cultural artifacts are practically all behavior—games, for instance—and just hanging onto the source code honestly doesn't do very much good. The artifact is what happens when that code is run,…
A common problem adduced in e-research (not just e-research, but it does come up quite a bit here) is expertise location, both local and global. You need a statistician. Or (ahem) a metadata or digital-preservation expert. Or a researcher in an allied area. Or a researcher in a completely different area. Or a copyright expert (you poor thing). Very possibly the person you want works right down the hall, or in the building next door, or in the library, or somewhere on campus. But how on earth do you know? You could call around to the offices or departments most likely to contain the expertise…
When I was but a young digital preservationist, I was presented with an archival problem I couldn't solve. This should not sound unusual. It happens a lot, for all sorts of reasons. If I can keep a few people from falling into traps that make digital preservationists throw up their hands in despair, I'm happy. Anyway, the problem was a website with some interactions coded in JavaScript. If those interactions didn't work, the site made significantly less sense. (It could have been worse; even without the JavaScript, the materials on the site were still reachable.) The JavaScript had been coded…
I said a while ago that we don't know who's going to do data curation yet. I absolutely believe that. I probably should have added, though, that we can have a pretty good idea who's not going to do it: anybody who isn't right this very minute planning to do it. Make no mistake, there's money (from funders and institutions) and hard-won relevance to be had in this line of work. Quite a few people and organizations are eyeing it: IT, libraries, scholarly societies, journals, entrepreneurs. If you want to get into the scrum, if you want a piece of the pie, better get your plan on now. This is no…
Steve Lawson and the LSW are three-fifths of the way to the goal of $5000 for the flood-ravaged Louisville Free Public Library by September 1. The last two-fifths are the hard part. If you can help, please do. Comment here or send me email (dorothea.salo at gmail) to let me know you've donated, and I'll do a random-number drawing for a PLoS travel mug and a size-large, never-worn PLoS One t-shirt. Thanks.
I'd like to start our tour of book and library information-management techniques with a glance at the humble back-of-book index. I started the USDA's excellent indexing course back in the day, and while it became clear fairly quickly that I do not have the chops to be a good indexer and so I never finished the course, I surely learned to respect those who do have indexing chops. It's not an easy job. Go find a book with an index and flip through it. Seriously, go ahead. I'll wait. Just bask in the lovely indentedness and order of it all. Now answer me a question: Should Google be calling that…
Hello, Monday. My tidbits folder overfloweth. Want to text-mine JSTOR? Looks like you can. Garret McMahon talks about FriendFeed, scholarly communication, and embedded librarianship. Part of the reason I'm here is that I believe, with Garret, that we librarians can't kvetch about gettin' no respect if we don't put ourselves out there in the general research scrum. It's as true locally as globally. Jason Hoyt tells scientists to step up and own their part in the dysfunctionality of scientific communication. Three cheers from this librarian! Indulge me in further Cliff Lynch adulation: check…
Well, I've been here for about a month now, and I've quite enjoyed myself! (And I finally did send in my contract, Erin. Really. I did.) Thanks to all who have commented. (Well, except a spammer or two, but I got rid of them posthaste.) You're a civil, engaged, and smart bunch, and I appreciate you very much, especially when you keep me honest. Please, if you will, introduce yourselves and tell me (and Trogool's other commenters) a bit about yourselves in the comments to this post. Thanks!
Many people, first confronted with the idea of data curation, think it's a storage problem. A commonly-expressed notion is "give them enough disk and they'll be fine." Terabyte drives are cheap. Put one on the desk of every researcher, network it, and the problem evaporates, right? Right? Let me just ask a few questions about this approach. What happens when a drive on somebody's desk fails? What do we do about the astronomers, physicists, and climatologists, who can eat a whole terabyte before breakfast and hardly notice? What do we do about the social scientists, medical researchers, and…
Five years ago (really? goodness, it hardly seems possible) I gave a preconference session at the Extreme Markup Languages conference (which is now Balisage) entitled "Classification, Cataloguing, and Categorization Systems: Past, Present, and Future." I have learned to write better talk titles since then. However. The talk was actually a run-through of library standards and practices for an audience of markup wonks. Like any field, librarianship has its share of jargon and history that legitimately seems impenetrable to outsiders. I'm going to try to reprise some of that talk here in blog…
I see a lot of metadata out there in the wild woolly world of repositories. Seriously, a lot. Thesis metadata, article metadata, learning-object metadata, image metadata, metadata about research data, lots of metadata. And a lot of it is horrible. I'm sorry, it just is—and amateur metadata is, on the whole, worse than most. I clean up the metadata I have cleaning rights to as best I am able, but I am one person and the metadata ocean is frighteningly huge even in my tiny corner of the metadata universe. So here's a bit of advice that would save me a lot of frustration and effort, and is…
The publisher Information Today runs a good and useful book series for librarians who find themselves with job duties they weren't expecting and don't feel prepared for. There's The Accidental Systems Librarian and The Accidental Library Marketer (that one's new) and a whole raft of other accidents. I suspect "The Accidental Informaticist" would find an audience, and not just among librarians. The long and short of it is, we just don't know who is going to do a lot of the e-research gruntwork at this point. Campus IT at major research institutions is seizing on the fun grid-computing work,…
I am furloughed today and going out of town, so here, have an early tidbits post. I won't be at the iPRES 2009 conference, but I do recommend looking over the program; it gives a pretty good overview of what digital preservationists think about and study, and what keeps them awake at night. (Midwesterners: the International Digital Curation Conference is coming to Chicago in 2010. I'll be there!) The strength of weak ties: why Twitter matters to scholarly communication. Spot on, and true of FriendFeed as well. This is why, privacy concerns aside, the Facebook acquisition of FriendFeed is a…
I gave a talk for PALINET some little while ago about institutional repositories. The audience had been primed by the fantastic Peter Murray to think about looking after digital content as the "fourth great wave" of library work. (I wish that talk were online. It was absolutely brilliant.) But not everyone was entirely on board with that. I recall distinctly one distinguished-looking white-haired gentleman raising his hand. "We in libraries," he said (paraphrase mine), "have historically been purveyors of quality information. Authoritative information. On what basis should we jeopardize that…
FriendFeed, now due to be absorbed into the Borg (er, the Facebook empire), allowed me to lurk on the fringes of the scientific community Cameron Neylon mentions in his post on the takeover. Insert all the usual clichés here: it was enormously valuable, I learned a lot, and I wouldn't have missed it for the world. My humanities training wouldn't normally gain me entrée into such a circle, and neither would my professional identity. Insofar as I have professional ambitions in scientific data management, every bit of acculturation I can come by is priceless. That community wasn't the only one I…