L'esprit d'escalier

By dsalo on October 29, 2009.

If you're not reading comments here, you're missing out. For reasons I don't entirely understand, some of the best in the business are seeing fit to comment here. They have more to teach than I do!

Chris Rusbridge (of, among other things, this thought-provoking meditation on digital preservation) has been spotted here, and whenever he pops up he makes me think about things. This time, I was thinking about disciplinary expertise, and how I need to make a better case that less of it is necessary for data curation than generally admitted.

I hope we can at least admit that data curators don't have to be researchers themselves. Do researchers have to be involved in the curation of their own data? Absolutely! Data curation starts at the beginning of the study-design process, and continues all the way through and past publication. But that doesn't mean that researchers have to do everything. The exact division of labor is still being sorted out; that's partly what this blog is about. That the labor must and will be divided appears to be beyond dispute.

The corollary to this is that a data curator will almost always know less about the data, viewed from certain axes, than the researcher does. She may well know more about it viewed from some other axes—file format details, metadata crosswalking, whatever. Some things, though, she won't know and presumably won't have to.

So what does she have to know about the research and the discipline in order to be a responsible data steward? And does she have to walk into the process with that knowledge pre-existing, or can she learn it as she works on the research project? How much of what she needs to know will transfer from other projects she's worked on?

Cards on the table: in the absence of much evidence either way, I think that someone with the intelligence, disciplinary background, and intellectual curiosity of a good subject-specialist librarian can learn enough "on the job" to hit the 80/20 point pretty easily, and 80/20 is more than good enough for a successful campus data-curation program in my book. For the other 20% of edge cases, a program can hire specially.

I'll use a True Story about myself as an anecdote. Feel free to quarrel with me (civilly, please) in the comments.

Some years ago I did a small contract job for the ACLS E-book project. They were working on rekeying and marking up an art-history book with extended segments of polytonic Greek text. Their keying vendor took one look and said "no way do we key polytonic Greek." So ACLS told them to key the rest of it and leave placeholders for the Greek. They came to me asking whether I could key the Greek in proper Unicode without snarling up the markup.

I have never studied Greek. I do not speak Greek. I do not write Greek. I do not read Greek, except in the sense that I recognize the letters and can laboriously sound them out. Don't ask me what in the world the accents and squiggly bits in polytonic Greek mean; I haven't the slightest clue.

Not snarling up markup? That I can manage. After an hour or so of research, I found fonts and tools that could enable me to do the keying job correctly and with reasonable efficiency. ACLS and I agreed on a price, and off I went. I didn't know what the squiggles meant, but I could reproduce them, and that was plenty good enough.

When it came time to proof my work, I didn't rely just on my own eyes; that would have been stupid. I called in my classics-major husband. He found typos and the odd homeoteleuton, which I duly fixed up. I sent the result back to ACLS, and they were happy enough to pay me, so there that is.

And there we have it: a partnership between a tech geek and a reasonably well-trained domain specialist (kindly note that my husband was an undergraduate classics major) took care of a data job. I think this can happen more often in more fields.

The chief barrier is the belief that it can't.


There's a wonderful story that I read somewhere about necessary skills and what is lost when processes are automated that is absolutely germane to your example, if not the topic at hand.

Apparently, many years ago at a Large University Press, they were typesetting a Greek text of some sort. At the time, setting the Greek type was still a manual process. During the process, the (human) typesetter stopped working and informed his supervisor that there was a typo in the manuscript, and that the author needed to be contacted. After much back and forth, the author was contacted, he looked at the manuscript, and agreed that there was a typo, and he fixed it.

The typesetter, who had no background in Greek of any sort whatsoever, was able to identify the error because in all his years of setting Greek type, he had NEVER placed those two particular characters together in a word.

I think you are absolutely right. I speak as someone completely unqualified for the job, with no computer science qualification, no library science qualification and (ahem) no research qualification. Amazing where a crumby physics degree can get you!

That said, I do get annoyed at some librarian assumptions that, because they know metadata, they can do data curation. I think the key is the intellectual curiosity - and humility - you imply.

