As a new blogger here at Book of Trogool I'd like to thank Dorothea for the opportunity to share in the discussion of evolving issues in technology, libraries, research, and scholarly communication.
I'm currently the Scholarly Communications and Library Grants Officer at Binghamton University, in upstate New York. I've been a librarian for some time (12 years now), and before that I was a chemist, with research experience in inorganic photochemistry, surface science reaction dynamics, and equine drug detection and quantification methods. While I did different experiments in each lab, each…
Praxis
Richard Wallis has taken my ribbing in good part, which I appreciate; his response is here and will reward your perusal.
He also left a comment here, part of which I will make bold to reproduce:
As to RDF underpinning the Linked Data Web - it is only as necessary as HTML was to the growth of the Web itself. Documents were being posted on the Internet in all sorts of formats well before Tim Berners-Lee introduced us to the open and shared HTML format which facilitated the exponential growth of the Web. Some of the above comments are very reminiscent of the "why do I need to use HTML"…
Richard Wallis of Talis (a library-systems vendor) posted The Data Publishing Three-Step to the Talis blog recently.
My reaction to this particular brand of reductionism is… shall we say, impolitic. I just want to pat Richard on the head and croon "Who's the clever boy, then? You are! Yes, you are!" This is terrible of me, no question about it, and I apologize unreservedly.
Here's the problem, though. Aside from my friends the open scientists (and not even all of them, to be honest), practically all the data-producing researchers I know are firmly stuck on Step 1. Firmly stuck, not to say "…
Word on the street is that the NSF is planning to ask all grant applicants to submit data-management plans, possibly (though not certainly) starting this fall.
Fellow SciBlings the Reveres believe this heralds a new era of open data. I'm not so sanguine, at least not yet. Open data may be the eventual goal; I certainly hope it is. At this juncture, though, the NSF would be stupid to issue a blanket demand for it, and I rather suspect the NSF is not stupid.
Part of the problem, of course, is that many disciplinary cultures are simply not ready for even the idea of open data. If the NSF were to…
Dan Cohen has an extraordinarily worthwhile post recounting his talk at the Shape of Things to Come conference at Virginia (which I kept my eye on via Twitter; it looked like a good 'un).
I see no point in rehashing his post; Dan knows whereof he speaks and expresses himself with a lucidity I can't match. I did want to pick up on one piece toward the end, because it has implications for library and archival systems design:
Christine Madsen has made this weekend the important point that the separation of interface and data makes sustainability models easier to imagine (and suggests a new role…
John Dupuis asks some provocative questions; I thought I'd take a stab at answering them, and I encourage fellow SciBlings to do likewise.
I quite agree with John when he says that the ferment over publishing models disguises a larger question, "the role of scholarly and professional societies in a changing publishing and social networking landscape." My own history with professional societies, I think, bears this out nicely.
John asks first: What societies do you belong to?
I belong to the American Society for Information Science and Technology. I was a member of the American Library…
One of the truisms in data curation is "well, of course we don't let sensitive data out into the wild woolly world." We hold sensitive data internally. If we must let it out, we anonymize it; sometimes we anonymize it just on general principles. We're not as dumb as the Google engineers, after all.
Only it turns out that data anonymization can be frighteningly easy to reverse-engineer. We've had some high-profile examples, such as the AOL search-data fiasco and the ongoing brouhaha over Netflix data. Paul Ohm's working paper on the topic is a great way to get up to speed.
We librarians are…
So the United Nations' Intergovernmental Panel on Climate Change is mired in a rapidly heating controversy over a report that apparently let some dubious information slip through the cracks. Here's the money quote:
The discovery of the glaciers mistake has focused attention on the IPCC's use of so-called grey literature: reports that do not appear in conventional scientific journals, and are instead drawn from sources such as campaign groups, companies and student theses. The IPCC's rules allow such grey literature, but many people have been surprised at the scale of its inclusion.
Oh my, oh…
The journal impact factor is a sham and a crock and a delusion, let's just take that as read. (If you don't care to take that as read, which is a healthy and sane attitude—take no one's word as gospel, especially not mine!—start here or perhaps here and keep going.) Using it to judge individual researchers' output, never mind the researchers themselves, verges on the criminal, is my strong belief. I'm not against heuristics, but some heuristics are plain broken, and the journal impact factor is one of those.
So it really hurts my heart to see librarians giving this flawed number credence.…
I don't hear as much curiosity from the research community as I'd like to about what a librarian knows and does, but I do hear some.
For that some, I suggest poking through the fourth annual iteration of Librarian Day in the Life. A wide variety of librarians blog, tweet, photograph, and vid about what their day is like.
Don't just pay attention to the research-related ones, either. The more people who understand in their bones what public librarians, school librarians, and special librarians add to the communities they serve, the better off everyone is, librarian and community alike.
So go…
One way and another, I heard quite a lot of talk at Science Online 2010 relevant to the interests of institutional-repository managers and (both would-be and actual) data curators. Some of the lessons learned weren't exactly pleasant, but there's just no substitute for listening to your non-users to find out why they're not taking advantage of what you offer.
In no particular order, here is what I took away:
The take-a-file-give-a-file content model for IRs is much too limited and limiting. Real live scientists are mashing up all sorts of things as they do their work; one wiki-based lab…
I wrote last week about name authority control for authors. I hinted that systems are coming. I hope that journals, databases, catalogues, and repositories adopt them when they emerge, the sooner the better.
Even when they do, though, there's an immense problem to solve, in the form of the millions (billions? I shouldn't wonder) of articles that will have to be retrofitted into the system. It's work not unlike what I'm doing at the moment, so I can say with authority (sorry, sorry) that it's often not easily accomplished.
Researchers, institutions, others, you can do some things to make the…
A common response, including in the comments at Book of Trogool, to raising digital-preservation issues is a chortle of "Guess print doesn't seem so bad now! Let's just print everything out, and then we'll be fine!"
Leaving aside my own visceral irritation at that rather rude and dismissive response—no, we won't. "Just print it out" doesn't stand up to a moment's scrutiny. Let us scrutinize a moment, shall we?
Problem number one is the variety of digital materials that become useless the instant they are printed, or cannot be "printed" at all. Hypertext. High-resolution imaging, as from…
The latest issue of the International Journal of Digital Curation is out; if you're in this space and not at least watching the RSS feed for this journal, you should be.
I was scanning this article on Georgia Tech's libraries' development of a data-curation program when I ran across a real jaw-dropper:
One of the bioscientists asked the data storage firm used by one of the labs recently about the costs associated with accessing data from studies conducted a few years ago. The company replied, "you wouldn't want to pay us to do that. It would be less expensive to re-run your experiments." (p.…
There have been a number of piercing calls for training of data professionals (of various stripes) in the last year or so. Schools of information have been answering: Illinois, North Carolina, others.
Honestly, I'm getting a sinking feeling in my stomach. If I were to label it, the label would go something like "where are all these newly-minted data professionals going to work?"
My stomach sinks worse when I realize that quite a few of the calls are coming from the same people and organizations who uttered piercing calls for the establishment of institutional repositories in the early 2000s.…
Another case of things connecting up oddly in my head—
"How do we know whether a dataset is any good?" is a vexed question in this space. Because the academy is accustomed to answering quality questions with peer review, peer review is sometimes adduced as part of the solution for data as well.
As Gideon Burton trenchantly points out, peer review isn't all it's cracked up to be, viewed strictly from the quality-metering point of view. It's known to be biased along various axes, fares poorly on consistency metrics, is game-able and gamed (more by reviewers than reviewees, but even so), and…
Some interesting ferment happening in repository-land, notably this discussion of various types and scales of repositories and how successful they can expect to be given the structural conditions in which they are embedded.
I don't blog repositories per se any more, so I'm not going to address the paper in detail (though I do think it contains serious oversights). What I'm curious about in the Trogool context is the case of institutionally-hosted services aimed not specifically at the institution, but at a particular discipline.
arXiv. ARTFL. PERSEUS. DRYAD. There's any number of these. One…
There is a certain kind of digital project that strikes terror and dismay into the hearts of digital preservationists everywhere. Not a one of us hasn't seen many exemplars. They make me myself feel sad and tired.
They're projects that, no matter their scholarly or design merit, are completely unpreservable because they were built from unsustainable tools, techniques, and materials. What's worse, even a cursory examination with an eye to sustainability would have at least signaled a problem.
It's not the unpreservability so much. It's the obliviousness that makes me hurt inside.
For various…
One phenomenon that will be—indeed, already is—utterly unavoidable in the data-curation space is the creation of standards. I once heard Andrew Pace say that standards are like toothbrushes: everybody thinks they're great, but nobody wants to use anybody else's.
Be that as it may, standards development and compliance is one way to make everybody's data play nicely with everybody else's data. It's not the only way, to be sure; one very important way that I'm sure we'll also see more of is Being The Only Game In Town. ICPSR manages this quite successfully, and so does the Digital Sky Survey. If…
If you're not reading comments here, you're missing out. For reasons I don't entirely understand, some of the best in the business are seeing fit to comment here. They have more to teach than I do!
Chris Rusbridge (of, among other things, this thought-provoking meditation on digital preservation) has been spotted here, and whenever he pops up he makes me think about things. This time, I was thinking about disciplinary expertise, and how I need to make a better case that less of it is necessary for data curation than generally admitted.
I hope we can at least admit that data curators don't…