As I watch the environment around me for signs of data curation inside institutions, particularly in libraries, I seem to see two general classes of approach to the problem. One starts institution-wide, generally with a grand planning process. Another starts at the level of the individual researcher, lab, department or (at most) school; it may try to scale up from there, or it may remain happy as its own self-contained fief.
As with anything, there are costs and benefits to both approaches.
Some of the challenges of data-driven research carry costs and infrastructure that only make sense on an institutional level at this juncture. Grid computing. Gigantic, well-managed disk. (Gigantic disk is fairly cheap. Gigantic well-managed disk will cost you. In my mental model of the universe, I include such things as periodic data audits and geographically-dispersed backups in the cost of disk.) Authorization and authentication, which is a bigger problem than you might think. Carrots and sticks, if the institution is serious about this.
So it makes a certain amount of sense to try to tackle this problem as an institution. Where the institutional model falls down, I begin to suspect, is service beyond the bare provision of appropriate technology. Training and handholding. Outreach. Help with data-sustainability plans in grant proposals. Whipping data into shape for the long term. Advice on sustainability, process, documentation, standards: the nuts and bolts of managing data in a particular research enterprise.
Because data and their associated problems are as varied as the research that creates them, I just don’t think it’s possible to open a single-point-of-service “data curator” office and have that be an effective solution (save perhaps to extremely small, targeted problems like grant proposals). I do still believe that almost any reasonably bright, decently adventurous librarian or IT professional can walk into almost any research situation, get a read on it, and do good things for data. I’ve seen it happen! But the “getting a read” part takes time and a certain level of immersion. How can a single point of service, whose responsibility is to the entire institution, spend that much effort targeting specific research groups?
Simple. It can’t. Moral of the story: data curation is not a Taylorist enterprise.
In practice, I suspect, institutions that create the Office of Data Curation without carefully considering what I just outlined will inexorably wind up serving only a small proportion of the institution’s researcher population. It’s quite likely to be the proportion of said population swimming in grant money and prestige, of course. The arts, humanities, and qualitative social sciences are most liable to be left hanging. I already see this happening in one or two places I know of, not because they have bad or thoughtless people, not at all, but because good people have been handed an organizational structure ill-suited to the task at hand.
Can such a structure be made workable? Perhaps. It’d take some work from the grassroots. Were I in that situation, I’d be canvassing my campus for every single person on it (librarian, IT pro, grant administrator, researcher, graduate student, whoever) who “does data” in some way. Then I’d be working like crazy to turn them into a community of practice.
I admit I’m a little hazy on how communities of practice form and how they can be encouraged to form; I’m sure there’s research on the subject (and would appreciate pointers to same). I must also admit that I’ve tried multiple times to form one around institutional repositories and quite resoundingly failed.
I can only say, based on those failures, that much depends on what the community-former has to offer, as well as how ready putative community members are to consider themselves part of a coherent community. In this case, how well would it work? I don’t know. I’d want something fairly compelling to offer, to get the ball rolling, perhaps some of those institution-wide resources.
About data fiefs I don’t have much to say. They exist already, notably in the quantitative social sciences. They seem to work quite well from a service perspective. Unfortunately, some of their technology practices, especially around data sustainability, set my teeth a bit on edge. Format migration? Audits against bitrot? Standards? Persistent, citable URLs for public data? Not so much, some places. And let us not even discuss what happens when the grant money runs out. These places usually aren’t geared for the long term, though they do quite well in the medium (say, five to twenty-five years) from what I’ve seen.
If you think I think there’s a sweet spot somewhere in the middle here, you know me entirely too well. At least some of the outlines of the ideal state seem clear: where the rubber meets the researcher, local staffing and control; where the problem goes beyond what local can responsibly or effectively manage, the institution steps in. Likewise, the institution has a responsibility to researchers who need data help but can’t afford it locally, in their lab or school or department. There should not be coverage gaps.
By the way, there is, in fact, one organization common on research-university campuses that has learned to be (more or less) centralized while still providing discipline-aware, often discipline-specific, services. It does rather remarkable work serving all campus disciplines, as fairly and skillfully as an unjust world permits. A way out of the Taylorist paradox, perhaps!
What is this wonder organization? It’s called “the library.”