Data Curation - notes from a local meeting

By cpikas on May 24, 2010.

My larger institution's libraries (so not my place of work, but our parent org's) had a fabulous get-together on Friday with a session on data curation. The speakers were Clifford Lynch of the Coalition for Networked Information, Carole Palmer from UIUC, and Joel Bader from JHU and JHMI.

I tweeted, but there wasn't a hashtag, so there goes retrieval. These notes weren't live-blogged; I reconstructed them from handwritten notes. They are my reconstructions of the speakers' points, so they're not my points and maybe not quite theirs.

Lynch spoke about institutions while Palmer spoke more about librarians. Bader spoke about his own experiences and some trickiness in his area.

Lynch
Several trends leading towards data curation:

big science, lots o' data
simulation and modeling, lots o' data as well as model output
distributed sensor systems
digital instruments for observations and experiments, lots o' data
data provides evidence and is also the product of scholarly work
within the scholarly communication system, data and databases are becoming intertwined with the literature (traditional journal articles)

Throughout scholarship we see movement toward data and data curation; it's just unevenly distributed. The issues are pervasive: they involve not only the large data sets but also the far more numerous small ones. Big science projects have the budgets, data scientists, and resources to deal with this stuff; it's the little guys who don't. The large data sets with established communities may also have more standards, and the best solution for them might be disciplinary repositories at the national or international level. The small projects might fall to the institutions. That means they fall to the library (not 'round MPOW).

Funders and journals are also becoming aware of data access issues and are starting to require archiving (the question is also about compliance, imho). We need good IT (security, reliability, and backups), so this shouldn't be a surplus computer kicked under a desk but enterprise machines that are professionally maintained. Better IT and metadata at the point of capture will make everything easier later. Libraries and other institutional partners need to work with scholars throughout the data lifecycle and can help with the required data management plans.

But we can't keep everything. His suggested questions for deciding what to keep:

is it replaceable?
were human or animal subjects involved (so it's not ethically replaceable)?
does it have personally identifiable information?

Palmer
Nice quotes from Taylor (1986), on adding value to information to improve its current and potential future use, and from Shera (1972), on coordinating and integrating information in alignment with complex social structures and practices... (what I like about her is that she doesn't give up the L and our proud tradition and values to chase the IS)... Data curation fits with our core areas (hah! she said we have core areas, hah!): information behavior, collection development and management, and information organization and retrieval. She asks whether data are the new special collections. More work will be needed on how people use and search for data, as well as on dealing with collections of collections.

Bader
I have very few notes here because I was utterly entranced. They have some tricky issues with their data:

a single analysis is 100 GB to 1 TB
human subjects
can't anonymize - even if aggregated
has to be kept on secure servers

There are lots and lots of data standards, each with various slots, and no way to do a database join across them. Who marks up the data? Who annotates it? It has to be the authors, because they are the ones who know, at the time of publication but not a year later. Who assesses the annotation? The editor and reviewers at publication, but they are busy, too. It has to be made easier for the reviewer. Who enforces it? Journals and funding agencies.
