Open Data & The Panton Principles: Thoughts on a presentation to librarians

As I mentioned last week, on Tuesday, April 17 I was part of a workshop on Creative Commons that our Scholarly Communications Committee put on for York library staff. My section was on open data and the Panton Principles. Although open data is not directly related to Creative Commons, we thought that talking a bit about an application area for licensing in general, and a specific case where CC is applied, would be interesting for staff. We figured it would be the least engaging part of the workshop, so I agreed to go last and use any time that was left.

Rather unexpectedly, the idea of data licensing and in particular CC0 licensing for data ended up being the topic that most energized the crowd! So we bumped up my part and I ended up going second-to-last. My section sparked a lot of very interesting conversations and feedback from a pretty packed house.

So much so that, while I was riding home on the bus on Friday with a colleague, she mentioned that the issues I'd talked about on Tuesday had come in handy at the conference she'd attended on campus earlier on Friday. She'd been able to speak intelligently and provocatively about the usefulness of open data to public policy!

So, a huge win.

Lessons learned? I think if I were doing the presentation over again tomorrow, I'd emphasize that the practice of making data openly accessible should be considered part of the normal scholarly communications system. It isn't just for pirates and thieves. The goal is to make data sharing a standard practice. The means to that end is to ensure data sets are cited in the literature and, by extension, to have data sharing become an accepted part of the normal academic reward and incentive structure. You create data, you share it, someone else uses it for their research, they cite your data set in their paper, and that citation is counted with the same weight as a citation to a paper.

And within that understanding, I think I also would have emphasized more that it's just the right thing to do. Sure, you can fear being scooped with your own data, or that someone will replicate your claims and try to take the credit. Sure, someone might even try to claim that they created your data themselves. But these "risks" should be seen as no different from the risks of publishing anything: a journal article, a blog post, some code.

But those risks are far outweighed by the great potential of making scientific data open.

In any case, that's for next time. And hopefully there will be a next time. We're definitely hoping to take our workshop to a conference somewhere.

I don't believe any of the other presentations by my colleagues are online, but I'll link to them here if I find them.

And speaking of presentations, here are my slides:

I'll note that I'm waiving all rights to the slides and releasing them with a CC0 waiver. So have at them!

Also, here are the resources I used for my presentation as well as a few more that have come to light since my original post.

And some new ones:

As before, any suggestions for further resources would be greatly appreciated in the comments.

I wholeheartedly endorse the goal of making open data a part of standard practice in order to promote scientific progress, but I think many of these discussions miss the importance of open data for primary peer review. Reading about what the data are and what the analysis shows is no substitute for actually examining and analyzing the data. The recent Duke cancer genomics scandal (for one summary see http://pipeline.corante.com/archives/2012/04/10/biomarker_caution.php) is a case where primary peer review in multiple high-profile journals and clinical trial reviews did not catch major problems. If the research community is going to argue for increased public funding in a time of decreasing public budgets, it seems necessary to work to make primary peer review capable of catching these problems BEFORE the clinical trials start.