Christina's LIS Rant

This is my blog on library and information science.

Profile

Christina K. Pikas is a science and engineering librarian in a special library as well as a doctoral student in information studies.
Any opinions expressed here may not even be her own and certainly do not represent those of any organization willing to be affiliated with her.

License

This work is licensed under a Creative Commons Attribution 3.0.

Posts I should write: Google Books metadata

Category: Posts I would like to write
Posted on: September 11, 2009 10:28 PM, by Christina Pikas

So anyone who's spent any time at all with Google Books (henceforth GB) has probably noticed some really bizarre, I mean truly strange, metadata: messed-up titles, authors, and publication years. Oh, and the categories are totally hit or miss. I frequently take for granted that everyone has seen all of the memes that go around in library web 2.0 circles, but that's crazy, of course. So I'll just throw this at you scattershot.

At a meeting at the Berkeley iSchool on the GB settlement (and that's another thing I should blog about but don't have time for the research needed), linguist Geoff Nunberg tore up GB for this. See his blog post and then the PDF of his slides.

Google responded. One of their arguments is that their data providers gave them crap, or at least conflicting OK stuff. Heh. If you mashed up every US and UK academic library catalog, you would still have better metadata than they have, and they only had to pick one library (the originator) for each scan and then map the LCSH to the BISAC. Seriously. Sure, certain fields would be weird, but we've had machine-readable standardized records for decades and decent cataloging for decades before that. And they have the whole load from our massive union catalog, WorldCat.

I mean, that doesn't stop me from using it, but I'm just using it for natural-language full-text searching and for linking out from my library's catalog, which is cool. Linguists apparently thought they could rely on the metadata when using it as a corpus for analysis.
© 2006-2011 ScienceBlogs LLC. ScienceBlogs is a registered trademark of ScienceBlogs LLC. All rights reserved.