Posts I should write: Google Books metadata

So anyone who's spent any time at all with Google Books (henceforth GB) has probably noticed some really bizarre, I mean truly strange, metadata: messed-up titles, authors, and publication years, and categories that are totally hit or miss. I frequently take for granted that everyone has seen all of the memes that go around in library web 2.0 circles. But that's crazy, of course. So I'll just throw this at you scattershot.

At a meeting at the Berkeley iSchool on the GB settlement (and that's another thing I should blog about but don't have time for the research needed), linguist Geoff Nunberg tore up GB for this. See his blog post and then the PDF of his slides.

Google responded. One of their arguments is that their data providers gave them crap, or at least conflicting but okay stuff. Heh. If you mashed up every US and UK academic library catalog you would still have better metadata than they have, and they only had to pick one library (the originator) for each scan and then map the LCSH to the BISAC. Seriously. Sure, certain fields would be weird, but we've had machine-readable standardized records for decades and decent cataloging for decades before that. And they have the whole load from our massive union catalog, WorldCat.

I mean, that doesn't stop me from using it, but I'm just using it for natural-language full-text searching and for linking out from my library's catalog, which is cool. Linguists, apparently, thought they could rely on the metadata when using it as a corpus for analysis.
