Will Google censor its digital library?

I promised you some updates on the Google Books Settlement, so here you go. Things are definitely getting interesting.

First, I mentioned earlier that I was going to attend a panel on the Google Book Search Settlement here in DC, featuring representatives of Google, the publishers, and the Internet Archive. ITIF, which organized the panel, has made the entire thing available online; I've linked to it at the bottom of the post, because it's over an hour long.

Anyway, it was interesting to hear the (very civil) differences of opinion between Dan Clancy, the Engineering Director for Google Book Search, Allan Adler of the Association of American Publishers (whose organization, along with the Authors Guild, originally sued Google back in 2005), and Peter Brantley of the Internet Archive (best known for the Wayback Machine; they want orphan books out of the Google settlement). Alan Inouye of the American Library Association, while less vocal, brought up the disconcerting point that while this settlement will significantly impact how libraries and their users can access books digitally, libraries were not a party to the lawsuit or settlement.

Despite the fact that Adler's organization originally sued Google, they are definitely on the same page now. Clancy described the settlement as a "win-win-win", and Adler said that "we feel like we've done some good in this settlement agreement." Brantley obviously disagreed; his main complaint was that the Internet Archive wants orphan books out of the settlement.

As I mentioned before, a lot of the trouble with the Google settlement hinges on those pesky "orphan books." Here's how New York Law School associate professor James Grimmelmann, who is filing an amicus curiae brief on the settlement, defined orphan books in a recent interview with the Fiction Circus:

An orphan book is one whose copyright owner can't be found by someone who wants to make a use of it and ask permission. It's just that we can never PROVE that a book is an orphan. You can prove that a book's not an orphan because you've got somebody jumping up and down and saying "hey, I own the copyright! I'll sue you if you do anything!" But you can't tell for certain that something IS an orphan, because the author or publisher might crop up unexpectedly. (source)

In this regard, orphan books are like hypotheses - you can prove a hypothesis wrong, but you can never prove it right. That means that all the speculation about how many books in Google's 7-million-strong inventory are orphans is just that, speculation. Under the terms of the settlement, Google gets custody of the content of those orphan books, simply because there is no author or publisher to formally withdraw them from Google's inventory. While each orphan book is likely to be worth very little, all together they add up to a "long tail" of potential profit for Google (and for the authors and publishers of the copyrighted books, who stand to split the profits).

Congress has actually been looking at legislation to deal with the problem of orphan works - Allan Adler said ruefully at the ITIF panel that "I've been working on orphans works legislation for three Congresses now." Clancy said that Google would support anything to bring broader access to orphan works, including legislation. Brantley said the Internet Archive also supports such legislation, and wants it broadened to cover photos, illustrations, etc. - which makes a lot of sense, since image use on the internet is a really murky area. But it's unclear when Congress will get around to dealing with this. (You can read more about it here.)

Anyway, in his Fiction Circus interview, Grimmelmann also brings up another alarming possibility: that Google could edit the content of the book inventory, removing titles it finds "inappropriate," much as it regulates YouTube:

If Google makes an editorial decision to remove a book, they are going to be public about it. They are going to tell the Registry and the general public: "We are not including this book, because we don't want to include it. We think it is inappropriate."

The settlement doesn't enumerate valid reasons for removing a book from the inventory - one hopes Google would learn from Amazon's recent black eye and exercise restraint, but that's just a hope. Google will have complete power over the content of their library, and power to remove books they don't want. This is disturbing because as long as Google is the dominant digital library provider, there will be no way for the public to get the "inappropriate" books - or even to know they have been excluded from book search results in the first place. The books could become invisible to anyone using Google's database. Those books still under copyright, which belong to authors and publishers, might be made available elsewhere, but those pesky orphan books couldn't be digitized without the risk of a lawsuit - even if Google doesn't want them.

You can read Grimmelmann's complete interview with Fiction Circus here. Note that Grimmelmann is not unbiased. He opposes the Google settlement as written, saying "I think it does some good things, but I think it does them in a pretty sketchy way," and the amicus brief he is writing in this case is being funded in part by Google's frenemy Microsoft, where Grimmelmann used to work as a programmer. (What a tangled Web we've woven!)

Because the Google settlement arose from a class action lawsuit, and because there is no one who could grant permission to a third party to include orphan books in a competing digital library, some people are complaining that the settlement gives Google an effective monopoly. And sure enough, the NYT just announced that DOJ is investigating the settlement for possible antitrust violations. That doesn't mean they will necessarily block the settlement, just that they're investigating - which given the level of buzz about this is not too surprising. But still, Microsoft must be pretty amused.

Finally, the whole settlement has now been delayed several months - which some pundits are spinning as a win for the settlement's opponents:

"The four-month extension is a big victory for those who oppose the Google Books settlement," said John Simpson, a consumer advocate with Consumer Watchdog. "It's a clear recognition by the judge that there are problems with the proposed deal. The extension also gives the Justice Department more time to consider the antitrust issues that we and others have raised and discussed with them."

At any rate, it looks like discussion of the settlement will continue for most of this year.

Here's the complete panel video from ITIF:

I haven't yet listened to the panel discussion, which I may or may not do while I am trying to get the last ten or so pages of my final paper finished - depends on how distracting it gets. But I do have a quick comment now (not sure when I will get back to this discussion).

I think the discussion that Greg Laden started about Amazon is probably more relevant to Google than it is to Amazon. I am not entirely sure how I feel about the idea of treating what amounts to an online department store like a public utility. But while I do have similar concerns about doing that to Google, I think that a reasonable argument can be made as to the parallels. And I think that settlements like the one in question at the very least leave Google with certain responsibilities to the public.

One of the biggest reasons I tend to lean towards calling this a net positive is that I'm pretty certain that Google is just about the only company with the capacity to actually build such a comprehensive digital library. And while I am a die-hard bibliophile and absolutely love the tactile experience of actual paper books, I also see a great deal of value in digital collections.

It would make my life considerably easier, for example, if I could get an e-version of the hard copy books that I buy. It just seems silly to me that I have to type out citations when the book probably exists somewhere in digital form. And given that I've already purchased the damned thing, there is no reason that I shouldn't be able to access such a copy.

Even more compelling for me, though, is the idea of buying a decent e-reader and getting all my textbooks in digital format. It would be easier and cheaper, and would save the trouble of selling back books or feeling the need to part with them at all. And putting those together with various journal articles and supplementary texts in a single, personal database would be of huge value to me.

I don't know. I have a lot of mixed feelings about this and definitely need to consider it in more detail, but I am inclined to think that this is overall a net positive - but one that very definitely should be subject to some regulation. I'm not even averse to seeing legislation that is entirely focused on regulating how Google manages its digital library. While I am generally against such specific legislation, I think that the sheer scope of this project makes it entirely reasonable.

Given the scope of the project some federal regulation may be necessary. I certainly would not want to see Google become the world's library only to start deciding that some books are not appropriate for users to see. Even though Google is a private enterprise, I think it ought to operate under similar rules to the public libraries that it will likely replace.

It occurred to me, on my way home from my math final, that I asserted Google has certain responsibilities to the public, without explaining what I believe they are.

I tend to think that first and foremost, in light of the settlement, Google has a responsibility to make sure that the public has access to any materials under their control - that is an absolute and I wish the settlement reflected that. I also tend to think that given the claims they make to justify the settlement (which I happen to agree with), they have a responsibility to make these materials available through their digital library.

On the chance that they would run into issues of space, I think they need to go genre to genre and prioritize the most relevant, important works, rather than using those hypothetical space restrictions to censor work that is "inappropriate." This is important because there are all sorts of potential ways to create filters that will restrict access to materials that individuals would find objectionable. Indeed, I could see institutions and organizations creating filtered portals to allay the fears of a host of different sorts of people that they or their children might be exposed to dangerous ideas they disapprove of. Meanwhile, those of us who aren't fucking morons, terrified of things unfamiliar/disagreeable and/or talking to our children about them, would have the freedom to search through all sorts of material.