How Good Is BookScan Anyway?

One of the big stories in genre Internet news was Seanan McGuire’s post last week, about reactions to the early release of some copies of her book, and the hateful garbage thrown her way by people outraged that the ebook didn’t slip out early as well. And let me state right up front that the people who wrote her those things are lower than the slime that pond scum scrapes off its shoes. That’s absolutely unconscionable behavior, and has no place in civilized society.

That said, Andrew Wheeler picked up on something that also struck me as odd, namely the way McGuire was so upset about paper copies of the book being sold before the release date. Wheeler does a nice job, using numbers from Nielsen BookScan, of showing exactly why this might matter: McGuire’s past sales suggest that, if everything broke just right, rapid sales in the first week could put her book on the extended New York Times bestseller list, which is a Big Deal. That would require, however, that her book sell a lot of copies in the first week, which is hurt by having some copies slip out a week early. So there’s some reason why she and her publisher should be concerned about the early release.

Of course, this relies on BookScan, which is an imperfect measurement. Wheeler includes the usual explanation: “BookScan captures, by general consensus, somewhere from 2/3 to 3/4 of the book outlets in the USA.” But exactly how good a measure is it? Which leads to this graph:


This shows the sales for the trade paperback of How to Teach Physics to Your Dog, normalized so as to obscure the proprietary values, for a period of several weeks. Black circles are numbers from BookScan (which I’ve used before in modeling sales); green triangles are point-of-sale values provided by Scribner, which capture all the books sold.

This shows more or less what you’d expect: the two track each other pretty well, with the BookScan numbers generally a bit lower (there’s one point that’s actually higher, which I think happened because I miscopied the number and included some hardcover sales). The big spike in the data is the week before Christmas, with sales almost five times the average of the other weeks.

This is, however, subject to some rather stringent limitations.

For one thing, this is only the trade paper edition, which is the one currently in print from Scribner. I don’t have point-of-sale numbers on a week-by-week basis for the hardcover, because they only started providing those last fall. This is also a book that has been out in one form or another since December 2009, which means that the sales pattern is very different from that of a new release.

So, there’s no way to draw grand, sweeping conclusions about all of publishing from this one graph. But it’s interesting to me in a playing-with-graphs kind of way, because I’m that sort of geek.

For the record, in this specific data set of very narrow applicability, the BookScan number averages 83 ± 14% of the point-of-sale value. I suspect I’m selling more through Amazon (which is included in BookScan) than through the sort of stores that it misses, which would account for the higher-than-expected fraction. But it’s in the right ballpark for what everybody says about BookScan.
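For anyone who wants to run the same sort of comparison on their own numbers, here is a minimal sketch in Python. It assumes the figure is the mean of the week-by-week BookScan/point-of-sale ratios, with the sample standard deviation as the spread; that is a guess at the method, not a description of how the number above was computed. The weekly values are invented placeholders (the real ones are proprietary), and the function name `capture_fraction` is hypothetical.

```python
from statistics import mean, stdev

# Invented, normalized weekly sales for illustration only.
# These are NOT the real BookScan or Scribner numbers.
point_of_sale = [1.00, 0.90, 1.10, 4.80, 1.20, 0.95]
bookscan = [0.85, 0.70, 0.95, 4.10, 0.90, 0.85]


def capture_fraction(bookscan, point_of_sale):
    """Return (mean, sample standard deviation) of the weekly ratios.

    Assumption: each week's ratio counts equally, so a big holiday
    week does not dominate the average.
    """
    ratios = [b / p for b, p in zip(bookscan, point_of_sale)]
    return mean(ratios), stdev(ratios)


avg, spread = capture_fraction(bookscan, point_of_sale)
print(f"BookScan captures {avg:.0%} +/- {spread:.0%} of point-of-sale sales")
```

Because the calculation only uses ratios, scaling both series by the same constant to hide the raw values leaves the result unchanged. Dividing total BookScan sales by total point-of-sale sales would be a different choice, one that weights the Christmas spike much more heavily.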

Also, if you care, the total number of e-book copies of How to Teach Physics to Your Dog sold is about 54% of the number of trade paperbacks, but there’s a huge caveat to that number, given that the e-book version went on sale in December of 2009, simultaneously (in some reference frame) with the hardcover, while the trade paper came out a year later. I don’t have hard numbers for the hardcover sales (I have royalty statements, but they’re cryptic and report warehouse orders rather than sales), so I can’t say how the e-book compares to the total, but my guess is that the overall fraction is more in the 15-20% range.

I also can’t promise to do a similar write-up for the new book, because I don’t have access to the point-of-sale numbers: Basic doesn’t have the same reporting system that Simon & Schuster is now using. If I get them at some point, I’ll do a similar comparison, but that’ll be many weeks from now.

So, to recap: highly limited data show a good correlation between BookScan and real sales figures, for the specific case of one edition of one book that’s been out for a couple of years. Also, anyone sending hateful email to Seanan McGuire is vermin.