How Many Books Is That?: Modeling Amazon Sales Rank

By drorzel on April 22, 2011.

A few months ago-- just before the paperback release of How to Teach Physics to Your Dog-- Amazon started providing not only their Sales Rank data, but also sales data from Nielsen BookScan. Of course, the BookScan data are very limited, covering only four weeks, and the Sales Rank data, while available over the full published life of any given book, are presented only as a graph, with no way to extract them as a data table. You'd have to be some sort of obsessive nerd to make a quantitative comparison between them.

So, anyway, here's the data I got for How to Teach Physics to Your Dog:

This is a graph of BookScan sales data for each week vs. the average Amazon Sales Rank for that week. Blue points are the hardcover for How to Teach Physics to Your Dog, brick-red the paperback. The one lonely point in the upper left is the sales figure for the first week of the hardcover sales, which a little bird provided for me at some point (I don't have BookScan access outside of the Amazon service).

This is a log-log plot, so those straight lines fit to the data are power laws. According to this, to estimate the number of copies the hardcover has sold based on its Sales Rank, you would use the formula:

N(sold) = 35,900,000 × R^(-1.34)

Where R is the Sales Rank. This is obviously a little dodgy, given that there's only that one little point floating way up on the left, and to do a better job, I'd need to have lots more points with high ranks and large numbers of copies sold. If you want to go out and buy several hundred paperbacks of How to Teach Physics to Your Dog from Amazon over the next couple of weeks, I'd be thrilled to have the data...
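For anyone who wants to try the same trick on their own book, here is a minimal sketch of how a straight-line fit on log-log axes turns into a power law. It assumes ordinary least squares on the base-10 logarithms of rank and sales; the post doesn't say which fitting method produced the numbers above. The function name `fit_power_law` is hypothetical, and the sample points are synthetic, generated from the formula itself rather than taken from the real BookScan data, so the fit should simply hand back the parameters it started from.

```python
import math

def fit_power_law(ranks, sales):
    """Fit sales = A * rank**b by least squares on (log10 rank, log10 sales).

    A straight line on a log-log plot, log10(N) = log10(A) + b*log10(R),
    is the same thing as the power law N = A * R**b.
    """
    if len(ranks) != len(sales) or len(ranks) < 2:
        raise ValueError("need at least two (rank, sales) pairs of equal length")
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(n) for n in sales]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope of the log-log line = power-law exponent
    log_a = y_mean - b * x_mean    # intercept = log10 of the prefactor A
    return 10 ** log_a, b

# Hypothetical check: points generated exactly from N = 35,900,000 * R**-1.34
# (synthetic, NOT the real BookScan numbers) should give the parameters back.
A_TRUE, B_TRUE = 35_900_000, -1.34
ranks = [2_000, 10_000, 50_000, 200_000]
sales = [A_TRUE * r ** B_TRUE for r in ranks]
A_fit, b_fit = fit_power_law(ranks, sales)
print(f"A = {A_fit:,.0f}, b = {b_fit:.2f}")
```

With real, noisy weekly data the recovered A and b would of course wander, which is exactly the sensitivity problem discussed further on.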

So, anyway, what good is this? Well, I know exactly how many paperbacks have been sold, thanks to Amazon's BookScan numbers, but I don't have the numbers for the hardcover. I do, however, have reams of data from the nifty Sales Rank tracker Matthew Beckler wrote for me (which stopped working a while ago, but has been supplanted by the Amazon features, anyway).

So, using that handy formula from the graph above, and the several months of sales ranks from the tracker, what can I say about the number of books sold? Well, after laboriously converting the ranks to approximately the same format as the data from Amazon, I can plug in the average sales rank for the hardcover for every week it was out, and use that to estimate the number of copies sold. The resulting data look more or less like this:

This gives you a rough idea of the number of copies a moderately successful pop-science book sells in the US. What's the total sales figure? That's a little tough to say, because the fit to the first graph is still fairly sensitive to exactly what data are included--the two fit parameters wander around a little, and because it's a power law spanning a couple of orders of magnitude, small changes in those values can lead to substantial changes in the estimated total.
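The week-by-week procedure, and why its total is touchy, can be sketched in a few lines. This assumes the hardcover fit above (A = 35,900,000, b = -1.34) and a list of weekly average ranks; the function names and the rank values are hypothetical, for illustration only, not the actual tracker data. The second total nudges only the exponent, which deliberately ignores that A and b shift together in a real refit; it just shows how a small wobble in one parameter moves the sum.

```python
A, B = 35_900_000, -1.34  # hardcover fit parameters from the first graph

def weekly_sales(avg_rank, a=A, b=B):
    """Estimated copies sold in one week, from that week's average Sales Rank."""
    return a * avg_rank ** b

def total_sales(weekly_avg_ranks, a=A, b=B):
    """Sum of the weekly estimates over every week the edition was on sale."""
    return sum(weekly_sales(r, a, b) for r in weekly_avg_ranks)

# Hypothetical weekly average ranks (illustrative only, NOT the tracker data).
weekly_avg_ranks = [3_000, 8_000, 15_000, 25_000, 40_000, 60_000, 90_000]

print(f"total with b = -1.34: ~{total_sales(weekly_avg_ranks):,.0f} copies")
# Changing only the exponent from -1.34 to -1.32 multiplies each week's
# estimate by rank**0.02: ~1.20 at rank 10,000 and ~1.26 at rank 100,000.
print(f"total with b = -1.32: ~{total_sales(weekly_avg_ranks, b=-1.32):,.0f} copies")
```

Because the multiplier grows with rank, the long tail of slow weeks is where a small change in the exponent does the most damage to the total.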

With that caveat, the estimated value looks to be between 4000 and 5000, probably toward the high end of that range. Now, comparing the fit function result with the data I have from Amazon suggests that this estimate is low by about 10% (this works for both the hardcover and the paperback numbers, and for several different subsets of those data). Using 4800 as the estimated number, that extra 10% gets you to 5280 books (one for every foot of a one-mile stretch...). BookScan estimates that it covers about 75% of all sales, so that would put the total at right around 7000.
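The chain of corrections in that paragraph is simple arithmetic; written out with the numbers from the text (the variable names are just labels for this sketch):

```python
fit_estimate = 4_800        # point estimate from summing the weekly fit values
fit_bias = 0.10             # the fit runs ~10% low against Amazon's BookScan weeks
bookscan_coverage = 0.75    # BookScan's claimed share of all sales

bookscan_equivalent = fit_estimate * (1 + fit_bias)       # 4800 * 1.1
total_estimate = bookscan_equivalent / bookscan_coverage  # 5280 / 0.75
print(round(bookscan_equivalent), round(total_estimate))  # prints: 5280 7040
```

So "right around 7000" is 7040 before rounding.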

How good an estimate is that? I can't really say, since I don't have BookScan numbers, and my royalty statements only give the number of copies shipped out to stores, not the copies actually sold to consumers. It's in the right ballpark, at least-- it's less than the total number printed and shipped, for example-- but beyond that I don't have any useful information.

This was an amusing way to spend a few hours crunching numbers, though.

(Interestingly, the UK edition has shipped more than twice the number of copies the US edition did, and has been on one British chain's bestseller list for a while now. Which just goes to show you the strong random element involved in the publishing business...)

The biggest weakness of this model is really the lack of data at high ranks and high sales-- there's just the one point fixing the top end of the power law, so adding new data points can swing the total value from the fit by a few hundred copies one way or the other. If I had more data in that region, the fit would probably be more stable, and the estimate better.
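One standard way to put a number on "one point fixing the top end" is the leverage of each point in the straight-line fit on log10(rank). The post doesn't compute this; it's a textbook regression diagnostic swapped in here as an illustration, and the ranks below are hypothetical (eight clustered weeks plus one lone launch-week point), not the real data. The function name `leverages` is likewise made up for this sketch.

```python
import math

def leverages(ranks):
    """Leverage h_i of each point in a straight-line fit on log10(rank).

    For simple linear regression, h_i = 1/n + (x_i - mean)^2 / sum((x - mean)^2);
    values near 1 mean that single point largely dictates the fitted line there.
    """
    xs = [math.log10(r) for r in ranks]
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return [1 / n + (x - mean) ** 2 / sxx for x in xs]

# Hypothetical ranks: eight weeks clustered between ranks 20,000 and 200,000,
# plus one lone launch-week point at rank 500 (illustrative, NOT the real data).
ranks = [500, 20_000, 30_000, 45_000, 60_000, 80_000, 110_000, 150_000, 200_000]
for rank, h in zip(ranks, leverages(ranks)):
    print(f"rank {rank:>7,}: leverage {h:.2f}")
```

In this made-up example the lone point carries most of the leverage, while the clustered weeks each carry far less, which is why a few more points at low rank numbers (big sales) would steady the fit.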

So, again, if anybody would like to buy several hundred copies of the paperback edition from Amazon over the next several weeks (you could hand them out on the subway...), to boost the sales ranking and get me some better data, feel free. Of course, to keep the BookScan ratio about right, you'll also want to buy several hundred from your local big-box chains... But it's for SCIENCE!, so it's all good...


"(Interestingly, the UK edition has shipped more than twice the number of copies the US edition did, and has been on one British chain's bestseller list for a while now. Which just goes to show you the strong random element involved in the publishing business...)"

I'll take issue with that. The Brits are very much more interested in physics than us Yanks. They have Newton and Hawking, and those few who know Rutherford and Dirac, and they're about 5 times more civilized than us Americans. The European mindset is that education is good in its own right; the American mindset is that it's what you have to do to get a diploma or a degree, dangit.

We all like dogs, but the science is the thing. Don't be surprised if your next book about relativity and Einstein does very well in Germany. Just a prognostication, we shall see.

Well, I bought your book the week it came out, from Amazon, and based on Woit's review, so I see my data point. ;-)

Hey there,

I just finished my qualifying exams this past week, so I suddenly have ample spare time on weekends to update sales-rank trackers for physics books involving nice dogs. I upgraded the page-scraping script to Python instead of Bash, so I can use the nice lxml XML-parsing library. There isn't much new data yet, but I've restarted the cron task and it should be gathering data every hour, starting now. Enjoy!

http://www.mbeckler.org/dog_physics/

--
Matthew Beckler

Here's another related suggestion: You should open an eBay account, and offer a copy of your book at a ridiculously high price. And then watch other sellers offering it at 1 penny less. See http://www.michaeleisen.org/blog/?p=358 (via http://delong.typepad.com/sdj/2011/04/disequilibrium-economics-algorith…)
