Quantization of Books 2: What Does One Sale Get You?

I've been playing around with the spiffy sales rank tracker Matthew Beckler wrote, because I'm a great big dork and enjoy playing with graphs. Here's a graph of the sales rank vs. time through 2 pm EST today (plotted in Excel from the data table at the bottom of the page):

[Figure: sales rank vs. time (sales_rank_122009.jpg)]

As I noted in my previous post on this, the downward-going jumps are striking, and probably indicate discrete book purchase events. There also seems to be a clear trend that jumps starting at higher numbers are larger than jumps starting at lower numbers. If we assume that's the case, what does that tell us about the relationship between number of sales and sales rank?

Here's a graph of the size of the downward-going jump as a function of the sales rank at the time of the jump (I took the absolute value, so all the axes increase in the normal directions):

[Figure: size of the downward jump vs. sales rank at the time of the jump (sales_delta_122009.jpg)]
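For anyone who'd rather not do this by hand in Excel, here's a minimal sketch of how the jumps could be pulled out of a list of rank readings. The function name `find_sale_jumps` and the numbers in `hourly_ranks` are made up for illustration; they are not the tracker's actual data or code. It assumes any reading that is lower than the previous one counts as a downward-going jump.

```python
def find_sale_jumps(ranks):
    """Return (rank_before, jump_size) for every drop between consecutive readings.

    A lower rank number is better, so a sale shows up as the rank going DOWN.
    jump_size is reported as a positive number (the absolute value of the change).
    """
    jumps = []
    for before, after in zip(ranks, ranks[1:]):
        if after < before:
            jumps.append((before, before - after))
    return jumps


# Made-up hourly readings, NOT the real data from the tracker.
hourly_ranks = [50000, 52000, 54000, 31000, 33000, 35000, 21000, 22500]

jumps = find_sale_jumps(hourly_ranks)
for rank_before, jump_size in jumps:
    print(rank_before, jump_size)
```

Each pair is one point on the graph above: the rank at the time of the jump on the horizontal axis, the size of the jump on the vertical axis.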

That's remarkably linear for such dodgy data-- astronomers and social scientists would kill for graphs this clean. If you fit a line to the whole set, the slope is around 0.42; if you throw out the four really small points in the 70,000 range, it's 0.48. So the sale of a single book (assuming those jumps are individual sales) changes a book's ranking by roughly 420 to 480 more spots for every additional 1,000 places in the initial ranking. The two outliers above the line turn out to be glitches in the file, places where one point didn't get read in properly (2 errors out of 160-odd points is pretty good for Excel).
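The line fit itself is just ordinary least squares, which is easy enough to sketch without Excel. The helper `fit_line` and the four sample points below are hypothetical: the points are made up to lie exactly on a line so the answer is easy to check, and they are not the real jump data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (rank at jump, jump size) points lying exactly on a line.
# These are NOT the measured values from the graph.
ranks_at_jump = [20000, 40000, 60000, 80000]
jump_sizes = [1000, 11000, 21000, 31000]

slope, intercept = fit_line(ranks_at_jump, jump_sizes)
print(slope, intercept)
```

Throwing out points (like the four small ones near 70,000) just means dropping them from both lists before calling the fit.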

Of course, this model predicts the effect of a single sale should go to zero at a rank of about 14,000 (18,000 if you throw out the outliers), so it's obviously not perfect, but I'm amazed that it looks that linear.
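To see what that zero crossing means, here's a quick sketch that writes the fitted line as slope times (rank minus zero-crossing rank) and evaluates it at a few ranks. The slopes and zero crossings are the ones quoted above; pairing the 0.48 slope with the 18,000 crossing assumes both numbers come from the same trimmed fit, and the function name `predicted_jump` and the test ranks are my own choices for illustration.

```python
def predicted_jump(rank, slope, zero_rank):
    """Linear model of jump size: zero at zero_rank, growing by `slope` per place."""
    return slope * (rank - zero_rank)


# (slope, zero-crossing rank) as quoted in the post.
# Assumption: 0.48 and 18,000 belong to the same fit.
fits = {
    "all points": (0.42, 14000),
    "small points dropped": (0.48, 18000),
}

for label, (slope, zero_rank) in fits.items():
    for rank in (10000, 20000, 50000):
        jump = predicted_jump(rank, slope, zero_rank)
        print(f"{label}: rank {rank} -> predicted jump {jump:.0f}")
```

Below the zero crossing the predicted jump goes negative, which would mean a sale makes the rank worse. That's the sense in which the line can't be the whole story.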

What about the other direction? Here's a plot of the change in rank for an hour with no sales, as a function of the rank at that time:

[Figure: change in rank over an hour with no sales vs. rank at that time (no_sales_delta_122009.jpg)]

That's much messier, but I wouldn't expect it to be very good. Looking at the data, some sort of exponential decay with a time constant of 8-12 hours is probably a better model. That sort of data massage is not something I know an easy way to do in Excel, though, so it'll have to wait until I'm at work and have access to SigmaPlot.
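In the meantime, here's a rough sketch of what an exponential fit could look like without SigmaPlot. It assumes the quantity being fit is positive and decays as A * exp(-t / tau), which is my guess at the form, and it does the fit by running a straight line through the logarithm of the values. The function `fit_exponential_decay` and the synthetic `signal` (generated with a made-up 10-hour time constant) are illustrative only, not the real no-sale data.

```python
import math


def fit_exponential_decay(times, values):
    """Fit values ~ A * exp(-t / tau) via a least-squares line through log(values).

    Requires all values to be positive. Returns (A, tau).
    """
    logs = [math.log(v) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    stl = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = stl / stt                     # equals -1 / tau
    amplitude = math.exp(mean_l - slope * mean_t)
    tau = -1.0 / slope
    return amplitude, tau


# Synthetic, noise-free data with a made-up time constant of 10 hours.
hours = list(range(0, 24, 2))
signal = [5000.0 * math.exp(-h / 10.0) for h in hours]

amplitude, tau = fit_exponential_decay(hours, signal)
print(f"A ~ {amplitude:.0f}, tau ~ {tau:.1f} hours")
```

The log trick weights small values more heavily than a proper nonlinear fit would, so with noisy data the two approaches can give somewhat different time constants.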


Very interesting. I can imagine a better understanding of sales rank improvements could lead to a publisher being able to strategically make multiple purchases of a book to push it into a higher sales rank where it "goes viral" (I hate that term but you know what I mean: more people buy it because they see other people are buying it).

BTW, I'm pretty sure I see the Flying Spaghetti Monster With Outstretched Tentacle in that last chart. If it makes a difference.

For some reason, having a higher rank be at the bottom of the chart is making my head hurt. Or maybe that's just this lingering cold.

By Aaron Bergman (not verified) on 20 Dec 2009

Yeah, I debated flipping the vertical axis, but I've gotten used to the idea of down being better over the last week or so.

What I'm really hoping is that at some point I'll have a reason to use a semi-log plot...

I wonder how close 18,000 is to the actual rank at which you wouldn't necessarily expect a single sale to change your rank (because the difference between adjacent ranks is more than 1 sale).

I would say that the bottom graph is clearly fitted by an "elephant" model. Just look at it!

The "elephant" is striking. But do you need a fifth parameter to fit the trunk wiggle?