Are Books and Kindles Correlated?

I'm trying not to obsessively check and re-check the Dog Physics Sales Rank Tracker, with limited success. One thing that jumped out at me from the recent data, though, is the big gap between the book and Kindle rankings over the weekend. The book sales rank dropped (indicating increased sales, probably a result of the podcast interview), while the Kindle rank went up dramatically. This suggests that people who listen to that particular podcast are less likely to buy new books on the Kindle than new books on paper.

This got me wondering, though, whether this was an anomaly, or a general truth. That is, is there any correlation between the sales rank of the paper edition of a book and the sales rank of the Kindle edition of the same book? Happily, the sales rank tracker spits out all the hourly rankings in a nice table that I could copy into SigmaPlot and crank away on, producing the following:

[Figure: Kindle sales rank versus paper sales rank for How to Teach Relativity to Your Dog (bookvskindle2.PNG)]

This is a plot showing the Kindle sales rank of How to Teach Relativity to Your Dog (vertical axis) versus the sales rank of the paper edition (horizontal axis). The hourly data are really noisy, so I smoothed them a bit by averaging five hours together, but the smoothing makes almost no difference to the overall picture.

What does this say? Well, that there's a pretty weak correlation between them. The data points fall more or less in a wedge extending up and to the right, which tells you that when one rank is really high, the other tends not to be very low, and when one is low, the other also tends to be low, but there's a lot of scatter. At a book rank of about 25,000, for example, the Kindle rank ranges from about 14,000 to about 96,000.
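For anyone who wants to put a number on "pretty weak" with their own rank data, here's a minimal Python sketch of the smoothing and a Pearson correlation coefficient. It is not the SigmaPlot procedure used for the plot above: the function names are made up for illustration, the example ranks are invented rather than taken from the tracker, and it assumes the five-hour averaging means non-overlapping blocks (a moving average would be another reasonable reading).

```python
from math import sqrt

def block_average(values, width=5):
    """Average consecutive, non-overlapping groups of `width` values.

    Assumption: the "five hours averaged together" are non-overlapping blocks.
    Any leftover values at the end that don't fill a block are dropped.
    """
    usable = len(values) - len(values) % width
    return [sum(values[i:i + width]) / width for i in range(0, usable, width)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Invented hourly ranks, purely for illustration (NOT the real tracker data).
book_rank = [25000, 24000, 26000, 23000, 22000,
             30000, 31000, 29000, 33000, 35000,
             20000, 19000, 21000, 18000, 17000]
kindle_rank = [40000, 60000, 50000, 45000, 55000,
               70000, 65000, 80000, 75000, 90000,
               30000, 35000, 45000, 40000, 50000]

print("raw hourly r:   ", pearson_r(book_rank, kindle_rank))
print("5-hour blocks r:", pearson_r(block_average(book_rank),
                                    block_average(kindle_rank)))
```

A value of r near 0 means no linear relationship and a value near 1 means the two ranks move in lockstep; a wedge-shaped cloud like the one in the plot would be expected to land somewhere in between.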

This is for the recently released book, though. Maybe more data would make a clearer picture? In a word, no. In a thousand words (i.e., one picture):

[Figure: Kindle sales rank versus paper sales rank for How to Teach Physics to Your Dog (book1vskindle1.PNG)]

That's the same plot for How to Teach Physics to Your Dog, spanning almost two full years, and you don't see any clearer correlation than in the smaller recent dataset.

How about changes in the sales ranks, though? After all, that's what triggered this-- a big drop in one at the same time as a big rise in the other. Is there any correlation between changes in the sales rank? When one goes up, does the other tend to go up, too, or do they jump around randomly?

We can check this, too:

[Figure: normalized hourly changes in the book and Kindle sales ranks plotted against each other (bookvskindleslope.PNG)]

This graph shows the normalized change in ranking-- that is, the difference from one hour to the next, divided by the ranking in the later hour. The normalization keeps the plot from being dominated by the high-number rankings, where a big absolute change in the number is still only a small fractional change in the rank. From this, you can see, well, that there isn't much of a relationship at all. The data points are clustered along the axes, indicating that most of the time the changes are small, and that a big change in one ranking is usually not matched by a big change in the other.
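The normalization is simple enough to write down in a few lines. Here's a small Python sketch of the definition given above (the difference from one hour to the next, divided by the ranking in the later hour); the function name and the sample numbers are invented for illustration.

```python
def normalized_changes(ranks):
    """Hour-to-hour change in rank, divided by the rank in the later hour."""
    return [(later - earlier) / later
            for earlier, later in zip(ranks, ranks[1:])]

# Invented hourly ranks, purely for illustration.
book_rank = [25000, 24000, 30000, 15000]
kindle_rank = [40000, 80000, 60000, 66000]

# Each pair is one point on a scatter plot of book change vs. Kindle change.
points = list(zip(normalized_changes(book_rank),
                  normalized_changes(kindle_rank)))
for book_change, kindle_change in points:
    print(f"{book_change:+.3f}  {kindle_change:+.3f}")
```

One consequence of dividing by the later hour's rank is that the measure isn't symmetric: a rank that doubles gives +0.5, while a rank that halves gives -1.0. Dividing by the average of the two hours instead would make rises and falls of the same size look the same, if that matters for the plot.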

So, what do paper books and Kindle books have in common, in terms of sales ranks? Not much of anything, really. Which might seem like a disappointing result, but that misses the really important point, here: playing around with these graphs has allowed me to avoid grading papers for another hour, and that's always a good thing...


Amazon sales rank is some sort of weighted moving average (plus some other cruft), so each series has a huge autocorrelation. This is why the scatterplots seem to form tracks, leading to the "crayon scribblings" appearance. To get anything useful from those graphs, I think you need to somehow account for the autocorrelation (which is not something I know anything about, aside from what I've learned from climate science blogs).

I think the fractional change is more promising, but there are still a couple of problems with time scales. One has to do with using hourly data: if you are selling much less than one book per hour, the hourly changes are going to look largely uncorrelated because your coincidence window is too small. Also, the dynamics of Kindle vs. paper purchases are different, due to differences in commerce friction and satisfaction times. For graphing purposes, you might try integrating the rank change over a long enough period to have decent statistics. Even better (but getting perilously close to actual work) would be a correlation fit with variable time offset between the series and a coincidence window set by the sales volume.

By Dan Riley on 20 Mar 2012
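The two suggestions in the comment above (integrate the rank change over a longer window, then correlate the series with a variable time offset) could be sketched roughly as follows in Python. This is only one possible reading of the idea, not a tested analysis: the function names, the window width, the range of lags, and the sample data are all assumptions made for illustration, and it doesn't attempt the harder part of setting the coincidence window from the sales volume.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

def window_sums(values, width):
    """Add up the changes in consecutive, non-overlapping windows."""
    usable = len(values) - len(values) % width
    return [sum(values[i:i + width]) for i in range(0, usable, width)]

def lagged_correlation(a, b, lag):
    """Correlate a[t] with b[t + lag]; a positive lag means b trails a."""
    if lag >= 0:
        xs, ys = a[:len(a) - lag], b[lag:]
    else:
        xs, ys = a[-lag:], b[:len(b) + lag]
    n = min(len(xs), len(ys))
    return pearson_r(xs[:n], ys[:n])

def best_lag(a, b, max_lag):
    """Try every offset from -max_lag to +max_lag; return (best lag, all scores)."""
    scores = {lag: lagged_correlation(a, b, lag)
              for lag in range(-max_lag, max_lag + 1)}
    return max(scores, key=lambda lag: scores[lag]), scores

# Invented series: `kindle` repeats `book` two steps later (illustration only).
book = [0, 1, 0, 3, 0, 2, 0, 5, 1, 0]
kindle = [0, 0] + book[:8]

lag, scores = best_lag(book, kindle, max_lag=3)
print("best lag:", lag, "with r =", scores[lag])
print("3-step window sums:", window_sums(book, 3))
```

Calling `lagged_correlation` on a series and itself gives that series' autocorrelation at a given lag, which is one quick way to see how strongly each hour's rank depends on the previous one.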

I honestly thought that first plot would turn out to be a photo of your dog.