Uncertain Principles

Thoughts on physics, politics, and pop culture, by a physics professor at a small liberal arts college, plus occasional conversations with his dog.


Quantization of Books 3: How Many Books Is That?

Category: Book Writing, Books, Math, Physics, Physics Books, Physics with Emmy, Pop Culture, Publicity, Technology
Posted on: December 28, 2009 10:32 AM, by Chad Orzel

When I saw the data generated by the sales rank tracker Matthew Beckler was kind enough to put together, I joked that I hoped to someday need a logarithmic scale to display the sales rank history of How to Teach Physics to Your Dog. Thanks to links from Boing Boing, John Scalzi, and Kevin Drum, I got my wish:

log_scale.jpg

For those not familiar with the concept, a log scale plots values on a scale that represents each order of magnitude as a fixed distance. So, the top horizontal line on that plot represents a sales rank of a million, the line below that a hundred thousand, the line below that ten thousand, and so on. This tends to blow up the detail at smaller ranges, allowing you to see more of the variation. On a linear scale, everything after the big downward spike at about 260 hours is just flat. Zooming in just a little, it still looks like this:

lin_scale_bb.jpg

There's still a good deal of variation in the flat bit of that graph, from a minimum value of 396 to a maximum of just over 2500 (as of 8pm Eastern Sunday night), but it's hard to see just what's going on without losing the higher points of the data.
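To put rough numbers on how much a log scale opens up the low end, here's a minimal sketch. It assumes an axis running from a rank of 1 to a rank of 1,000,000 and takes "just over 2500" as exactly 2500; the function name `log_position` is made up for illustration.

```python
import math

def log_position(rank):
    """Position of a sales rank on a log10 axis, measured in decades."""
    return math.log10(rank)

# Gridlines one order of magnitude apart sit one decade apart on a log axis.
gridlines = [1_000_000, 100_000, 10_000, 1_000]
positions = [log_position(r) for r in gridlines]
spacings = [positions[i] - positions[i + 1] for i in range(len(positions) - 1)]

# The "flat bit" quoted in the post: a minimum of 396, a maximum near 2500.
# Assumption: the axis spans ranks 1 to 1,000,000.
low, high, top = 396, 2500, 1_000_000
linear_fraction = (high - low) / top
log_fraction = (log_position(high) - log_position(low)) / log_position(top)

print(f"spacing between gridlines (decades): {spacings}")
print(f"share of a linear axis: {linear_fraction:.4f}")
print(f"share of a log axis:    {log_fraction:.4f}")
```

Under those assumptions, the 396-to-2500 range takes up a fraction of a percent of a linear axis but more than a tenth of a log axis, which is why the variation only shows up on the log plot.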

This is all very nice, but of course, the whole point of having this data is to try to extract information that you wouldn't be able to get otherwise. So, can we figure out from this plot how many books were sold in this interval?

If you recall my previous excursion into number-crunching of these data, you'll remember that I made a plot of the (downward) change in sales rank as a function of the starting sales rank. This turned out to be remarkably linear, corresponding to a model where a single sale at a lower rank produces less of a change in that rank than a single sale at a higher rank. In other words, if you start at a ranking of 100,000, selling one book leaps you past a large number of other books, while if you start at a ranking of 1,000, a single sale doesn't make as much difference.
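For anyone who wants to try this sort of thing on their own tracker data, here's a rough sketch of the procedure: pull out the hours where the rank improved, pair each downward change with the rank it started from, and fit a straight line by ordinary least squares. The hourly ranks below are invented for illustration (they are not the real data), and the function names are hypothetical.

```python
def rank_improvements(ranks):
    """Return (starting_rank, downward_change) for each hour the rank improved."""
    pairs = []
    for before, after in zip(ranks, ranks[1:]):
        if after < before:  # a smaller rank number is a better rank
            pairs.append((before, before - after))
    return pairs

def fit_line(pairs):
    """Ordinary least-squares fit of change = slope * rank + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up hourly ranks, NOT the real tracker data.
hourly_ranks = [100_000, 40_000, 42_000, 45_000, 18_000, 19_500, 7_800, 8_400]
pairs = rank_improvements(hourly_ranks)
slope, intercept = fit_line(pairs)
print(pairs)
print(f"change = {slope:.3f} * rank + {intercept:.3f}")
```

The toy numbers were chosen so that every improvement is exactly 60% of the starting rank, which makes the fit come out as a clean line; real data will be a good deal messier.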

Doing the same thing with the larger dataset yields the following plot (I've deleted a few oddball points where the rank changed by only a few places in the 70,000 range):

lin_fits.jpg

The blue points are data from before the publication and the big sales boost from Boing Boing/ Whatever, the red are points from after that. You can see that they clearly don't all fall on the same line. The two solid lines represent straight lines fit to the two data sets, and you can easily figure out which equation goes with which. It should be noted that while on this scale, the red points sort of look like they fit a line, if you zoom in, they really don't:

post_bb_zoom.jpg

I suppose you could fit a line to that, if you were an economist or an astronomer, but I'm not going to waste anybody's time with that.

So, using this linear model, what does the big downward spike correspond to? Well, the fit parameters from the plot above suggest that the large jump Tuesday morning was 5.4 times bigger than the model would predict for a single sale, meaning that it represents the sale of 5-6 books.

That's nice, and all, but the problem is that the next spike down, according to the model, represents the sale of -1.4 books. That's because the fit above has a non-zero intercept, meaning that it predicts a ranking change of zero for a single sale at a rank of around 14,000, and below that level, the predicted ranking change is negative. That's clearly wrong-- if 1.4 people returned their copies, my sales ranking would not get better.
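The arithmetic behind the negative book count is easy to reproduce. In the sketch below, the slope and intercept are hypothetical values picked only so that the predicted change crosses zero at a rank of 14,000, as described above; the actual fit parameters live in the plot and aren't reproduced here. The rank jumps are likewise invented.

```python
def predicted_change(rank, slope, intercept):
    """Rank change the linear model predicts for a single sale at this rank."""
    return slope * rank + intercept

def books_sold(start_rank, end_rank, slope, intercept):
    """Observed rank improvement divided by the model's one-sale prediction."""
    return (start_rank - end_rank) / predicted_change(start_rank, slope, intercept)

# Hypothetical fit parameters, chosen so the zero crossing lands at 14,000.
slope, intercept = 0.5, -7000.0
zero_crossing = -intercept / slope  # 14000.0

# A made-up jump starting above the zero crossing gives a sensible answer...
above = books_sold(50_000, 14_000, slope, intercept)  # 36000 / 18000 = 2.0
# ...but one starting below it divides by a negative prediction.
below = books_sold(10_000, 8_000, slope, intercept)   # 2000 / -2000 = -1.0

print(zero_crossing, above, below)
```

Any improvement that starts below the zero crossing gets counted as negative sales, which is the failure described above.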

So, how could we improve this? Well, logic dictates that a sales rank of 1 can't get any higher, so we could impose a model where the ranking change is 0 for a sales rank of 1. If we do that, the pre-publication data look like this:

alt_fits.jpg

I've done two different fits to this, one a linear fit constrained to go through the origin, the other a power law fit, just to have something with a bit of upward curve to it. Using the simple linear model, the big downward jump corresponds to about 2 books, and using the power-law fit, it's 4 books. Interestingly, the power-law fit gives higher values for some of the later downward jumps, with a peak of 13 for the jump from 1106 to 683 a few hours after the initial spike.
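Here's one way the two fits could be done by hand. This is a sketch under assumptions, not a record of how the fits in the plot were made: the through-origin slope uses the standard least-squares formula with no intercept, and the power law is fit as a straight line in log-log space, which is a common shortcut rather than a true nonlinear fit. The data points are generated from a made-up power law so the result can be checked.

```python
import math

def fit_through_origin(pairs):
    """Least-squares slope for change = slope * rank, with no intercept."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)

def fit_power_law(pairs):
    """Fit change = a * rank**b by a straight-line fit to (log rank, log change)."""
    logs = [(math.log(x), math.log(y)) for x, y in pairs]
    n = len(logs)
    mean_u = sum(u for u, _ in logs) / n
    mean_v = sum(v for _, v in logs) / n
    suu = sum((u - mean_u) ** 2 for u, _ in logs)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in logs)
    b = suv / suu
    a = math.exp(mean_v - b * mean_u)
    return a, b

# Made-up points generated from change = 0.01 * rank**1.5, NOT the real data.
pairs = [(r, 0.01 * r ** 1.5) for r in (1_000, 10_000, 50_000, 100_000)]

slope = fit_through_origin(pairs)
a, b = fit_power_law(pairs)
print(f"through origin: change = {slope:.3f} * rank")
print(f"power law:      change = {a:.4f} * rank**{b:.3f}")
```

Because the toy data really is a power law, the log-log fit recovers the exponent, while the straight line through the origin can only split the difference between the low and high ranks.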

So, how many books does all this represent? Well, summing up all the changes from the power-law model gives 154 books. The same summing for the simple linear fit with zero intercept gives just 11-- the vast majority of the points after the spike correspond to less than one book's worth of that model's prediction. A third fit, using a second-order polynomial (which had a slightly better R² than either of the others) predicts around 30 books.
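The summing step can be sketched like this: for every hour where the rank improved, divide the improvement by what the model predicts for a single sale at the starting rank, and add up the results. The rank series and both sets of model parameters below are invented for illustration, so the totals have nothing to do with the figures quoted above.

```python
def total_books(ranks, model):
    """Sum (observed improvement) / (model's one-sale prediction) over all improvements."""
    total = 0.0
    for before, after in zip(ranks, ranks[1:]):
        if after < before:  # only count hours where the rank got better
            total += (before - after) / model(before)
    return total

# Hypothetical models with made-up parameters.
def linear_model(rank):
    return 0.5 * rank

def power_model(rank):
    return 0.01 * rank ** 1.5

# Made-up hourly ranks, NOT the real tracker data.
hourly_ranks = [100_000, 50_000, 60_000, 30_000, 15_000]

linear_total = total_books(hourly_ranks, linear_model)
power_total = total_books(hourly_ranks, power_model)
print(f"linear model: {linear_total:.2f} books")
print(f"power law:    {power_total:.2f} books")
```

Even on toy data, the two models disagree about the total, since the estimate depends entirely on what each one claims a single sale is worth at a given rank.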

None of these models are particularly good, though-- for one thing, the fits aren't great. And there's no particular justification for the use of a power law or a parabola-- they're just easy functions to work with, mathematically.

In the end, the best I can say is that, over the whole data period, there are just about 100 points where the sales rank improved from one hour to the next. If you take the incredibly naive picture that each of those improvements represents at least one sale, that gives a lower bound of about 100 books sold. That's more or less consistent with other people's analyses of what sales rank means in terms of sales.
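The naive lower bound is just a count of hour-to-hour improvements, which takes only a couple of lines. The rank list here is made up for illustration and the function name is hypothetical.

```python
def count_improvements(ranks):
    """Number of consecutive readings where the rank number went down (improved)."""
    return sum(1 for before, after in zip(ranks, ranks[1:]) if after < before)

# Made-up hourly ranks, NOT the real tracker data.
hourly_ranks = [2_400, 1_900, 2_050, 2_200, 1_100, 700, 820, 640]
lower_bound = count_improvements(hourly_ranks)
print(f"at least {lower_bound} sales, if every improvement means one or more books")
```

Note that this only counts hours, so several sales landing in the same hour still count as one, which is why it's a lower bound.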

Which of these figures is right? I have no idea, and no way to determine the answer. I won't get any kind of real sales numbers for at least six months, maybe a year (unless somebody at Scribner is feeling generous, and wants to send me numbers). What I eventually get won't be nearly fine-grained enough to determine the number of sales via Amazon in the first week after publication, either.

But, hey, playing with numbers is fun...


Comments

1

Your guess is probably a little low--I'd be willing to bet more like 2-400. But my personal experience with Amazon sales ranks is a little stale--several years old. But when I worked "in-house" as it were for New York publishers, I could actually see how much Amazon was ordering for any given title (and this was before Bookscan, so it wasn't always clear what actual point-of-sales were). But since the number represents a relative rate of sales, it's difficult to translate it to absolute numbers.

Posted by: Moopheus | December 28, 2009 2:12 PM
