Correlation is not causation: which came first, high Impact Factor or high price?

Bill decided to take a look:

Fooling around with numbers:

Interesting, no? If the primary measure of a journal's value is its impact -- pretty layouts and a good Employment section and so on being presumably secondary -- and if the Impact Factor is a measure of impact, and if publishers are making a good faith effort to offer value for money -- then why is there no apparent relationship between IF and journal prices? After all, publishers tout the Impact Factors of their offerings whenever they're asked to justify their prices or the latest round of increases in same.

There's even some evidence from the same dataset that Impact Factors do influence journal pricing, at least in a "we can charge more if we have one" kinda way. Comparing the prices of journals with or without IFs indicates that, within this Elsevier/Life Sciences set, journals with IFs are higher priced and less variable in price:
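For anyone who wants to try the with-IF versus without-IF comparison on their own price list, here is a minimal sketch. The prices are made up for illustration and are not the Elsevier/Life Sciences data. Using the coefficient of variation as the measure of "less variable" is an assumption, since the excerpt does not say how variability was judged.

```python
import statistics

def price_summary(prices):
    """Summarise one group of journal prices: size, mean, spread."""
    mean = statistics.mean(prices)
    sd = statistics.stdev(prices)  # sample standard deviation
    return {
        "n": len(prices),
        "mean": mean,
        "stdev": sd,
        # Coefficient of variation: spread relative to the mean, so groups
        # with very different average prices can still be compared.
        "cv": sd / mean,
    }

# Hypothetical prices, invented for this example only.
with_if = [1000, 1200, 1100, 900, 1300]
without_if = [200, 1500, 400, 900, 100]

for label, prices in (("with IF", with_if), ("without IF", without_if)):
    s = price_summary(prices)
    print(f"{label}: n={s['n']} mean={s['mean']:.0f} cv={s['cv']:.2f}")
```

A higher mean together with a lower coefficient of variation in the with-IF group would match the pattern described above, though a summary like this says nothing about why the groups differ.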

Fooling around with numbers, part 2:

The relationship here is still weak, but noticeably stronger than for the other two comparisons -- particularly once we eliminate the Nature outlier (see inset). I've seen papers describing 0.4 as "strong correlation", but I think for most purposes that's wishful thinking on the part of the authors. I do wish I knew enough about statistics to be able to say definitively whether this correlation is significantly greater than those in the first two figures. (Yes yes, I could look it up. The word you want is "lazy", OK?) Even if the difference is significant, and even if we are lenient and describe the correlation between IF and online use as "moderate", I would argue that it's a rich-get-richer effect in action rather than any evidence of quality or value. Higher-IF journals have better name recognition, and researchers tend to pull papers out of their "to-read" pile more often if they know the journal, so when it comes time to write up results those are the papers that get cited. Just for fun, here's the same graph with some of the most-used journals identified by name:
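On the question of whether one correlation is significantly greater than another, one common approach is the Fisher z-transformation. What follows is a minimal sketch, not the analysis from the original posts. The function name, the r values and the sample sizes are all hypothetical. The test also assumes the two correlations come from independent samples, which is doubtful here because the figures draw on overlapping sets of journals, so a test for dependent correlations would be the more careful choice.

```python
import math

def fisher_z_compare(r1, n1, r2, n2):
    """Two-sided test for a difference between two Pearson correlations.

    Assumes the correlations come from independent samples of sizes
    n1 and n2 (each > 3). Returns (z statistic, two-sided p-value).
    """
    z1 = math.atanh(r1)  # Fisher transformation of r1
    z2 = math.atanh(r2)  # Fisher transformation of r2
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical inputs: r = 0.4 versus r = 0.1, with 100 journals in each set.
z, p = fisher_z_compare(0.4, 100, 0.1, 100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these invented numbers the difference would pass the usual 0.05 threshold, but the answer depends heavily on the sample sizes, which the excerpt does not give.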


Also worth adding, from part 3:

The curve fits are for the whole of each dataset, even though it's a zoomed view; the Nature set excludes British Journal of Pharmacology, the only NPG title that recorded 0 uses, and Nature itself. Colour coding by publisher is the same for each figure in this post. As in part 2, the correlation between price and use is weak at best and doesn't change much from publisher to publisher. Also, each publisher subset shows a stronger correlation than the entire pooled set -- score another one for Bob O'Hara's suggestion that finer-grained analyses of this kind of data are likely to produce more robust results. Since cutoffs improved the apparent correlation for the pooled set, I tried that with the publisher subsets:
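The observation that each publisher subset correlates more strongly than the pooled set can be illustrated with a small sketch. The data and the publisher labels "A" and "B" are invented for the example. The helper names are hypothetical, and the Pearson coefficient is an assumption, since the excerpt does not say which correlation measure or curve fit was used.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlations_by_group(rows):
    """rows: iterable of (group, x, y). Returns (pooled_r, {group: r})."""
    groups = defaultdict(lambda: ([], []))
    all_x, all_y = [], []
    for group, x, y in rows:
        groups[group][0].append(x)
        groups[group][1].append(y)
        all_x.append(x)
        all_y.append(y)
    pooled = pearson_r(all_x, all_y)
    per_group = {g: pearson_r(xs, ys) for g, (xs, ys) in groups.items()}
    return pooled, per_group

# Invented (publisher, price, uses) rows. Within each publisher, use rises
# with price, but the two publishers sit at very different use levels.
rows = [
    ("A", 100, 10), ("A", 200, 20), ("A", 300, 30),
    ("B", 100, 300), ("B", 200, 310), ("B", 300, 320),
]
pooled, per_group = correlations_by_group(rows)
print(f"pooled r = {pooled:.2f}")
for name, r in per_group.items():
    print(f"publisher {name}: r = {r:.2f}")
```

In this toy case each subset is perfectly correlated while the pooled set shows almost no relationship, because the difference between publishers swamps the trend within them. Real data will be far messier, but the mechanism is the same one that makes finer-grained analyses look more robust.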