Article downloads as a measure of …

quality? popularity? utility? I'm pretty sure I've blogged about MESUR (a research project that studied how usage statistics - as we call them in the industry - can serve as a metric the way citations do). I've also blogged a discussion by MJ Kurtz arguing that usage behaves very much like citation, if offset. Some researchers, including some bibliometricians, have issues with using usage data, for some pretty good reasons:

  • if citations can be gamed, then what about downloads - click fraud, anyone?
  • if we don't know what citations mean, then what can we say about downloads at all?
  • what is actually counted? PDF downloads? HTML full-text views?
  • how do you make absolutely sure you remove all bots and spiders - or are there some that you leave in?
  • accidental downloads (crap, wrong button!) - minor issue
  • people passing really good articles around directly, so those reads register no downloads - not sure how big an issue this is
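
The bot-and-spider question is where a lot of the practical pain lives. A minimal sketch of the crudest approach - matching user-agent strings against known crawler keywords - shows both how it works and why it isn't enough on its own (a fraudster's script can send any user-agent it likes). The log entries and the pattern here are purely illustrative, not any publisher's actual filter:

```python
import re

# Crude filter: common keywords that appear in crawler user-agent strings.
# Real services layer on robots lists, rate heuristics, and IP checks.
BOT_PATTERN = re.compile(r"bot|spider|crawl|slurp", re.IGNORECASE)

def count_human_downloads(log_entries):
    """Count downloads whose user-agent doesn't look like a known bot."""
    return sum(1 for ip, ua, path in log_entries
               if not BOT_PATTERN.search(ua))

# Hypothetical (ip, user_agent, path) tuples pulled from a server log.
hits = [
    ("1.2.3.4", "Mozilla/5.0 (Windows; U)", "/article.pdf"),
    ("5.6.7.8", "Googlebot/2.1 (+http://www.google.com/bot.html)", "/article.pdf"),
    ("9.9.9.9", "msnbot/1.1", "/article.pdf"),
]
print(count_human_downloads(hits))  # -> 1
```

Notice that nothing here would catch a compromised desktop machine sending a perfectly ordinary browser user-agent - which is exactly the click-fraud worry.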

Kurtz makes a couple of good arguments for using usage (listed in the other post, but I'll reiterate):

  • usage counts tend to be more stable
  • in fields like clinical medicine, there are a lot of people who use the literature and get value from it but don't write, and therefore aren't contributing citations

Also, they lag less. They lag not at all, in fact. The second the thing has a stable URL, BAM!  (but, of course, do you add in downloads of the ArXiv version or the author's version or the author's institutional repository version?)

In practical terms, it's often quite difficult to get usage statistics - EVEN IF YOU ARE TRYING TO GET THEM FOR YOUR OWN INSTITUTION!!! - and once you get them, they're difficult to interpret and difficult to compare across publishers and platforms.  Sure, there are tools and standards that are supposed to help with this, but we're not quite there yet.

So I'm very excited to hear that PLOS is offering article download information. No begging your acquisitions folks, no scanning the "top downloads" listing to see if your article is there. You can get the numbers right at the article, if you happen to get your article accepted into a PLOS journal. Oh, and even cooler, you can download the whole shebang in a spreadsheet! (Quick, find me a research question using this data!) You can also see how many times an article has been cited, drawn from a couple of different sources.

If you're a scientist, this is also one way to filter for articles worthy of more attention when so many new articles are coming out. (Things will have more downloads if they have press releases, etc., but still.)  Read more about the PLOS article-level metrics here.

(note: if you read this like right now when I'm posting, PLOS journals are down for maintenance, but they'll be back soon)


If you ever started building download stats into a metric you actually used for hiring or such, then click fraud would instantly become important and the big compromised robot networks would have a new product. Someone who knows someone could just ask for 1043 (or 9875, or 89206) more downloads over the next week, and bam - it would be instantly scheduled to match current download patterns, in a nice distribution from many IPs. With little effort they could use compromised machines at colleges to do the asking. No way could you easily tell. If my job were on the line and they used something this easy to game to judge me, and wouldn't listen when I explained how easy it would be, I could set this system up in a month or so.

The only way this would work is on a non-anonymous, controlled-access repository. That might be a good thing anyway; I don't really see why anonymity is needed for scientific papers. You could still allow anonymous access - it just wouldn't be counted.