Article downloads as a measure of …

Quality? Popularity? Utility? I'm pretty sure I've blogged about MESUR, a research project that studied how usage statistics (as we call them in the industry) can serve as a metric the way citations do. I've also blogged about a talk by MJ Kurtz in which he argued that usage behaves very much like citations, just offset in time. Some researchers, including some bibliometricians, object to using usage data, for some pretty good reasons:

  • if citations can be gamed, downloads certainly can (click fraud, anyone?)
  • if we don't know what citations mean, then what can we say about downloads at all?
  • what is actually counted? PDF downloads? HTML full-text views?
  • how do you make absolutely sure you remove all bots and spiders, and are there some you should leave in?
  • accidental downloads (crap, wrong button!): a minor issue
  • people passing really good articles around to colleagues, so those reads never show up as downloads: not sure how big an issue this is
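The bots-and-spiders and wrong-button items are usually tackled by cleaning the raw server logs before anything gets counted. Here's a minimal sketch of that kind of cleaning, not PLOS's or any standard's actual procedure. It assumes each log entry has an article ID, client IP, user agent, and timestamp; the robot marker list and the 30-second window are hypothetical policy choices. Note that it does nothing against click fraud spread across many IPs with normal-looking browsers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical substrings that flag a robot user agent. A real filter would
# use a much longer, maintained list and would still miss some bots.
ROBOT_MARKERS = ("bot", "spider", "crawler", "slurp")

# Assumed window: a repeat request for the same article from the same client
# within this long of its previous request is treated as one download.
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)


@dataclass
class Request:
    article_id: str
    client_ip: str
    user_agent: str
    timestamp: datetime


def count_downloads(requests: list[Request]) -> dict[str, int]:
    """Count downloads per article, skipping likely robots and collapsing
    repeat requests from one client that fall inside the window."""
    counts: dict[str, int] = {}
    last_seen: dict[tuple[str, str], datetime] = {}
    for req in sorted(requests, key=lambda r: r.timestamp):
        if any(marker in req.user_agent.lower() for marker in ROBOT_MARKERS):
            continue  # looks like a crawler: not counted
        key = (req.article_id, req.client_ip)
        previous = last_seen.get(key)
        last_seen[key] = req.timestamp
        if previous is not None and req.timestamp - previous <= DOUBLE_CLICK_WINDOW:
            continue  # too soon after this client's previous request
        counts[req.article_id] = counts.get(req.article_id, 0) + 1
    return counts


if __name__ == "__main__":
    t0 = datetime(2009, 9, 1, 12, 0, 0)
    log = [
        Request("a1", "10.0.0.1", "Mozilla/5.0", t0),
        Request("a1", "10.0.0.1", "Mozilla/5.0", t0 + timedelta(seconds=5)),  # wrong button, clicked again
        Request("a1", "10.0.0.2", "Googlebot/2.1", t0),  # crawler
        Request("a1", "10.0.0.3", "Mozilla/5.0", t0 + timedelta(minutes=2)),
        Request("a2", "10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=3)),
    ]
    print(count_downloads(log))  # {'a1': 2, 'a2': 1}
```

Even this toy version has to make judgment calls (which agents count as robots, how long a "double click" lasts), which is exactly why counts from different platforms are hard to compare.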

Kurtz makes a couple of good arguments for using usage (listed in the other post, but I'll reiterate):

  • usage statistics tend to be more stable
  • in fields like clinical medicine, lots of people use the literature and get value from it but don't publish, so they never contribute citations

Also, they lag less. In fact, they don't lag at all: the second the article has a stable URL, BAM! (But, of course, do you add in downloads of the arXiv version, the author's version, or the copy in the author's institutional repository?)

In practical terms, it's often quite difficult to get usage statistics (EVEN IF YOU ARE TRYING TO GET THEM FOR YOUR OWN INSTITUTION!!!), and once you get them, they're difficult to interpret and difficult to compare across publishers and platforms. Sure, there are tools that are supposed to help with this, and some standards, but we're not quite there yet.

So I'm very excited to hear that PLOS is offering article download information. No more begging your acquisitions folks, no more checking the "top downloads" list to see if your article is there. You can get the numbers right at the article, if you happen to get your article accepted into a PLOS journal. Oh, and even cooler, you can download the whole shebang as a spreadsheet! (Quick, find me a research question using this data!) You can also see how many times each article has been cited, according to a couple of different sources.

If you're a scientist, this is also one way to filter for articles worth more attention when so many new articles are coming out (articles with press releases and the like will get more downloads, but still). Read more about the PLOS article-level metrics here.

(Note: if you're reading this right as I post it, the PLOS journals are down for maintenance, but they'll be back soon.)


If you ever started building download stats into a metric actually used for hiring and the like, click fraud would instantly matter, and the big networks of compromised machines would have a new product to sell. Someone who knows someone could just order 1,043 (or 9,875, or 89,206) extra downloads over the next week, and bam, they would be scheduled to match the article's normal download times, in a nice distribution, from many IPs. With little effort they could even use colleges' compromised machines to do the requesting. There's no way you could easily tell. If my job were on the line, and they used something this stupid to judge me, and they wouldn't listen when I explained how easy it is to game, I could set this system up in a month or so.

The only way this would work is with a non-anonymous, controlled-access repository. That might be a good thing anyway; I don't really see the need for anonymity when reading scientific papers. You could still allow anonymous access; it just wouldn't be counted.
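A minimal sketch of that counting rule, assuming each access event carries an article ID plus either a hypothetical account ID or None for an anonymous reader. Anonymous reads are still served, they just aren't counted. Counting each account at most once per article is an added assumption, not part of the proposal above: it stops one account from inflating numbers by reloading, but not someone who registers many accounts.

```python
from collections import defaultdict
from typing import Optional


def count_authenticated_readers(
    events: list[tuple[str, Optional[str]]],
) -> dict[str, int]:
    """Each event is (article_id, account_id); account_id is None for
    anonymous access. Returns distinct known accounts per article."""
    readers: dict[str, set[str]] = defaultdict(set)
    for article_id, account_id in events:
        if account_id is None:
            continue  # anonymous access is allowed but not counted
        readers[article_id].add(account_id)  # a set ignores repeat reads
    return {article: len(accounts) for article, accounts in readers.items()}


if __name__ == "__main__":
    events = [
        ("a1", "alice"),
        ("a1", "alice"),  # reload by the same account
        ("a1", None),     # anonymous reader
        ("a1", "bob"),
        ("a2", None),
    ]
    print(count_authenticated_readers(events))  # {'a1': 2}
```

Articles read only anonymously simply don't appear in the result, which matches "it just wouldn't be counted".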