Article-Level Metrics at PLoS - Download Data (updated with links)

If you are a regular reader of this blog, you are certainly aware that PLoS has started making article-level metrics available for all articles.

Today, we added one of the most important sets of such metrics: the number of times each article was downloaded. Every article now has a new tab at the top, titled "Metrics". If you click on it, you will see the number of HTML page views and of XML and PDF downloads, a graph of downloads over time, and a link to overall statistics for the field, the journal, and PLoS as a whole.

Mark Patterson explains what it all means (also here and here):

We believe that article-level metrics represent an important development for scholarly publishing. While some publishers are providing limited data, we are not aware of any publisher that has gone as far as PLoS in providing such a broad range of indicators and metrics, and in making the data openly available. We invite you to visit our journal sites and seek out the Metrics tab for each article.

It's also important to emphasize that online usage should not be seen as an absolute indicator of quality for any given article, and such data must be interpreted with caution. To provide additional context and to aid interpretation, we have provided a series of summary tables indicating the average usage of categories of article (grouped by age, journal and topic area). Users will also notice that a number of articles do not have any usage data, because of problems with the log files. We are working hard to add data for these articles, and we also encourage readers to let us know if they find any anomalies or have any questions about the data.

You can also download the entire dataset for all seven PLoS journals. Clicking the link downloads a ZIP file containing an Excel spreadsheet that lists all the metrics for every article. Please play with the data, and let us know what you find or blog about your impressions.

You can find more information about all article-level metrics (with lots of detail about the methodology, caveats, etc.) on the pages of individual journals (here is the example for PLoS ONE), or specifically about download data here.

There is also a FAQ page and an entire new website devoted to explaining the new metrics. Take some time and explore. And experiment with the data and let the world know what you found.

As you may be aware, download statistics strongly correlate with subsequent citations. And just as researchers have perfected ways to game citation numbers, downloads can of course be gamed as well. Which is why we provide all the raw data and the methods we used to get them, so you can figure out everything for yourself. It is all transparent.

The first reactions are in. Christina Pikas wrote on her blog:

So I'm very excited to hear that PLOS is offering article download information. So no begging of your acquisitions folks, no looking at the "top downloads" listing to see if your article is there. You can get it right at the article if you happen to get your article accepted into a PLOS journal. Oh, and even cooler, you can download the whole shebang in a spreadsheet! (Quick, find me a research question using this data!) You can also get how many times things are cited from a couple different sources.

If you're a scientist, this is also one way to filter for articles that are worthy of more attention when there are so many new articles coming out. (things will have more downloads if they have press releases, etc., but still). Read more about the PLOS article level metrics here.

One PLoS author, Jamie Sundsbak, says:

I have written before on how impact factors of journals and their publications are being rethought. I believe that by providing the data on how scientific publications are disseminated into the world, we will truly be able to judge an article on its individual scientific worth, not just on the authors' reputations or where it is published.

And this from Karen Grepin:

As academic research is increasingly disseminated through new media channels, I think this is a very important improvement in measuring the impact of work. Old metrics, such as just the number of citations, may miss the importance of many academic publications. Here's to hoping that more journals begin to report such data.

Duncan Hull took a look at some of the articles and said:

So all the usual caveats apply when using this bibliometric data. It is more revealing than the useful (but simplistic) "highly accessed" papers at BioMedCentral, which gives little or no indication of what "highly" actually means without the raw data to back it up. It will be interesting to see if other publishers now follow the lead of PLoS and also publish their usage data. For authors publishing with PLoS, this data has an added personal dimension too: it is interesting to see how many views your paper has.

As paying customers of commercial publishers, should scientists and their funders be demanding more of this kind of information in the future? I reckon they should.

And Andrew Farke wrote a longer post:

So what's to like here? Well, an author gets an immediate sense if someone is paying attention to a publication. Page views and PDF downloads are a valuable tool for gauging community interest. In concert with citation data, it's probably a far better gauge of a paper's worth than the impact fact[or] that the publication happens to show up in. The data are also freely available, transparent, and frequently updated. The latter is particularly important because it may be years before a paper's full impact is known. An open-access metric for an open-access world.

----------

I suspect that other journals will follow suit - it may not happen tomorrow, but it will happen. We may be seeing the death of the traditional, sometimes tyrannical, "impact factor." Let's hope we don't replace it with a new despot!

And on the Small Gray Matters blog:

You can now see the number of citations, blog mentions, article views and PDF downloads for every PLoS article, so if you've published in one of the PLoS journals, you're free to go find out just how many times your article's been viewed compared to everyone else's. And then, of course, you're free to make up all sorts of convenient excuses for the fact that your stats suck and your paper's only been viewed seventeen times in the last three years.

What's even cooler is the PLoS editors have collated all of that information and released it as one monstrous Excel spreadsheet. So you can now run off and do a quick regression analysis to determine whether or not having longer paper titles leads to more citations, if you're so inclined.

Other mentions:

NeuroTechnica
Library Stuff
Greg Laden
UBC Library eResources: Service Bulletins
Pimm - Partial immortalization
The Tree of Life
LiLoLe

And check out A.J.Cann on his blog and on YouTube

The Public Library of Science (PLoS) announced the release of an expanded set of article-level metrics on its scientific and medical journal articles (some 14,000 articles across 7 journals). The article-level metrics program was launched in March 2009, and with this addition of online usage data, PLoS is providing an unprecedented set of information on every published article. Such information will be of value to researchers, readers, funders, administrators and anyone interested in the evaluation of scientific research. The PLoS article metrics include the new online usage data (HTML page views, PDF downloads and XML downloads), as well as citation counts, comments, ratings, social bookmarks and blog coverage. Usage data will be updated daily and currently include more than four years of statistics from all seven peer-reviewed PLoS journals. With this growing and detailed set of metrics on every article, PLoS aims to demonstrate that individual articles can be judged on their own merits rather than on the basis of the journal in which they are published.

Because very few data have previously been made public by scholarly publishers, visitors to the journal sites will need help to understand these data. For example, it is clear from the PLoS data that online usage depends on the age of the article, as well as its subject area. To place the new usage data in context, PLoS is therefore providing summary tables that let users see how an article compares with various average measures. For anyone wishing to examine the data in detail, the complete raw data set is also available as a download. PLoS is still in the early stages of the article-level metrics program, but this is the first attempt by a major publisher to place such a broad range of data on each article. PLoS therefore hopes that the provision of these data will encourage other publishers to make such data available, which will ultimately lead to broader improvements in scholarly communication and research assessment.
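Since usage depends on an article's age and subject area, the summary tables compare each article against the average for similar articles. For anyone who grabs the spreadsheet and wants to try that kind of comparison, here is a minimal Python sketch. It assumes a hypothetical record layout of (article ID, journal, age in months, total views) and groups by journal and 12-month age band; the real PLoS file may name and organise its columns differently, and this is not PLoS's own methodology.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout: (article_id, journal, age_in_months, total_views).
# The actual PLoS spreadsheet columns may differ; these names are illustrative only.

def age_bucket(age_in_months, width=12):
    """Group articles into age bands: 0-11 months -> 0, 12-23 months -> 1, ..."""
    return age_in_months // width

def relative_usage(records, width=12):
    """Return each article's views divided by the mean views of its
    (journal, age band) peer group; 1.0 means exactly average for its peers."""
    groups = defaultdict(list)
    for _, journal, age, views in records:
        groups[(journal, age_bucket(age, width))].append(views)

    # Average views per peer group.
    averages = {key: mean(vals) for key, vals in groups.items()}

    result = {}
    for article_id, journal, age, views in records:
        avg = averages[(journal, age_bucket(age, width))]
        # A group whose articles all have zero views has no meaningful ratio.
        result[article_id] = views / avg if avg else float("nan")
    return result
```

A ratio well above 1.0 only says an article drew more attention than its peers of similar age in the same journal; as the caveats above stress, that is not the same thing as quality.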

drdrA over on Blue Lab Coats has this to say:

One of these tabs is the 'metrics' tab. If you click on it, it takes you to a page that shows the metrics, things like article views and downloads, for that particular article. Here is an example from that article that I posted on the other day. That article was just published, but you can also see metrics on older articles that were collected prior to the appearance of this feature... like for this article, for example. I love this feature because it reflects the actual level of readership of a given article better and more immediately than the traditional, pre-electronic-media-age measures such as citation rate or total citation number could. And I've gotten quite used to looking at readership data in terms of hits and page views, running this blog and whatnot... so I've kind of got a feeling for this kind of data anyway.

And see that 'Related Content' tab up at the top there too. From that page you are set up to quickly search for related articles, bookmark things in CiteULike (which I need to become more savvy with), AND LOOK FOR RELATED BLOG POSTS!!! How awesome is that!!?? Now you are immediately connected to related scientific literature, and to the immediate response to a given article in the blogosphere, with all the commentary that brings with it.

Cameron Neylon (guest post on the PLoS blog):

Actual public interest in your paper? Real educational value being gained from it? These are things that you want to know about. It would be even better if we could separate these out, and I find the prospects of using download versus citation metrics in the future quite exciting. But in the meantime, it gives us a new measure that we can compare with what we already have available.

And, at the end of the day, that is how I think we should see these new metrics--it is more information. It isn't yet clear how best to use these measures, but it is up to us as scientists, who, after all, make our living out of measurement and analysis, to figure out how best to use them. The approach PLoS is taking of simply presenting the data, and as much of it as is possible, is to me exactly the right approach. It is not the responsibility of journals to tell us how to measure and report things--it is up to us.

In the end, there is only one way of determining whether a particular paper is important and relevant to you personally, and that is to read it, digest the information, critically analyze it, and come to your own conclusions. You can't avoid this and you shouldn't. Where download and other article-level metrics can help is in making that decision about how much time you want to invest in a given paper. We need better ways of making that decision, and more data can only help.


"Which is why we provide all the raw data [...]"

Seems to me that the raw data is not provided, only cleaned up and aggregated numbers. It's nice to have those numbers, but if you think some number looks suspect, there is nothing you can do to investigate yourself.

@Eric
That is true, the raw data would be the web server logs. Although the size of those files might be cumbersome, hopefully we can expect the released materials to keep increasing.