Michael J. Kurtz of the Harvard-Smithsonian Center for Astrophysics came to MPOW to speak to a gathering of librarians from across the larger institution (MPOW is a research lab affiliated with a large private institution). He's an astronomer, but more recently he's been publishing quite a bit in bibliometrics using data from the ADS. You can review his publications using this search.
As an aside, folks outside of astro and planetary sciences might not be familiar with ADS (the NASA Astrophysics Data System), but it's an excellent and incredibly powerful research database. Sometimes librarians turn up their noses at it because it's all about being functional and not at all about being pretty, but it really rocks (I'm definitely going to have to do a post on freely available research databases besides PubMed).
Kurtz's talk moved at basically the speed of light and was broken into two parts: bibliometrics in astro, comparing ADS usage data with citations, and then more on scholarly communication.
I only have handwritten notes, so let me just try to capture some of his points in bullets:
- successful recommender systems (like Amazon's) are built on usage data
- this isn't new: Derek J. de Solla Price graphed the obsolescence curve for articles (at first not cited, then they get the most citations, then citations trail off, eventually flattening out at a few citations after a period that depends on the subject)
- in an article (maybe doi: 10.1002/asi.20096) he mapped usage vs. age. He showed us graphs spanning 110 years, a few years (?), and then 90 days. The curve can be modeled using exponentials with 4 different time scales.
- he showed the different usage vs. age graphs for traffic coming directly to ADS (presumably mostly professional scientists; this looked just like Price's model and the citation model), for people coming from Google Scholar (which they take to be students), and for people coming from Google (flat across all years, taken to be random members of the public).
- astronomers read and cite the same things, so you can use usage instead of citations to look at individuals, institutions, and countries
- the MESUR project is gathering usage data from a pile of places. The problem is the quality of the available data: it doesn't follow a user through what they looked at and what they linked to.
- ADS has a popular items algorithm: put in a search, take the papers it matches, find what the people who read those papers also read, and rank those by number of usages
- on using citedness for tenure decisions: it's very unstable at about 7-10 years, whereas usage data is pretty stable
- usage is better than citedness at measuring journals. Example: medicine, where clinicians read a lot more articles than they write (if they write at all).
- PageRank gets it right, impact factor gets it wrong (I think this was mapping various things like usage, citations, and impact factor on some big graph)
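To get a feel for what "exponentials with 4 different time scales" means, here's a minimal sketch of a usage vs. age curve written as a plain sum of four decaying exponentials. The amplitudes and time scales are made-up placeholders, not the values from Kurtz's paper; they're picked only so that each term dominates a different window (weeks, months, years, decades).

```python
import math

# Hypothetical (amplitude, time scale in days) pairs: placeholders, NOT fitted values.
TERMS = [
    (50.0, 10.0),       # very fast decay: the first few weeks after publication
    (20.0, 180.0),      # months
    (5.0, 3 * 365.0),   # a few years
    (1.0, 40 * 365.0),  # slow tail over decades
]

def usage_rate(age_days: float) -> float:
    """Modeled reads per day for an article of a given age (sum of four exponentials)."""
    return sum(a * math.exp(-age_days / tau) for a, tau in TERMS)

for age in (0, 30, 365, 10 * 365, 100 * 365):
    print(f"age {age:>6} days: {usage_rate(age):8.3f} reads/day")
```

With a 90-day window you mostly see the fast terms; over 110 years they've long since vanished and only the slow tail is left, which is one way to read why the graphs look so different at each scale.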
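The popular items idea from the notes (search, then "people who read these also read", ranked by usage) can be sketched in a few lines. This is a guess at the general shape, not ADS's actual code: the read log, user IDs, and paper IDs are hypothetical, and one "usage" here simply means one reader.

```python
from collections import Counter

def also_read(matches, reads_by_user, top_n=5):
    """Rank papers read by people who also read any of the search matches.

    matches: set of paper IDs returned by the search.
    reads_by_user: dict mapping a (hypothetical) user ID to the set of paper IDs they read.
    Returns up to top_n (paper_id, usage_count) pairs, most-used first.
    """
    counts = Counter()
    for papers in reads_by_user.values():
        if papers & matches:                # this reader touched at least one match
            for paper in papers - matches:  # count only the *other* papers they read
                counts[paper] += 1
    return counts.most_common(top_n)

# Toy read log; the IDs are invented for illustration.
reads = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C"},
    "u3": {"B", "D"},
    "u4": {"E"},
}
print(also_read({"A"}, reads))  # → [('C', 2), ('B', 1)]
```

Leaving the matched papers out of the ranking is a design choice in this sketch; a real system would probably also weight by number of read events and by recency.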
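My note is hazy about exactly what was plotted, but the basic contrast between a PageRank-style score and just counting citations can be shown on a toy citation graph. This is a generic power-iteration PageRank with the usual 0.85 damping factor, not whatever Kurtz actually computed. Raw citation counts stand in for impact factor here (the real impact factor is a journal-level average over a citation window), and the graph is invented.

```python
def all_papers(cites):
    """Every paper that appears in the graph, citing or cited."""
    return set(cites) | {q for refs in cites.values() for q in refs}

def citation_counts(cites):
    """Raw number of incoming citations per paper (a stand-in for IF-style counting)."""
    counts = {p: 0 for p in all_papers(cites)}
    for refs in cites.values():
        for q in refs:
            counts[q] += 1
    return counts

def pagerank(cites, damping=0.85, iterations=100):
    """Power-iteration PageRank; cites maps each paper to the papers it cites."""
    nodes = all_papers(cites)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in nodes}
        for p in nodes:
            refs = cites.get(p, [])
            if refs:
                # a citing paper passes its rank on to the works it cites
                share = damping * rank[p] / len(refs)
                for q in refs:
                    new[q] += share
            else:
                # a paper that cites nothing spreads its rank evenly (dangling node)
                for q in nodes:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Invented toy graph: X gets three citations from papers nobody cites;
# Y gets only one citation, but it comes from Z, which is itself well cited.
cites = {
    "a1": ["X"], "a2": ["X"], "a3": ["X"],
    "Z": ["Y"],
    "b1": ["Z"], "b2": ["Z"], "b3": ["Z"], "b4": ["Z"],
}
counts = citation_counts(cites)
ranks = pagerank(cites)
for p in ("X", "Y", "Z"):
    print(p, counts[p], round(ranks[p], 3))
```

Counting citations puts X well ahead of Y, while PageRank puts Y ahead of X because its one citation comes from a heavily cited paper. That's the kind of disagreement the "PageRank gets it right" point is about.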
So those are my notes from the first section; here's the second.
- ADS has semantic links between scholarly papers, the observations they're based on, and other sources of data for that astronomical object (this is actually wicked cool)
- ADS also links to arXiv and has OpenURL linking, so you can find a copy your institution subscribes to (I had them list our parent institution, but you have to set up your own preferences to turn it on; they don't register IPs with institutions)
- it's a hodgepodge now, but they're working on a virtual observatory that will make this more seamless
- elsewhere, he mentioned provenance (briefly; I saw more at the IEEE eScience conference), the value of sharing workflows (as in myExperiment), and VisTrails
- he ended with an exhortation to support Open Access (this crowd already does, or at least the STEM folks do)
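For anyone curious what OpenURL linking looks like under the hood: the link is just a query string in the OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/value format, sent to your institution's link resolver, which then figures out where you have access. Here's a minimal sketch. The resolver base URL and the article metadata are hypothetical placeholders, and the exact keys ADS sends may well differ.

```python
from urllib.parse import urlencode

# Hypothetical link-resolver address; each institution runs its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def openurl_for_article(doi, jtitle, atitle, volume, spage, year):
    """Build an OpenURL 1.0 (Z39.88-2004) link in key/encoded-value (KEV) format."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # describes a journal item
        "rft_id": f"info:doi/{doi}",
        "rft.jtitle": jtitle,
        "rft.atitle": atitle,
        "rft.volume": volume,
        "rft.spage": spage,
        "rft.date": year,
    }
    # urlencode percent-encodes the values (":" -> %3A, "/" -> %2F, space -> "+")
    return f"{RESOLVER_BASE}?{urlencode(params)}"

# Placeholder metadata, not a real article.
url = openurl_for_article("10.1234/example.5678", "Example Journal",
                          "An Example Article", "1", "1", "2000")
print(url)
```

Because the resolver address is the only institution-specific part, this is also why you have to set your own preferences in ADS: it needs to know which resolver to point the link at.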
I needed about 5 more minutes with each slide, but it was still a great talk. I'll have to go back and read (or re-read) his articles after this comps thing is over. BTW, if you're reading, Michael: I'm waving, and thanks for coming!