Christina's LIS Rant

I’m on a sub-sub committee to evaluate evaluation of consideration of adding a new recommender system to our discovery tools across my parent institution’s libraries. The system costs money and programmer time (which we’re very short on), but more importantly, there’s a real estate issue: we already offer some similar tools, and even if the recommendations are perfect, we don’t know if or where we could or should surface them so that they’d be noticed and used. I’m trying to get my arms around at least the questions we should ask or things we should consider. I’m using this post to work through some ideas.

In information retrieval in general, you model what the user needs (as actually specified to the system) and you model the things in the information system. For recommender systems – not human recommenders – you mostly specify the need by example. Find others like object x or find others that would address the same information need as object x. For modeling the information objects you look at ways to describe them. This could be using subject tags – from a controlled vocabulary applied by human or machine indexers, uncontrolled terms, or extracted from the text itself. You could make that into a vector, and then you can use various similarity measures like Pearson, Jaccard, or cosine to find similar objects [1]. I think this is probably what ScienceDirect does with their recommendations – they use the content to find similar articles.
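To make the "turn it into a vector and compare" idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the article ids and subject tags are made up, the function names are hypothetical, and I have no idea whether ScienceDirect or anyone else does it this way. It shows Jaccard similarity on sets of tags and cosine similarity on term counts.

```python
import math
from collections import Counter


def jaccard(tags_a, tags_b):
    """Jaccard similarity: shared tags divided by all distinct tags."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(terms_a, terms_b):
    """Cosine similarity between two bags of terms (term-count vectors)."""
    va, vb = Counter(terms_a), Counter(terms_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target_id, catalog, measure=jaccard, top_n=3):
    """Rank every other object in the catalog by similarity to the target."""
    target = catalog[target_id]
    scored = [
        (measure(target, tags), other_id)
        for other_id, tags in catalog.items()
        if other_id != target_id
    ]
    # Highest score first; ties broken by id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other_id, score) for score, other_id in scored[:top_n]]


# Hypothetical catalog: object id -> subject tags (invented for illustration).
catalog = {
    "article-1": ["recommender systems", "information retrieval", "libraries"],
    "article-2": ["recommender systems", "e-commerce"],
    "article-3": ["information retrieval", "libraries", "catalogs"],
    "article-4": ["chemistry"],
}

print(most_similar("article-1", catalog))
# [('article-3', 0.5), ('article-2', 0.25), ('article-4', 0.0)]
```

Which measure you pick matters more than it looks; [1] compares several of them on co-occurrence data.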

You could also look at other things that describe an object – its creators, its publication venue, its citations, and who cites it (these are all also pieces people look at to judge relevance). Co-citation is when two articles are both cited by a third. Bibliographic coupling is when two articles cite the same other articles (both of these defined briefly here). Web of Science shows you related articles by the number of citations they share (bib coupling). Some libraries already use their API to add this data to other services (see Jonathan Rochkind’s discussion). Sage ejournals give you a link to look the article up in Google Scholar to see what cites it. Many research databases and ejournal platforms let you either click on the author name or somewhere on the margin to see other things written by the author.
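The two citation-based measures are easy to mix up, so here is a small sketch of both. It assumes you already have each paper's reference list as a plain dictionary; the paper ids and the function names are hypothetical, and this is not how Web of Science or any other vendor computes its related records.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(references, paper_a, paper_b):
    """Number of references that two citing papers have in common."""
    refs_a = set(references.get(paper_a, ()))
    refs_b = set(references.get(paper_b, ()))
    return len(refs_a & refs_b)


def cocitation_counts(references):
    """For each pair of cited works, count the papers that cite both."""
    counts = Counter()
    for cited in references.values():
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: citing paper -> works it cites.
references = {
    "P1": ["A", "B", "C"],
    "P2": ["B", "C", "D"],
    "P3": ["A", "B"],
}

print(bibliographic_coupling(references, "P1", "P2"))  # 2 (they share B and C)
print(cocitation_counts(references)[("A", "B")])       # 2 (P1 and P3 cite both)
```

Note the direction: bibliographic coupling is fixed once both papers are published, while co-citation counts keep changing as new papers cite the pair.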

There are other ways to do this in ecommerce systems. People who bought A also bought B, for example. Amazon’s gotten smart with this by allowing you to specify that you bought A as a gift, so you might not want more like A for yourself. More recently, there have been a few suggestions of doing it this way in libraries. People who checked out A also checked out B. Of course, that creeps people out because we keep checkout records private. So what if you are able to aggregate downloads over a ton of people so it’s less creepy and actually makes more sense? That’s what Van de Sompel and Bollen suggested [2-3] and what Ex Libris is offering in their bX product. There is an assumption here: two articles requested in full text within the same session are desired to fill the same information need.
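Here is a rough sketch of the "requested in the same session" idea, just to see what the aggregation looks like. Everything in it is my own assumption: the session data is invented, the function names are hypothetical, and the min_count cutoff (only recommend pairs seen in at least that many sessions, so no single person's session shows through) is something I made up for illustration. It is not a description of how bX or the architecture in [2] actually works.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_requests(sessions):
    """Count, for each item, how many sessions also requested each other item."""
    co_requests = defaultdict(Counter)
    for items in sessions:
        # set() so a repeated download within one session counts once.
        for a, b in permutations(set(items), 2):
            co_requests[a][b] += 1
    return co_requests


def recommend(co_requests, item, min_count=2, top_n=5):
    """Items most often requested alongside `item`, above a minimum count."""
    ranked = sorted(
        co_requests.get(item, {}).items(),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [(other, n) for other, n in ranked if n >= min_count][:top_n]


# Hypothetical anonymized sessions: each list is the full-text requests
# made in one session, with no user identifier attached.
sessions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
    ["A", "D"],
]

co_requests = build_co_requests(sessions)
print(recommend(co_requests, "A"))               # [('B', 2)]
print(recommend(co_requests, "A", min_count=1))  # [('B', 2), ('C', 1), ('D', 1)]
```

The sketch only needs item pairs and counts, not who requested what, which is the point of aggregating.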

Of course, instead of a recommender system, you could just facilitate and track user recommendations. Process mentions on blogs, FriendFeed, Twitter, etc., and pipe them back in. Some platforms are starting to do this with ResearchBlogging info.

Most of the big questions are still outstanding – which type of recommendations actually perform best in practice with the group of users expected to use the system? Where in the process should these recommendations appear and how? Can usage from an OpenURL resolver help people in disciplines that are book or conference paper heavy? (Our OpenURL resolver is fine for books because it searches our catalog – others aren’t. It still pretty much sucks for conference papers, unfortunately.) If not, could you add a book recommender, too?

If I get a chance, I’ll poke around the literature to see if some of these things got answered. I’m curious what other recommender systems libraries are incorporating into their discovery services.

[1] van Eck, N. J., & Waltman, L. (2009). How to Normalize Cooccurrence Data? An Analysis of Some Well-Known Similarity Measures. Journal of the American Society for Information Science and Technology, 60(8), 1635-1651. doi:10.1002/asi.21075

[2] Bollen, J., & Van de Sompel, H. (2006). An architecture for the aggregation and analysis of scholarly usage data. Retrieved from http://public.lanl.gov/herbertv/papers/jcdl06_accepted_version.pdf

[3] Bollen, J., Van de Sompel, H., Smith, J. A., & Luce, R. (2005). Toward alternative metrics of journal impact: A comparison of download and citation data. Information Processing & Management, 41, 1419-1440. doi:10.1016/j.ipm.2005.03.024