Reviewing recommender systems for scholars

By cpikas on March 24, 2010.

I'm on a sub-sub-committee to evaluate the evaluation of the consideration of adding a new recommender system to our discovery tools across my parent institution's libraries. The system costs money and programmer time (which we're very short on), but more importantly, there's a real estate issue: we already offer some similar tools, and even if the recommendations were perfect, we don't know if or where we could or should surface them so that they'd be noticed and used. I'm trying to get my arms around at least the questions we should ask or things we should consider. I'm using this post to work through some ideas.

In information retrieval in general, you model what the user needs (as actually specified to the system) and you model the things in the information system. For recommender systems - not human recommenders - you mostly specify the need by example: find others like object x, or find others that would address the same information need as object x. For modeling the information objects, you look at ways to describe them. This could be done with subject terms - from a controlled vocabulary applied by human or machine indexers, from uncontrolled tags, or extracted from the text itself. You could make those terms into a vector for each object, and then use various similarity measures like Pearson, Jaccard, or cosine to find similar objects [1]. I think this is probably what ScienceDirect does with their recommendations - they use the content to find similar articles.
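To make the vector idea concrete, here's a minimal sketch in Python. The three articles and their subject terms are made up for illustration, and the binary term vectors are just one simple way to do it - this isn't a claim about what ScienceDirect or anyone else actually runs.

```python
import math

# Hypothetical articles, each described by a set of subject terms.
articles = {
    "x": {"recommender systems", "libraries", "usage data"},
    "y": {"recommender systems", "libraries", "citation analysis"},
    "z": {"chemistry", "databases"},
}

# Shared vocabulary: every term used by any article, in a fixed order.
vocab = sorted(set().union(*articles.values()))

def to_vector(terms):
    """Binary vector: 1 if the article has the term, else 0."""
    return [1 if t in terms else 0 for t in vocab]

def cosine(a, b):
    """Cosine of the angle between two term vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def pearson(a, b):
    """Pearson correlation between two term vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var = sum((p - mean_a) ** 2 for p in a) * sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var) if var else 0.0

def jaccard(s, t):
    """Jaccard index on the raw term sets: shared terms / all terms."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def most_similar(key):
    """Return the other article whose vector is closest by cosine."""
    target = to_vector(articles[key])
    others = [k for k in articles if k != key]
    return max(others, key=lambda k: cosine(target, to_vector(articles[k])))
```

Asking for `most_similar("x")` is the "find others like object x" query: the need is specified by example, and the measure does the rest. Which measure to use is exactly the kind of question [1] goes into.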

You could also look at other things that describe an object - its creators, its publication venue, its citations, and who cites it (these are all also pieces people look at to judge relevance). Co-citation is when two articles are both cited by a third. Bibliographic coupling is when two articles cite the same other articles (both of these defined briefly here). Web of Science shows you related articles by the number of cited references they share (bib coupling). Some libraries already use its API to add this data to other services (see Jonathan Rochkind's discussion). Sage ejournals give you a link to look the article up in Google Scholar to see what cites it. Many research databases and ejournal platforms let you either click on the author name or somewhere on the margin to see other things written by the author.
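The two citation measures are easy to mix up, so here's a small sketch of both in Python. The citation data is invented, and `cites`, `bibliographic_coupling`, and `cocitation` are just names I picked for the example, not anything from Web of Science's API.

```python
# Hypothetical citation data: each article maps to the set of things it cites.
cites = {
    "A": {"R1", "R2", "R3"},
    "B": {"R2", "R3", "R4"},
    "C": {"A", "B"},
    "D": {"A", "B", "R1"},
}

def bibliographic_coupling(a, b):
    """Number of references that articles a and b both cite."""
    return len(cites.get(a, set()) & cites.get(b, set()))

def cocitation(a, b):
    """Number of articles whose reference lists include both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)
```

Here A and B share two references (R2 and R3), so their bibliographic coupling strength is 2, and they are cited together by both C and D, so their co-citation count is also 2. One practical difference: bibliographic coupling is fixed once the articles are published, while co-citation counts can keep growing as new articles cite the pair.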

There are other ways to do this in ecommerce systems. People who bought A also bought B, for example. Amazon's gotten smart with this by allowing you to specify that you bought A as a gift, so you might not want more like A for yourself. More recently, there have been a few suggestions of doing it this way in libraries: people who checked out A also checked out B. Of course, that creeps people out because we keep checkout records private. So what if you are able to aggregate downloads over a ton of people so it's less creepy and actually makes more sense? That's what Van de Sompel and Bollen suggested [2-3] and what Ex Libris is offering in their bX product. There is an assumption here: two articles requested in full text within the same session are desired to fill the same information need.
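Here's a rough sketch of that session assumption in Python: count how often two articles are requested in the same session, then recommend the articles most often co-requested with the one in hand. The session data is made up, and the `min_sessions` threshold is my own stand-in for "aggregate over a ton of people" - I don't know how bX actually does it.

```python
from collections import Counter
from itertools import combinations

def co_request_counts(sessions):
    """Count, for each pair of articles, the sessions that requested both."""
    counts = Counter()
    for items in sessions:
        # sorted(set(...)) so each pair is counted once per session,
        # always in the same (a, b) order.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts

def recommend(counts, article, min_sessions=2):
    """Articles co-requested with `article` in at least min_sessions sessions,
    most frequent first. The threshold keeps one person's odd session from
    showing up as a recommendation."""
    scores = Counter()
    for (a, b), n in counts.items():
        if n < min_sessions:
            continue
        if a == article:
            scores[b] = n
        elif b == article:
            scores[a] = n
    return [item for item, _ in scores.most_common()]

# Hypothetical anonymous sessions: just lists of full-text requests.
sessions = [
    ["A", "B"],
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "D"],
    ["A", "B"],
]
counts = co_request_counts(sessions)
```

With this data, `recommend(counts, "A")` gives B and then C, while D gets no recommendations because it only ever appeared in one session. Note that nothing in the sessions identifies a person, which is the point of aggregating.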

Of course, instead of a recommender system, you could just facilitate and track user recommendations. Process mentions on blogs, FriendFeed, Twitter, etc., and pipe them back in. Some platforms are starting to do this with ResearchBlogging info.

Most of the big questions are still outstanding: which type of recommendations actually performs best in practice with the group of users expected to use the system? Where in the process should these recommendations appear, and how? Can usage from an OpenURL resolver help people in disciplines that are book or conference paper heavy? (Our OpenURL resolver is fine for books because it searches our catalog - others aren't. It still pretty much sucks for conference papers, unfortunately.) If not, could you add a book recommender, too?

If I get a chance, I'll poke around the literature to see if some of these things got answered. I'm curious what other recommender systems libraries are incorporating into their discovery services.

[1] van Eck, N. J., & Waltman, L. (2009). How to Normalize Cooccurrence Data? An Analysis of Some Well-Known Similarity Measures. Journal of the American Society for Information Science and Technology, 60(8), 1635-1651. doi:10.1002/asi.21075

[2] Bollen, J., & Van de Sompel, H. (2006). An architecture for the aggregation and analysis of scholarly usage data. Retrieved from http://public.lanl.gov/herbertv/papers/jcdl06_accepted_version.pdf

[3] Bollen, J., Van de Sompel, H., Smith, J. A., & Luce, R. (2005). Toward alternative metrics of journal impact: A comparison of download and citation data. Information Processing & Management, 41, 1419-1440. doi:10.1016/j.ipm.2005.03.024
