Christina's LIS Rant

Types of recommender systems

I’m still on this kick about recommender systems. I was further encouraged when I happened on a report on “discoverability” by the Minnesota librarians while looking for something else on JR’s blog. The report agrees that recommender systems are one of the more important trends.

In standard information retrieval stuff, you’re going from whatever query the person puts in (which can be very, very different from their information need; see Taylor [1]) and you’re computing similarity between those terms and whatever representation you have of the information in the system. Smarter systems do a lot more than that, but that’s the start. Some of the basic recommender stuff just takes some way of describing the document on your screen and computes similarity between that and the documents in the system.
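To make the "computing similarity" bit concrete, here is a minimal sketch. It assumes the crudest representation possible (lowercased word counts) and cosine similarity as the measure; both are my choices for illustration, not something taken from Taylor or from any particular system, and the function names and sample documents are made up.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercase, split on whitespace, and count terms (a deliberately crude representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_text, documents):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    q = term_vector(query_text)
    scored = [(doc_id, cosine(q, term_vector(text))) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up documents, purely for illustration.
docs = {
    "a": "recommender systems for digital libraries",
    "b": "citation analysis of physics journals",
    "c": "evaluating recommender systems with usage data",
}
ranking = rank("recommender systems in libraries", docs)
```

The same `rank` call works whether the first argument is a typed query or the text of the document on your screen, which is roughly the step from plain retrieval to the basic recommender case.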

I remembered that the last time I was on this sort of kick, I wanted to use people’s citation managers as a source. But that’s different, right? Because that’s matching someone’s ongoing research interest over time. The systems I’m thinking about right now are really related to a specific task – without really knowing much about where this task fits in. I’m sure they don’t have to be that way.

Ran across a “taxonomy” of recommender systems on the internet [2]. It’s not the best article on the planet, but it offers some useful dimensions.

They say basically that it’s a matter of creating a user profile and then matching new items to it. For the user profile, they describe 5 dimensions:

  1. the profile representation technique,
  2. the technique used to generate the initial profile,
  3. the source of the relevance feedback which represents the user interests,
  4. the profile learning technique and
  5. the profile adaptation technique (p287-8)

This article obviously has a broader view. The other dimensions deal with the exploitation of the profile:

  1. the information filtering method (demographic, content-based and collaborative),
  2. user profile-item matching technique (when content-based)
  3. the user profile matching techniques (when collaborative) and
  4. the profile adaptation technique (p290)

The three types of filtering are defined as follows:

  • demographic matches people (you’re a lot like Sally, we’ll suggest to you what she purchased)
  • content-based matches descriptions of the items themselves against what you’ve liked before
  • collaborative uses explicit user feedback on items
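For the collaborative flavor, a rough sketch might look like the following. It assumes explicit 1 to 5 ratings, a single nearest neighbor, and a similarity score based on how closely two people's ratings agree on the items they both rated. All of that (the measure, the threshold, the names and the sample data) is my own assumption for illustration, not the technique described by Montaner et al.

```python
def similarity(a, b):
    """Agreement between two users' ratings on co-rated items; 0.0 if they share none."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(a[item] - b[item]) for item in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)


def recommend(target, ratings, min_rating=4):
    """Suggest items the most similar other user rated highly and the target has not rated."""
    others = {user: r for user, r in ratings.items() if user != target}
    neighbour = max(others, key=lambda user: similarity(ratings[target], others[user]))
    mine = ratings[target]
    return sorted(
        item
        for item, score in ratings[neighbour].items()
        if score >= min_rating and item not in mine
    )


# Made-up explicit ratings on a 1-5 scale.
ratings = {
    "you": {"paper1": 5, "paper2": 4},
    "sally": {"paper1": 5, "paper2": 4, "paper3": 5, "paper4": 2},
    "tom": {"paper1": 1, "paper2": 2, "paper5": 5},
}
suggestions = recommend("you", ratings)
```

Here Sally's ratings agree with yours on everything you both rated, so her highly rated items that you haven't seen are what get suggested. A real system would blend many neighbors rather than trusting one.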

The article’s worth browsing because it goes into some more detail and gives a lot of examples. It also discusses some limitations.

I’m already losing enthusiasm for the topic, so I’ll just make some brief points from other stuff I browsed:

  • Usage data might perform better than some content-based methods if you have enough of it [3]
  • Usage data probably performs better than citation data in the first couple of years after publication (when you will have more of it) and over a larger proportion of articles [4]
  • I haven’t seen anything on where and when the optimal point is for surfacing recommendations of scholarly articles
  • Although several papers mention context and hybrid models, I don’t think any of these are actually being used by any of the systems our vendors are using. Maybe for them, good enough is good enough?
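On the usage data points above, the simplest thing I can imagine is counting which articles get viewed in the same session. This is only a sketch under that assumption; it is not the method from [3] or [4], and the session format, function names and article IDs are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_access_counts(sessions):
    """Count, for each pair of articles, how many sessions accessed both."""
    counts = defaultdict(Counter)
    for session in sessions:
        # De-duplicate within a session so repeat views count once.
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def related(article, counts, k=3):
    """Return up to k articles most often accessed alongside the given one."""
    return [other for other, _ in counts[article].most_common(k)]


# Made-up access log: each inner list is one user session.
sessions = [
    ["p1", "p2", "p3"],
    ["p1", "p2"],
    ["p2", "p4"],
    ["p1", "p2", "p1"],
]
counts = co_access_counts(sessions)
top_for_p1 = related("p1", counts)
```

The appeal is that this works from day one of publication, as long as people are actually reading the thing, whereas citation counts take years to build up.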

 

[1] Taylor, R. S. (1968). Question-Negotiation and Information Seeking in Libraries. College & Research Libraries, 29(3), 178-94.

[2] Montaner, M., Lopez, B., & De La Rosa, J. L. (2003). A taxonomy of recommender agents on the internet. Artificial Intelligence Review, 19(4), 285-330.

[3] Nelson, M. L., Bollen, J., Calhoun, J. R., & Mackey, C. E. (2004). User evaluation of the NASA technical report server recommendation service. In Proceedings of the 6th Annual ACM International Workshop on Web Information and Data Management (Washington DC, USA, November 12 – 13, 2004). WIDM ’04. 144-151. DOI: 10.1145/1031453.1031480

[4] Pohl, S., Radlinski, F., & Joachims, T. (2007). Recommending related papers based on digital library access records. In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries (Vancouver, BC, Canada, June 18 – 23, 2007). JCDL ’07. 417-418. DOI: 10.1145/1255175.1255260