Netflix

The Netflix Prize will soon be over: it sounds as if the team "BellKor's Pragmatic Chaos" will be granted the million-dollar prize, awarded for improving Netflix's own algorithm by more than 10 percent. As a heavy Netflix user, I certainly appreciate the design of the website, which does a masterful job of framing my DVD options. Although Netflix has hundreds of thousands of DVDs, I rarely feel overwhelmed by the abundance, since I'm constantly being bombarded with suggestions. Did I just add Season 6 of The Sopranos to my queue? Perhaps I should try The Shield, since I also liked The Wire? Did I enjoy The Class? Then maybe it's time to re-watch Chinatown? In other words, my choices are intelligently framed - the constraints make me feel free, or at least less burdened by the problem of excessive choice.

That said, I find the actual Netflix Cinematch algorithm - the software that tries to predict how I'll feel about movies based on my past ratings - to be utterly useless. And it's not just for quirky movies like Napoleon Dynamite, or Synecdoche, New York, or everything by Wes Anderson. Instead, I find that those suggested stars rarely capture my actual preferences. Of course, I'm sure the fault is mostly mine - I rarely take the time to review my selections, and when I do I'm pretty careless. I haven't devoted nearly enough time to figuring out the difference between four stars and five stars, or found a way to merge my ratings with those of my wife. (I enjoyed Synecdoche; she thought it was garbled and pretentious.)

But I think there's a deeper problem with these newfangled preference algorithms, and it has nothing to do with the details of their programming code. Instead, I think they're making a fundamental psychological mistake: all of these algorithms assume that our preferences are stable and consistent, but that's clearly not the case. In other words, Netflix assumes that if I like Napoleon Dynamite on Saturday night then I'll also enjoy it on Sunday afternoon. It assumes that I'll find Pineapple Express funny when I'm watching it with a bunch of stoned friends and when I'm watching it sober and alone, on a weekday evening. It also assumes that I'll want to watch the same list of movies regardless of when I'll be watching them.

I think my own Netflix habits demonstrate the fallacy of such assumptions. For instance, I'm constantly adding all sorts of highbrow fare to my queue. Why? Because I like to think of myself as someone who wants to watch the complete Criterion collection, from Bergman to De Sica. I know I should watch those classic Chaplin movies, so I add them to the list too. And then you know what happens? Those pretentious red envelopes sit on top of my DVD player for weeks at a time, while I enjoy all sorts of middlebrow crap. The moral is that I assume my future self has more refined taste than my actual self - there is a jarring inconsistency between my stated preferences and my actual desires.

There's a perfectly good neurological explanation for this phenomenon, which is that decisions involving the distant future tend to activate circuits in the prefrontal areas. The end result is that we think more deliberately and dispassionately, and choose movies that seem intelligent, classy, etc., even if they involve subtitles and tedious plots. Of course, when it comes to choosing which of our Netflix options we actually want to watch tonight, the decision is shifted to a different brain system, and seems to activate a more emotional and impulsive set of cortical areas. This leads, of course, to a different set of preferences: I'm less interested in early Fellini and more likely to opt for Superbad. I want some laughs and pleasure, and I want them right now.

Needless to say, this sort of temporal inconsistency is only one of the many different ways our preferences prove inconsistent. Where we watch the movie, who we're watching it with, and the mood we bring to the couch are often just as important as the actual stuff on the screen. So this is why I don't think a new and improved Netflix algorithm will suddenly solve all my movie dilemmas. The software, you see, is founded upon a simple mistake: it assumes that I know what I like, and that I always like what I like, but that's rarely the case.

My strategy is to put very few items on the queue. Then the chance of my mood when they arrive being similar to my mood when I picked them is much higher.

This article got me thinking about Pandora.com (a free music-sharing and recommendation site) and how it potentially makes better recommendations to its users because it has more direct data (thumbs up/down, song skips, duration between station changes, etc.) on which to base the recommendations.

Thinking about Pandora.com got me thinking about DI.FM, a free music-broadcasting website. The electronic music (house, electro, progressive, trance, ambient, EDM, breaks, etc.) on this website is absolutely PHENOMENAL (and I am an extremely harsh critic, especially when it's music related). Check it out!!!

By Thomas Schroeder (not verified) on 20 Aug 2009 #permalink

So is the solution to stop thinking of your star ratings as "how much you liked it", and start thinking of it as "how much you would like to see another one, similar to it, in the future"?

I think that the other problem with the algorithm, especially when it concerns film, is that a five-star rating system doesn't tell WHY you like the film.

Say, for example, I rate "High Fidelity," "Tropic Thunder," "Shallow Hal," and the recent remake of "King Kong" as four- or five-star picks. (I wouldn't rate these movies that highly, but still...) The algorithm will probably start recommending "Nacho Libre" "School of Rock" and other Jack Black movies.

But I fucking HATE Jack Black. It just so happens that I like those movies for reasons that have nothing to do with Jack Black, or with each other.

If I could somehow explain the "whys" to the system, it might come up with accurate recommendations. Absent that, I don't see it making much of a difference. That's the problem that I had with the Netflix (iTunes and Amazon, for that matter) recommendations. A lot of times I knew why it chose what it chose; it was just using criteria that I don't always agree with or find valid.

By Woody Tanaka (not verified) on 20 Aug 2009 #permalink

If you go into your Netflix account settings you will find that you have the option of setting up multiple profiles on a single account. Each profile gets its own queue, and matching and suggestions are handled separately for each user. My sister and I do this to share our account and it works great.

On the star system, my own personal rule is as follows:
1 star - I hated this movie so much I couldn't even finish it.
2 stars - I finished it, but I didn't really like it.
3 stars - This movie was fine, but I don't think I'd want to see it again.
4 stars - I would see this movie again.
5 stars - This movie was so wonderful I need to own it.
I still sometimes wish for a half-star option though.

I find that I use the instant streaming option much more often and much more effectively than the actual red envelope DVD. There is less pressure to predict what mood you may be in to queue up what you might watch in a day or two.

If the algorithm could somehow track what you're instantly watching in the streaming option and how long you watched it, I bet it could track and predict a real-time mood shift better than the ratings based on what is in your DVD queue and the self-decided rating of watched movies.

I LOVED Synecdoche, NY after watching it for the first time recently and gave it 5 stars. But if I have to see the stupid cover of Eternal Sunshine of the Spotless Mind pop up every time I check out my suggestions I'm going to scream. Yes, Netflix I liked it, I rated it well years ago - I don't want to see it suggested again every week.

I used to work for a company that used to be in the entertainment recommendation business. They had very lofty and quite wonderful ideas and models for how to recommend content like movies. They called the model ABCM. When it worked, one of the coolest things about it was that it was very good at inferring the "why" of your likes. So for the commenter above that mentioned the case of liking a couple of movies with Jack Black but hating the actor himself, this model was good at picking up the distinguishing characteristics there.

Some of the cool ideas they had were focused around solving the problem that the blog author mentioned: being able to recommend something that you are in the mood for now as opposed to context-less recommendations.

I really wish they could have found a way to solve the problems that they ran into with the engineering and business model rather than just switching to targeting advertisements, but I guess one makes average people happy and the other makes rich people happy. ::shrug::

I don't see how this is a fundamental problem of the Netflix challenge. The goal of the challenge is to accurately predict your ratings. Nothing says that there needs to be an underlying assumption that your preferences are fixed through time -- in fact, part of the data that they give is the date and time of the rating. Entrants are also free to include an assumption in their model that multiple people (literally or not literally) are using the same account. If it helps make better predictions to assume multiple people or personalities per user, then the programmers will discover this and include it in their models.

I'm not saying that the Netflix challenge is perfect -- and there are plenty of things to dislike about the problem formulation -- but the issues you raise seem to be well-addressed by the Netflix challenge problem formulation.

I don't mean to be rude, but this criticism is rather naive. No one makes this assumption, and the science of collaborative filtering is mature enough to have considered these issues in far more detail than you're giving them credit for.

Even a cursory examination of the published Netflix Prize submissions reveals techniques that mitigate this. Many contestants have made statements indicating how the time of the rating is vital to predictions. Likewise, most solutions use at least a few sub-strategies that cluster users and/or movies by implicit similarity, which will help average out the noise of time variation so long as your movie-watching habits are not somehow strangely correlated.

The latter point about preference prediction differing based on the immediacy of the event is well taken but I do not believe it causes gross inaccuracies in the predictions.

I strongly suspect your experiences are related to two things:
1.) insufficient ratings
2.) conflicting ratings between yourself and your spouse

In any case, this is not a subjective debate: the accuracy of these algorithms is trivially tested. This is what the contest is for, after all. And what we find from the contest is we can expect ratings predictions to be within +/- 0.4 stars of accurate.

Granted, there are some practical differences between Netflix's production algorithm and the contest submissions, but we're still in the same ballpark. It's clear that for most users, these predictions are quite accurate indeed.

If you find predicted ratings are typically further off than this, I suspect your experience to be atypical, and the most likely explanation for that is garbage in / garbage out. Or alternately you may be of a small minority of users whose preferences are not well predicted. But we do know exactly how accurate these predictions are on the average.
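
To make "trivially tested" concrete: the Netflix Prize scored entries by root mean squared error (RMSE) between predicted and actual star ratings on held-out data. The snippet below is only a minimal sketch of that measurement. The function name `rmse` and the five sample ratings are made up for illustration and are not taken from the contest data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out ratings (1-5 stars) and a model's guesses for them.
actual = [4, 3, 5, 2, 4]
predicted = [3.6, 3.4, 4.5, 2.9, 4.1]

print(f"RMSE: {rmse(predicted, actual):.3f} stars")
```

A lower number means the predictions sit closer to what people actually rated. Because large misses are squared before averaging, one badly wrong prediction hurts more than several slightly wrong ones.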

@woody:

One category of collaborative filtering techniques goes by the name of latent factor models. In short, this uses some simple but powerful mathematical techniques to find exactly the sort of thing you're talking about. It takes all pairwise comparisons of movies and users, and finds a smaller set of implicit factors that best explain the ratings. So it can find exactly things like the fact that you in general rate Jack Black movies poorly, but that if certain other factors are strongly present, you may flip to the other extreme and quite like the movie.

Again, I encourage anyone interested in the topic to do some reading. The models used are nowhere near as simple as people here are assuming, and it's quite fascinating math. Their predictions are by no means perfect, but they deserve a bit more credit than they're being given here.
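
For anyone who wants to see the latent factor idea in miniature, here is a rough sketch in plain Python. It is not Cinematch and not any Prize entry: it is a generic matrix factorization trained by stochastic gradient descent, and every name in it (`factorize`, `predict`, `train_rmse`), the toy ratings, and the settings for learning rate and regularization are assumptions chosen for illustration.

```python
import math
import random


def factorize(ratings, n_users, n_items, n_factors=2,
              epochs=1000, lr=0.01, reg=0.02, seed=0):
    """Learn small per-user and per-item factor vectors from (user, item, stars) triples."""
    rng = random.Random(seed)
    user_f = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    item_f = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    mean = sum(r for _, _, r in ratings) / len(ratings)
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mean + sum(a * b for a, b in zip(user_f[u], item_f[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = user_f[u][f], item_f[i][f]
                # Nudge both vectors to shrink the error, with a small penalty on size.
                user_f[u][f] += lr * (err * qi - reg * pu)
                item_f[i][f] += lr * (err * pu - reg * qi)
    return mean, user_f, item_f


def predict(model, u, i):
    """Predicted rating: global mean plus the dot product of the two factor vectors."""
    mean, user_f, item_f = model
    return mean + sum(a * b for a, b in zip(user_f[u], item_f[i]))


def train_rmse(model, ratings):
    """RMSE of the model on the ratings it was trained on."""
    total = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


# Toy data: users 0-1 favor items 0-1, users 2-3 favor items 2-3.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 2),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 2),
]
mean_rating = sum(r for _, _, r in ratings) / len(ratings)
baseline_rmse = math.sqrt(sum((r - mean_rating) ** 2 for _, _, r in ratings) / len(ratings))

model = factorize(ratings, n_users=4, n_items=4)
print(f"always guessing the mean: {baseline_rmse:.2f}")
print(f"latent factor model:      {train_rmse(model, ratings):.2f}")
print(f"user 0 on unseen item 3:  {predict(model, 0, 3):.2f}")
```

Nobody tells the model what the factors mean. It only sees who rated what, and the factor vectors end up encoding whatever patterns best explain those ratings. The serious Prize entries layered much more on top (per-user and per-movie biases, time effects, blends of many models), so treat this purely as a picture of the core idea.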

"I haven't devoted nearly enough time to figuring out the difference between four stars and five stars"

I grappled with a similar issue a few months ago when I wrote about the 5 star rating system used by Yelp. If time and mood are factors in evaluating a movie at a point in time (when the movie itself doesn't change over time) then the issue is muddled further when using a 5-star rating system for bar or restaurants where the same person can have wildly different empirical experiences in the same day.

It seems to me that it's almost impossible to capture all the variables necessary to accurately assess how you feel about something at a given point in time (movie/restaurant/whatever). Even if we could, would enough people want to go through the doubtless cumbersome and time-consuming process of collection enough of the time to draw any meaningful conclusions?

You think there is a problem with the algorithms because they generate bad suggestions because of your idiosyncratic tendency to include overly intellectual films in your queue? That's like saying that you don't like a particular car because it will not go more than 30 mph, even though you know the reason it will not go faster is the fact that you choose to run it in 1st gear.

By Martin Clausen (not verified) on 21 Aug 2009 #permalink

Jonah - He's not a neurologist, but you might be interested in following the work of Dan Cosley, who just received one of NSF's coveted young researcher awards. Dan's a computer scientist who worked on recommender system algorithms until he realized that the only way to explain a lot of what he was seeing was to become a social scientist. He's now involved with the communication and information science departments at Cornell and does some awesome research on the social phenomena behind recommender systems. The disclaimer here is that he's on my graduate committee, but self-interestedness aside, I think he'd be someone you'd enjoy watching. He's still a young researcher and sure to hit his stride in the coming years. Here's his faculty page.

A second takeaway point from your article that most commenters have missed is that the software should select your recommendations based on the movies you actually watched and rated rather than the movies you just added to your queue. That doesn't seem to conflict with the current model at all.

It is certainly possible for preference algorithms to try to predict likely "states of mind" in the future or likely viewing modes and change recommendations appropriately. I believe that we will see "modal" prediction become a very important driver over time in next-generation media systems.

While I think that some of the critiques presented here are accurate, I also think that Jonah makes a valid point that should not get lost.

Predictive algorithms can only operate on data they have. So much of human preference depends on personal context - am I happy or sad, am I at home with the kids or having a bachelor weekend with the boys, etc. While I have confidence that the recommendations are frequently things that I would like at some point in time (assuming I am honest in my ratings and choices), I may not be in the mood for it right now. There will always be a limit on how accurate algorithms can predict preference as there is always important data that is not available (and often, is not in principle possible to collect) that informs any given individual's preferences at a particular time.

By Joe Wilson (not verified) on 21 Aug 2009 #permalink

This is funny to me because I constantly observe myself trying to rig my ratings to compensate for Netflix's flawed assumptions. That is, I automatically remove one star from my gut-reaction rating to compensate for having been stoned.

By Shannon Murphy (not verified) on 21 Aug 2009 #permalink

With all due respect, Mr. Lehrer, it seems that you are suffering from the very self-censoring and selective attention that you write about.

You've jumped to conclusions without bothering to run the experiment- essentially setting out to validate your own subjective anecdotal experience under circumstances where you admittedly:
1) didn't complete enough ratings
2) confound your preferences with those of your wife

In my own subjective experience, I have rated more than 1000 films. Several years ago, when I signed on, I was fascinated with the technology and I rated about 900 in the space of a month. I am astonished to report that the Netflix system is accurate to the point of being unnerving. My experience precisely corresponds to Mr. Jason Watkins's description of the research on these filters (above). I certainly don't like the idea that my individual preferences are actually a reflection of a pattern of taste that is shared by others, but I dare anyone to operationalize their own rating system, and then take the challenge and see if Netflix can't accurately tell them exactly how well they will like a film.

Try it - I suspect that you will be amazed. Your sense of individuality and beliefs concerning the contingent complexity of preferences will be diminished as you feel the crush of the bell curve. It does seem to be the least one could do before proposing that the system doesn't work and why.

I also subscribe to Rhapsody online music. I have not rated extensively on this system, and the suggestions I receive are rarely in-line with my tastes.

Since I'm here, I do want to thank you for all the provocative thoughts and well-crafted synthesis. "How We Decide" is as useful as it is engaging.

By Joshua levin (not verified) on 23 Aug 2009 #permalink

I have to agree with Joshua: I also got into Netflix very early on (Jan. 2002) and have been religious about rating every movie I've watched. My son has another profile on our account and keeps all of his ratings over there. Netflix is eerily accurate at suggesting how we will rate movies, and since I realized that, I've adjusted my queue accordingly. If I get more than 40 or so movies in my queue, I go through and take out any that Netflix thinks I will rate a "3" or less.

Dear Jonah,
I've just finished reading "How We Decide" and wonder how your analysis of the "pretentious red envelope" syndrome (very familiar to me!) jibes with your description of the study involving "fine art" and "humorous cat" posters. There, you interpret the subjects' unexamined, "instinctive" preference for Van Gogh over funny cats as authentic, and it does prove to be more resilient (in terms of self-reported satisfaction) than the choice of cats by subjects who involved their frontal cortices in the justification of their preferences.

That interpretation in itself seemed suspect to me, given the social context that in matters of aesthetic or sensual judgment might more or less subtly influence both one's immediate response and one's long-term satisfaction with any specific work of art (fine or otherwise). Our taste has two sides, very difficult to extricate: the complex pleasure we derive from the work itself, and the complex pleasure we derive from announcing ourselves (even if only to ourselves) as people who like this sort of work.

Even if we limit the question to the actual pleasure we take from the movies in our queues, it's not so simple. Familiar and predigested pleasure advertises itself to our emotional brains very effectively: the movies I queue up and actually watch immediately on arrival tend to the formulaic and predictable. But a working definition of "fine" art (and fine film) would probably include its capacity to surprise: if I could imagine beforehand the pleasures that a Kurosawa film would give me, Kurosawa wouldn't be Kurosawa. That's why my husband and I have found that the most effective way of dusting off anointed masterpieces in their pretentious red envelopes is to give them the "fifteen-minute test": you only have to watch the first fifteen minutes. If you're caught, you really are the kind of person who likes Bergman, De Sica, etc. Congratulations! If not, life's too short... put it back in the envelope, the better to rush Apatow to your door. And the better to relieve the friends who fear you're a highbrow freak.

Re: The moral is that I assume my future self has more refined taste than my actual self - there is a jarring inconsistency between my stated preferences and my actual desires.
This reflects the old adage: you can always tell who a man wishes he were by looking at his bookshelf.

Netflix is no doubt a genius idea. But the whole "If you like this, then you'll love..." program is full of flaws. I'm a movie buff myself, and I know what I like. A computer has no way of knowing what I'm in the mood for. For instance, I recently watched "Panic Room", which was directed by David Fincher (one of my favorite directors). Fincher had recently made a film, "The Curious Case of Benjamin Button", that had received critical acclaim. Assuming that Fincher had made some of the best movies I had ever seen, I had to watch it. After the movie ended, I thought to myself, "That was it? David Fincher spent all this time making that movie?" Why did I feel this way? I loved his past films, but this one didn't make me feel the same. It wasn't my genre, I guess. So I recently went to Netflix and the program suggested that I rent "The Curious Case of Benjamin Button". What makes the computer think that I would like it? Because I liked a completely different film by the same person? The suggestion aspect of the website just doesn't work.

LCB-B120, you don't really know what you're talking about, sir. It doesn't work on genre matching at all; it works based on other similar people's preferences. A computer, given this, actually can know what you're in the mood for, given no arbitrary moods.

The real problem is that your movie features haven't been learned well enough for it to de-correlate the two movies for you. You see, other users who liked one movie tended to like the other, so until you show it that you aren't like those users, it will treat you like them. This is why you need to train it with many ratings.

This dilemma reminds me of the new program found on iTunes: Genius. This program is also supposed to find other songs that are similar to those on a playlist. Needless to say, most of the songs are not something that I would enjoy. I usually will find one or two songs that interest me, but the others are useless nonetheless. Netflix seems to also think this strategy is good for their business, but I do not believe it is helping at all. Although they make a good effort, many people like to pick for themselves; just because you liked one funny movie does not mean you'll like a romantic comedy in the same way. The computer does not know enough about its owner to be able to pick movies that they will like. Most people will not even rate certain movies, so how is Netflix supposed to know if they liked them or not? I doubt that this program will last much longer; it seems like most people are not happy with the results they are getting.
