If you liked my blog, you’d like this post. Trust me!

One of the sites that most impressed me when I first started browsing the web was called MovieCritic.com. You would rate a few movies you saw, then it would predict whether you’d like a new movie. It would even let you find one that matches both your taste and your girlfriend’s. Pure magic, for that time. For me that was the first demonstration of what we can achieve with the web as a medium.

MovieCritic has been dead for a few years now, but recommender systems are everywhere. NetFlix runs one of the most successful commercial implementations (Amazon is another classic example, “People who bought this book…”), and two years ago they challenged researchers to come up with a system that would perform 10% better than their own in predicting users’ ratings. The best-performing team so far almost got there, and today I attended a talk at the Technion by Yehuda Koren, one of the team members and a researcher at the Yahoo! Research Haifa lab.

Most methods follow the neighborhood-based model – find an item’s neighbors (in some representation), and predict based on their ratings. This may be done by user-user matching (find users like this user, then check their ratings) or item-item matching (find items like the item in question, then predict based on how the user rated those items). One of the interesting approaches proposed by Koren’s team represented both users and movies in the same space, then looked for similarity in this unified space.
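To make the item-item variant concrete, here is a minimal sketch in Python. It is not the method from the talk, just a toy illustration: the ratings are made up, the names `item_vector`, `cosine` and `predict` are hypothetical, and I use plain cosine similarity over co-rating users where a real system would at least mean-center the ratings first.

```python
import math

# Toy ratings (made-up data): user -> {movie: rating on a 1-5 scale}
ratings = {
    "ann":  {"A": 5, "B": 4, "C": 1},
    "bob":  {"A": 4, "B": 5, "C": 2, "D": 5},
    "carl": {"A": 1, "B": 2, "C": 5, "D": 1},
    "dana": {"A": 5, "C": 1, "D": 4},
}

def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity computed over the users who rated both items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def predict(user, item, k=2):
    """Similarity-weighted average of the user's ratings on the k items
    most similar to `item`. Returns None if no neighbor is similar at all."""
    target = item_vector(item)
    neighbors = sorted(
        ((cosine(target, item_vector(other)), rating)
         for other, rating in ratings[user].items() if other != item),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in neighbors) / total

# Ann has not rated movie D; guess her rating from the movies she did rate.
print(predict("ann", "D"))
```

The user-user variant is the mirror image: compare users' rating vectors instead of items', then average what the nearest users said about the movie.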

The most striking finding for me, however, was that the winning strategies did not use any of the movie’s “content” features. Genre, director, actors, length, etc. – none of these produced any additional value beyond the plain statistical analysis and correlation of ratings and users, and they are therefore not used at all. In fact, Koren claims that knowing that a certain user is a Tom Hanks fan makes no difference: we will infer this from the ratings anyway (assuming there are enough of them, of course).

I find that almost sad… Not being able to intelligently reason over the underlying logic exposed by an AI software is a tremendous drawback in my eyes, even if the overall prediction score is better. Telling the user “you may want to watch this movie because A and B and C” can make users more satisfied, help them understand even the incorrect predictions, and possibly lead to a feedback cycle. Doing away with it is like showing web search results without keyword highlighting, with no visible cue for the user as to why this result was returned (“…trust me, I know what’s the right answer for you!”).

5 responses to “If you liked my blog, you’d like this post. Trust me!”

  1. “Not being able to intelligently reason over the underlying logic exposed by an AI software is a tremendous drawback in my eyes”

    That’s because this is not AI at all. This is just “crowds intelligence” software. I think what is really saddening you is the fact that after so many years of trying, AI approaches are still less successful than others.

    Another curious anecdote related to the NetFlix challenge is the “Napoleon Dynamite” problem (http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html?pagewanted=all). Some movies are very “polarizing” and so, most NN approaches tend to struggle with them.

  2. Well, I wouldn’t say it’s not AI. A lot of AI research follows these lines, and the state of the art in machine translation (which is NLP, therefore AI), for example, is a 100% statistical approach, with almost no expert knowledge or reasoning involved. And the crowds intelligence sub-field even has a fancy name: collaborative filtering.
    But you are right that expert systems have lost their appeal as an AI tool, in favor of statistical approaches, and that leaves less room for reasoning over the results.
    Napoleon Dynamite – you’re right, he also mentioned that briefly. That probably is also why “Miss Congeniality” is the most rated movie in the dataset… 🙂

  3. Pingback: Friendly advice from your “Social Trust Graph” « The Alter Egozi

  4. Pingback: Farewell Academia | The Alter Egozi

  5. I’m not sure exactly why, but this weblog is loading extremely slowly for me. Is anyone else having this issue, or is it an issue on my end? I’ll check back later on and see if the problem still exists.
