Tag Archives: Social Commerce

Friendly advice from your “Social Trust Graph”

While scanning for worthy Information Retrieval papers from the recent SIGIR 2009, I came across a paper titled “Learning to Recommend with Social Trust Ensemble”, by a team from the Chinese University of Hong Kong. This one is about recommender systems, but putting the social element into text analytics tasks is always interesting (to me).

The premise is an interesting one: using your network of trust to improve classic (Collaborative Filtering) recommendations. The authors begin by observing that users’ decisions are a balance between their own tastes and their trusted friends’ recommendations.

Figure 1 from "Learning to Recommend with Social Trust Ensemble" by Ma et al.

Then they propose a model that blends analysis of the classic user-item matrix, where users’ ratings of items are stored (the common tool of CF), with analysis of a “social trust graph” that links the user to other users, and through them to their opinions on the items.

This follows the intuition that when trying to draw a recommendation from the behavior of other users (which is basically what CF does), some users’ opinions may matter more than others’. Classic CF ignores that and treats all users as equally important.
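To make the blending intuition concrete, here is a minimal Python sketch. It is not the authors' actual model; it only illustrates weighing a user's own estimate against a trust-weighted average of friends' estimates. The toy data, the `base_predict` stand-in for a CF predictor, and the `alpha` parameter are all assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical toy data, for illustration only.
# ratings[user][item] is a rating on a 1-5 scale.
ratings = {
    "alice": {"camera": 5.0},
    "bob": {"camera": 4.0, "phone": 2.0},
    "carol": {"phone": 5.0},
}
# trust[user][friend] is how much `user` trusts `friend`, in [0, 1].
trust = {"alice": {"bob": 0.8, "carol": 0.2}}


def base_predict(user, item):
    """Stand-in for any classic CF predictor (an assumption, not real CF).

    Returns the known rating if there is one, otherwise the user's mean rating.
    """
    user_ratings = ratings[user]
    if item in user_ratings:
        return user_ratings[item]
    return mean(user_ratings.values())


def blended_prediction(user, item, alpha=0.5):
    """Blend the user's own estimate with a trust-weighted friends' estimate.

    alpha=1.0 ignores friends entirely; alpha=0.0 relies only on them.
    """
    own = base_predict(user, item)
    friends = trust.get(user, {})
    total_weight = sum(friends.values())
    if total_weight == 0:
        return own  # no trusted friends: fall back to the user's own estimate
    social = sum(w * base_predict(f, item) for f, w in friends.items()) / total_weight
    return alpha * own + (1 - alpha) * social


# Alice has not rated the phone; Bob (trusted a lot) disliked it, Carol liked it.
print(blended_prediction("alice", "phone"))  # approximately 3.8
```

With `alpha=1.0` the sketch collapses to the plain per-user estimate, which is the "all users are equal" behavior the paragraph above attributes to classic CF.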

The authors show results that outperform classic CF on a dataset extracted from Epinions. That’s encouraging for any researcher interested in the contribution of the social signal to AI tasks.

free advice at renegade craft fair - CC Flickr/arimoore

However, some issues bother me with this research:

  1. Didn’t the Netflix Prize winning team’s approach (see previous post) “prove” that statistical analysis of the user-item matrix beats any external signal other teams tried to use? The answer here may be related to the sparseness of the Epinions data, which makes life very difficult for classic CF. Movie ratings data is much denser than retail data (Epinions’ domain).
  2. To evaluate, the authors sampled 80% or 90% of the ratings as training data and used the rest for testing. But if the training data comes from before the user started following someone, and the test data from after, don’t cause and effect get a bit mixed up? I mean, if I follow someone and discover a product through his recommendation, there’s a high chance my opinion will also be influenced by his. So there’s no true independence between the training and test data…
  3. Eventually, the paper shows that combining two good methods (social trust graph and classic CF) outperforms each of the methods alone. The general idea of fusion or ensemble of methods is pretty much common knowledge for any Machine Learning researcher. The question should be (but it wasn’t): does this specific ensemble of methods outperform any other ensemble? And does it fare better than the state of the art result for the same dataset?

The last point is of specific interest to me, having combined keyword-based retrieval with concept-based retrieval in my M.Sc. work. I could easily show that the resulting system outperformed each of the separate elements, but to counter the above questions, I further tested combining other similarly high-performing methods to show that the performance gain there was much lower, and also showed that the combination could take a state of the art result and further improve on it.

Nevertheless, the idea of using opinions from people you know and trust (rather than authorities) in ranking recommendations is surely one that will gain more popularity, as social players start pushing ways to monetize the graphs they worked so hard to build…

Clustering Search (yet again)

Microsoft is rolling out an internal test for a search experience upgrade on Live (codenamed Kumo) that clusters search results by aspect. See internal memo and screenshots covered by Kara Swisher.

As usual, the immediate reaction is: regardless of the actual change, how feasible is it to assume you could make users switch from their Google habit? But let’s try to put that aside and look at the actual change.

Search results are grouped into clusters based on the aspects of this particular search query. This idea is far from being new, and was attempted in the past by both Vivisimo (at Clusty.com) and by Ask.com. One difference, though, is that Microsoft pushes the aspects further into the experience, by showing a long page of results with several top results from each aspect (similar to Google’s push with spelling mistakes).

At least judging by the (possibly engineered) sample results, the clustering works better than previous attempts. Most search engines take the “related queries” twist on this, while Kumo includes related queries as a separate widget:

kumo-comparison

Clusty.com’s resulting clusters, on the other hand, are far from useful for a serious searcher with enquire/purchase intent.

At least based on these screenshots, it seems like Microsoft succeeded in distilling interesting aspects better, while maintaining useful labels (e.g. “reviews”). Of course, it’s possible this is all done as a “toy”, limited example, e.g. using some predefined ontology. But together with other efforts, such as the “Cashback” push and the excellent product search (including reviews aggregation and sentiment analysis), it seems like Microsoft may be in the process of positioning Live as the search engine for ecommerce. Surely a good niche to be in…


In Authority We Trust (Not)

Product reviews are a great thing.

Fake reviews suck.

In the most recent example, an employee solicited paid reviews for his company’s products on Amazon’s Mechanical Turk – got to appreciate the progress.

How can you tell which reviews to trust? Trust is built out of a relationship. You trust a site, a person, a brand, after your interactions have accumulated enough positive history to earn that trust. With review sites, you may learn to trust a specific site, but that still doesn’t mean you trust a specific reviewer. I usually try to look at the reviewer’s history, and to look for the “human” side of them: spelling mistakes, topic changes, finding flaws and not just praising. But naturally, the adversary here is also informed, and will try to imitate these aspects…

"Trust us, we're experts" by flickr/phaulyReview sites attempt to bestow trust of their own on their members, to assist us. Amazon uses badges, and encourages users to provide their real name, using a credit card as the identity proof. Midrag is an Israeli service provider ratings site I recently used, that attaches identity to a cellular phone, with a login token sent over SMS. But when you want to attract a large number of reviews, you want to allow unvalidated identities too. Epinions, for example, builds a “web of trust” model based on reviewers trusting or blocking other reviewers. But with Epinions (and similarly Amazon) keeping their trust calculation formula secret, how can users be convinced that this metric fits their needs?

In reality, my model of trust may be quite different from yours. Two Italian researchers published a paper in AAAI-05 titled “Controversial Users demand Local Trust Metrics”, where they experimented with Epinions’ data on the task of predicting users’ trust score, based on existing trust statements. Their findings show that for some users, trust is not an average quantity, but a very individual one, and therefore requires local methods.

Trust metrics can be classified into global and local ones (Massa & Avesani 2004; Ziegler & Lausen 2004). Local trust metrics take into account the subjective opinions of the active user when predicting the trust she places in unknown users. For this reason, the trust score of a certain user can be different when predicted from the point of view of different users. Instead, global trust metrics compute a trust score that approximates how much the community as a whole trusts a specific user.

Have you spotted a familiar pattern?… Just exchange “trust” with “relevance”, and the paragraph will all of a sudden describe authority-based search (PageRank) versus socially-connected search (Delver). Local metrics were found to be more effective for ranking controversial users, meaning users whose individual trust scores deviate highly from their average score. The search equivalent would be queries for subjective information, where opinions may vary and an authority score may not be the best choice for each individual searcher.
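For the code-minded, the global versus local distinction can be sketched in a few lines of Python. This is a deliberately naive illustration and not the metric used in the paper: the toy trust statements, the one-hop propagation rule in `local_trust`, and the use of standard deviation as a stand-in for "controversial" are all my own assumptions.

```python
from statistics import mean, pstdev

# Hypothetical trust statements, for illustration only.
# statements[source][target] is 1.0 (trust) or 0.0 (distrust).
statements = {
    "ann": {"bea": 1.0},
    "bea": {"dave": 1.0},
    "cal": {"eve": 1.0},
    "eve": {"dave": 0.0},
}


def scores_about(target):
    """All trust scores that anyone has stated about `target`."""
    return [given[target] for given in statements.values() if target in given]


def global_trust(target):
    """Global metric: one community-wide average, the same for every viewer."""
    scores = scores_about(target)
    return mean(scores) if scores else None


def local_trust(active, target):
    """Local metric: the score as seen from the active user's point of view.

    Uses the active user's own statement if there is one; otherwise averages
    the statements made by users the active user trusts (one hop only).
    """
    own = statements.get(active, {})
    if target in own:
        return own[target]
    via_friends = [
        statements[friend][target]
        for friend, score in own.items()
        if score > 0 and target in statements.get(friend, {})
    ]
    return mean(via_friends) if via_friends else None


def controversy(target):
    """Spread of opinions about `target`; higher means more controversial."""
    scores = scores_about(target)
    return pstdev(scores) if scores else 0.0


print(global_trust("dave"))        # one score for everybody
print(local_trust("ann", "dave"))  # Ann's view, via Bea
print(local_trust("cal", "dave"))  # Cal's view, via Eve
```

In this toy graph the community is split on Dave, so the global score sits in the middle, while Ann and Cal each get a very different answer through the people they trust.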

To read more about trust metrics, see here: trustlet.org