Tag Archives: Trust

“Alexa, add voice shopping to my to-do list”

Amazon is promoting voice shopping as part of its deals for Prime Day next week. Shoppers will get $10 credit just for making their first voice purchase from a list of “Alexa Deals“, items that are already greatly discounted. That’s a major incentive just to push consumers into something that should actually be a great benefit – effortless, simple, zero-click shopping. Why does Amazon have to go through so much trouble to get shoppers to use something that’s supposedly so helpful?

To understand the answer, it’s worthwhile to first understand how valuable voice shopping is for Amazon. In all demos and videos for the various Alexa devices, voice shopping is positioned as the perfect tool for spontaneous, instant ordering purchases, such as “Alexa, I need toilet paper / diapers / milk / dog food / …” That easily explains why you would need to be an Amazon Prime subscriber in order to use voice shopping, and getting Prime to every household is a cornerstone to Amazon’s business strategy.

In addition, Alexa orders are fulfilled by 1-click  payment, yet another highly valuable Amazon tool. Amazon also guarantees free returns for Alexa purchases, just in case you’re concerned about getting your order wrong. Now, combine all of these together and you can see how voice shopping is built to create a habit, of shopping as a frictionless, casual activity. That is probably also why the current offer does not apply for voice shopping from within Amazon’s app, as the long process of launching it and reaching the voice search in it ruins the spontaneity.

And yet – shoppers are not convinced. In last year’s Prime Day, a similar promotion offered by Amazon drove on average one voice order per second. This may sound like a lot, but ~85K orders are still a tiny fraction of the total ~50M orders consumers placed on Amazon that day. This year Amazon raised the incentive even further, which indicates there is still much convincing to do. Why is that?

Mute Button by Rob Albright @ Flickr (CC)

For starters, Amazon’s Alexa devices were never built to be shopping-only. Usage survey reports consistently show that most users prefer to use the Alexa assistant to ask questions, play music, and even to set timers, much more than to shop. This does not mean that Amazon has done a bad job, quite the contrary. Voice shopping may not be that much of a habit initially, and getting used to voice-controlling other useful skills helps build habit and trust. Problem is, when you focus on non-shopping, you also get judged by it. That’s how Amazon gets headlines such as “Google Assistant is light-years ahead of Amazon’s Alexa“, with popular benchmarks measuring it by search, question answering and conversational AI, fields where Google has historically invested more than Amazon by orders of magnitude. The upcoming HomePod by Apple is expected to even further complicate Amazon’s stand, with Apple growing to control the slot of a sophisticated, music-focused, high-end smart home device.

The “How it works” page for the Prime Day Alexa deals hints at other issues customers have with shopping in particular. Explanations aim to reassure that no unintended purchases take place (triggered by your kids, or even your TV), and that if your imperfect voice interaction got you the wrong product, returns are free for all Alexa purchases. These may sound like solved issues, but keep in mind the negative (and often unjustified) coverage around unintended purchases has sent countless Echo owners to set a passcode on ordering, which is actually a major setback for the frictionless zero-click purchasing Amazon is after.

But most importantly, voice-only search interfaces have not yet advanced to support interactions that are more complex than a simple context-less pattern recognition. It’s no accident that the most common purchase flows Alexa supports are around re-ordering, where the item is a known item and no search actually takes place. This means that using Alexa for shopping may work well only for those simple pantry shopping, assuming you already made such purchases in the past. Google, on the other hand, is better positioned than Amazon in this respect, having more sophisticated conversational infrastructure. It even enables external developers to build powerful and context-aware Google Assistant apps using tools such as api.ai (for a quick comparison on these developer platforms, see here).

So what might Amazon be doing to make voice shopping more successful?

Re-ordering items is the perfect beginner use-case, being the equivalent of “known item” searches. Amazon may work on expanding the scope of such cases, identifying additional recurring purchase types that can be optimized. These play well with other recent moves by Amazon, such as around grocery shopping and fulfillment.

Shopping lists are a relatively popular Alexa feature (as well as on Google Home), but based on owner testimonials it seems that most users use these for offline shopping. Amazon is likely working to identify more opportunities for driving online purchases from these lists.

Voice interface has focused mainly on a single result, yielding a “I’m Feeling Lucky” interaction. Using data from non-voice interactions, Amazon could build a more interactive script, one that could guide users through more complex decisions. An interesting case study for this has been eBay with its “ShopBot” chatbot, though transitioning to voice-only control still remains a UX challenge.

And finally – it’s worth noting that in the absence of an item in the purchase history (or if the user declines it), Alexa recommends products from what Amazon calls “Amazon’s Choice“, which are “highly rated, well-priced products” as quoted from this help page. This feature is in fact a powerful business tool, pushing vendors to compete for this lucrative slot. In the more distant future, users may trust Alexa to the point of just taking its word for it and assuming this is the best product for them. That will place a huge lever in Amazon’s hands in its relationship with brands and vendors, and it’s very likely that other retailers as well as brands will fight for a similar control, raising the stakes even more on voice search interfaces.

Friendly advice from your “Social Trust Graph”

While scanning for worthy Information Retrieval papers in the recent SIGIR 2009, I came across a paper titled “Learning to Recommend with Social Trust Ensemble“, by a team from the University of Hong Kong. This one is about recommender systems, but putting the social element into text analytics tasks is always interesting (me).

The premise is an interesting one – using your network of trust to improve classic (Collaborative Filtering) recommendations. The authors begin by observing that users’ decisions are the balance between their own tastes, and those of their trusted friends’ recommendations.

Figure 1 from "Learning to Recommend with Social Trust Ensemble" by Ma et al.

Then, they proceed to propose a model that blends analysis of classic user-item matrix where ratings of items by users are stored (the common tool of CF), with analysis of a “social trust graph” that links the user to other users, and through them to their opinions on the items.

This follows the intuition that when trying to draw a recommendation from behavior of other users (which basically is what CF does), some users’ opinions may be more important than others’, and the fact that classic CF ignores that, and treats all users as having identical importance.

The authors show results that out-perform classic CF on a dataset extracted from Epinions. That’s encouraging for any researcher interested in the contribution of the social signal into AI tasks.

free advice at renegade craft fair - CC Flickr/arimoore

However, some issues bother me with this research:

  1. Didn’t the netflix prize winning team approach (see previous post) “prove” that statistical analysis of the user-item matrix beats any external signal other teams tried to use? the answer here may be related to the sparseness of the Epinions data, which makes life very difficult for classic CF. Movie recommendations have much higher density than retail (Epinions’ domain).
  2. To evaluate, the authors sampled 80% or 90% of the ratings as training and the remaining as testing. But if you choose as training the data before the user started following someone, then test it after the user is following that someone, don’t you get a bit mixed up with cause and effect? I mean, if I follow someone and discover a product through his recommendation, there’s a high chance my opinion will also be influenced by his. So there’s no true independence between the training and test data…
  3. Eventually, the paper shows that combining two good methods (social trust graph and classic CF) outperforms each of the methods alone. The general idea of fusion or ensemble of methods is pretty much common knowledge for any Machine Learning researcher. The question should be (but it wasn’t) – does this specific ensemble of methods outperform any other ensemble? and does it fare better than the state of the art result for the same dataset?
own taste and his/her trusted friends’ favors.

The last point is of specific interest to me, having combined keyword-based retrieval with concept-based retrieval in my M.Sc. work. I could easily show that the resulting system outperformed each of the separate elements, but to counter the above questions, I further tested combining other similarly high performing methods to show performance gained there was much lower, and also showed that the combination could take a state of the art result and further improve on it.

Nevertheless, the idea of using opinions from people you know and trust (rather than authorities) in ranking recommendations is surely one that will gain more popularity, as social players start pushing ways to monetize the graphs they worked so hard to build…

In Authority We Trust (Not)

Product reviews are a great thing.

Fake reviews suck.

In the most recent example, an employee solicited paid reviews for his company’s products on Amazon’s Mechanical Turk – got to appreciate the progress.

How can you tell which reviews to trust? Trust is built out of relationship. You trust a site, a person, a brand, after your interactions accumulated enough positive history to earn that trust.  With review sites, you may learn to trust a specific site, but that still doesn’t mean you trust a specific reviewer. I usually try to look at the reviewer’s history, and to look for the “human” side of them – spelling mistakes, topic changes, findings flaws and not just praising. But naturally, the adversary here is also informed, and will try to imitate these aspects…

"Trust us, we're experts" by flickr/phaulyReview sites attempt to bestow trust of their own on their members, to assist us. Amazon uses badges, and encourages users to provide their real name, using a credit card as the identity proof. Midrag is an Israeli service provider ratings site I recently used, that attaches identity to a cellular phone, with a login token sent over SMS. But when you want to attract a large number of reviews, you want to allow unvalidated identities too. Epinions, for example, builds a “web of trust” model based on reviewers trusting or blocking other reviewers. But with Epinions (and similarly Amazon) keeping their trust calculation formula secret, how can users be convinced that this metric fits their needs?

In reality, my model of trust may be quite different from yours. Two Italian researchers published a paper in AAAI-05 titled “Controversial Users demand Local Trust Metrics“, where they experimented with Epinions’ data on the task of predicting users’  trust score, based on existing trust statements. Their findings show that for some users, trust is not an average quantity, but a very individual one, and therefore requires local methods.

Trust metrics can be classified into global and local ones (Massa & Avesani 2004; Ziegler & Lausen 2004). Local trust metrics take into account the subjective opinions of the active user when predicting the trust she places in unknown users. For this reason, the trust score of a certain user can be different when predicted from the point of view of different users. Instead, global trust metrics compute a trust score that approximates how much the community as a whole trusts a specific user.

Have you spotted a familiar pattern?… Just exchange “trust” with “relevance”, and the paragraph will all of a sudden describe authority-based search (PageRank) versus socially-connected search (Delver). Local metrics were found to be more effective for ranking controversial users, meaning users that are assigned individual trust scores that highly deviate from their average score. The search equivalent can be considered queries that are for subjective information, where opinions may vary and an authority score may not be the best choice for each individual searcher.

To read more about trust metrics, see here: trustlet.org