
“Alexa, add voice shopping to my to-do list”

Amazon is promoting voice shopping as part of its deals for Prime Day next week. Shoppers will get a $10 credit just for making their first voice purchase from a list of “Alexa Deals”, items that are already heavily discounted. That’s a major incentive just to push consumers into something that should be a great benefit in its own right – effortless, simple, zero-click shopping. Why does Amazon have to go to so much trouble to get shoppers to use something that’s supposedly so helpful?

To understand the answer, it’s worthwhile to first understand how valuable voice shopping is for Amazon. In all demos and videos for the various Alexa devices, voice shopping is positioned as the perfect tool for spontaneous, instant purchases, such as “Alexa, I need toilet paper / diapers / milk / dog food / …” It also ties in with the requirement that you be an Amazon Prime subscriber in order to use voice shopping, since getting Prime into every household is a cornerstone of Amazon’s business strategy.

In addition, Alexa orders are fulfilled by 1-click payment, yet another highly valuable Amazon tool. Amazon also guarantees free returns for Alexa purchases, just in case you’re concerned about getting your order wrong. Now, combine all of these together and you can see how voice shopping is built to create a habit of shopping as a frictionless, casual activity. That is probably also why the current offer does not apply to voice shopping from within Amazon’s app, as the long process of launching it and reaching its voice search ruins the spontaneity.

And yet – shoppers are not convinced. On last year’s Prime Day, a similar promotion offered by Amazon drove on average one voice order per second. This may sound like a lot, but ~85K orders are still a tiny fraction (under 0.2%) of the total ~50M orders consumers placed on Amazon that day. This year Amazon raised the incentive even further, which indicates there is still much convincing to do. Why is that?

Mute Button by Rob Albright @ Flickr (CC)

For starters, Amazon’s Alexa devices were never built to be shopping-only. Usage surveys consistently show that most users prefer to use the Alexa assistant to ask questions, play music, and even to set timers, much more than to shop. This does not mean that Amazon has done a bad job, quite the contrary. Voice shopping may not be that much of a habit initially, and getting used to voice-controlling other useful skills helps build habit and trust. Problem is, when you focus on non-shopping, you also get judged by it. That’s how Amazon gets headlines such as “Google Assistant is light-years ahead of Amazon’s Alexa”, with popular benchmarks measuring it by search, question answering and conversational AI, fields where Google has historically invested more than Amazon by orders of magnitude. The upcoming HomePod by Apple is expected to complicate Amazon’s position even further, with Apple set to claim the slot of a sophisticated, music-focused, high-end smart home device.

The “How it works” page for the Prime Day Alexa deals hints at other issues customers have with shopping in particular. Explanations aim to reassure that no unintended purchases take place (triggered by your kids, or even your TV), and that if your imperfect voice interaction got you the wrong product, returns are free for all Alexa purchases. These may sound like solved issues, but keep in mind the negative (and often unjustified) coverage around unintended purchases has sent countless Echo owners to set a passcode on ordering, which is actually a major setback for the frictionless zero-click purchasing Amazon is after.

But most importantly, voice-only search interfaces have not yet advanced to support interactions that are more complex than simple, context-less pattern recognition. It’s no accident that the most common purchase flows Alexa supports are around re-ordering, where the item is a known item and no search actually takes place. This means that using Alexa for shopping may work well only for simple pantry shopping, assuming you have already made such purchases in the past. Google, on the other hand, is better positioned than Amazon in this respect, having more sophisticated conversational infrastructure. It even enables external developers to build powerful and context-aware Google Assistant apps using tools such as api.ai (for a quick comparison on these developer platforms, see here).

So what might Amazon be doing to make voice shopping more successful?

Re-ordering items is the perfect beginner use-case, being the equivalent of “known item” searches. Amazon may work on expanding the scope of such cases, identifying additional recurring purchase types that can be optimized. These play well with other recent moves by Amazon, such as around grocery shopping and fulfillment.

Shopping lists are a relatively popular Alexa feature (as well as on Google Home), but based on owner testimonials it seems that most users use these for offline shopping. Amazon is likely working to identify more opportunities for driving online purchases from these lists.

Voice interfaces have focused mainly on a single result, yielding an “I’m Feeling Lucky” interaction. Using data from non-voice interactions, Amazon could build a more interactive script, one that could guide users through more complex decisions. An interesting case study for this has been eBay with its “ShopBot” chatbot, though transitioning to voice-only control still remains a UX challenge.

And finally – it’s worth noting that in the absence of an item in the purchase history (or if the user declines it), Alexa recommends products from what Amazon calls “Amazon’s Choice”, which are “highly rated, well-priced products” as quoted from this help page. This feature is in fact a powerful business tool, pushing vendors to compete for this lucrative slot. In the more distant future, users may trust Alexa to the point of just taking its word for it and assuming this is the best product for them. That will place a huge lever in Amazon’s hands in its relationship with brands and vendors, and it’s very likely that other retailers as well as brands will fight for similar control, raising the stakes even more on voice search interfaces.

Feeling Lucky Is the Future of Search

If you visit the Google homepage on your desktop, you’ll see a rare, prehistoric specimen – one that most Google users don’t see the point of: the “I’m Feeling Lucky” button.

Google has already removed it from most of its interfaces, and even here it only serves as a teaser for various Google nitwit projects. And yet the way things are going, the “Feeling Lucky” ghost may just come back to life – and with a vengeance.


In the early years, the “I’m Feeling Lucky” button was Google’s way of boldly stating “Our results are so great, you can just skip the result lists and head straight to destination #1”. It was a nice, humorous touch, but one that never really caught on as users’ needs grew more complex and less obvious. In fact, it lost Google quite a lot of money, since skipping the result list also meant users saw fewer and fewer sponsored results – Google’s main income source. But usability testing showed that users really liked seeing the button, so Google kept it there for a while.

But there’s another interface rising up that prides itself on returning the first search result without showing you the list. Did you already guess what it is?


Almost every demo of a new personal assistant product will include questions being answered by the bot tapping into a search engine. The demos will make sure to use simple single-answer cases, like “Who is the governor of California?” That’s extremely neat, and was regarded as science fiction not so many decades ago. Amazing work on query parsing and entity extraction from search results has led to great results on this type of query, and the quality of the query understanding, and resulting answers, is usually outstanding.


However, these are just some of the possible searches we want our bots to run. As we get more and more comfortable with this new interface, we will not want to limit ourselves to one type of query. If you want to be able to get an answer for “Give me a good recipe for sweet potato pie” or “Which Chinese restaurants are open in the area now?”, you need a lot more than a single answer. You need verbosity, you need to be able to refine – which stretches the limits of how we perceive conversational interfaces today.

Part of the problem is that it’s difficult for users to understand the limits of conversational interfaces, especially when bot creators pretend that there are no such limits. Another problem lies in the fact that a natural language interface may simply be a lousy choice for some interaction types, and imposing it on them will only frustrate users.

There is a whole new paradigm of user interaction waiting to be invented, to support non-trivial search and refine through conversation – for all of those many cases where a short exchange and single result will probably not do. We will need to find a way to flip between vocal and visual, manage a seamless thread between devices and screen-based apps, and make digital assistants keep context on a much higher level.

Until then, I guess we’ll continue hoping that we’re feeling lucky.



To Tweet or Not to Tweet (hint: that’s not the question)

I was catching up on my RSS overload the other day, when this side note in a post by Naaman on Social Media Multitaskers caught my attention:

“I find that I now blog thoughts that are too long to fit in a tweet; so feel free to follow my tweets…”

"I am the man. I suffered. I was there." CC by 'Kalense Kid'/Flickr

I’m not too much of a media multitasker myself, so I don’t experience this duality first hand, but I can imagine it: you get an interesting thought or experience, then you ask yourself whether it is major enough to develop into a blog post, for which I’ll go over here, or not that heavy (or I can’t be bothered), in which case I flutter my wings over there. Actually I do experience this, except that in the latter case I simply drop it (and excuse me for not considering Facebook status updates an option, that’s stuff for another post…)

This should not have been a dilemma at all, had blogging platforms evolved to accommodate microblogging, which today is somehow seen as the centralized domain of a single commercial company. You really should be able to hop on your publishing platform, write that thought down, regardless of length, and fire it out. No need to figure out which channel to use, and whether the intended readers are indeed following you there. Similarly, your friends/readers should not have to register to your feeds on different platforms but rather consume one only, and rely on a powerful set of rules to filter your stream as they find fit.

Posterous is a great (and fast growing) example of how easy it can be from the blogger’s perspective. Just post it (or rather, email it) and it will get published as needed (e.g. shortened for Twitter). But it does not make it any easier on the consumer, who still needs to decide where best to follow this blogger (does he perhaps write additional blog posts directly on his blog that won’t show up on his Twitter? or vice versa?), and it reduces the basic filtering capability that may have existed when different post types were distributed into the different services.

No need to reinvent the wheel here, blogging platforms are abundant, decentralized and perfectly fit to remain our publishing hub, with their developed CMS and the loose but well-defined social networks. What blogging platforms should do – heck, what Automattic should do to evolve, is:

  1. Conversation – support the realtime conversational nature of short posts, with the right UI and notification mechanisms. The “P2” microblogging-optimized theme released almost two years ago was a good start; sadly, it still followed the line of thought of “blog or microblog, not both”. To move forward, Automattic needs to realize that Twitter is not a personality, it’s a state of mind, hence P2 can’t be a permanent theme, it should be a contextual theme.
  2. Publishing – acquire Posterous. As simple as that. These guys got their fame by understanding the pains of publishing anytime, anywhere, they know a thing or two about usability and persuasion, and they have great buzz. The latter is not a luxury – a buzzed-up acquisition makes it very clear that this is a major strategy for you, a lot more than if you developed the same changes yourself.
  3. Consuming – that’s the tricky part… how do you embed Twitter and WordPress into the same stream, when each consumer has their own desired blend of it? We don’t want to invent a new technology, RSS is here to stay. We do want better ways of filtering our floods, using better tagging coupled with more clever feed options. How exactly – I do hope there’s an entire team at Automattic working on exactly that…

    The Opportunity in RSS Overload

    Dare Obasanjo has an interesting post, with a good comments thread, on overflowing feed readers. He’s quoting from a post by Farhad Manjoo on Slate:

    You know that sinking feeling you get when you open your e-mail and discover hundreds of messages you need to respond to…

    Well, actually Dare’s post is from two weeks ago. The reason I got to read it only now is exactly that…

    Yes, I know I don’t really need to ‘respond’ to subscriptions, and the answer should be – unsubscribe, or go on a feeds (or ‘follow’ edges) social diet. But these binary decisions are not always optimal, as I have plenty of feeds I subscribed to after hitting one or two posts I really liked, but that were not on that author’s main subject (if such exists at all). Thus I have to skim through many uninteresting (for me!) posts, many of which somehow always end up discussing Twitter. In fact, that’s what most of my feeds look like (including the Twitter part).

    We need shades of grey between subscribed and unsubscribed. It would be great to have a feed reader that learns from how you use it. It should be quite clear which posts interest me – ones I took time to read, scroll through, press a link etc. – and which did not. Now train a classifier on that data, preferably per-feed (in addition to a general one), and get some sense of what I’m really looking for.
    Mark All As Read Day - flickr/sidereal

    Now, I don’t need this smart reader to delete the uninteresting ones, let’s not assume too much on its classification accuracy. Just find the right UI to mark the predicted-to-be-interesting items (or even assign them into a special virtual folder). Then I can read these first, and only if/when have time – read the rest.
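To make the idea a bit more concrete, here is a minimal sketch of such a reader in Python. Everything in it is an assumption for illustration, not an existing product or library: the `InterestModel` and `SmartReader` names, the choice of a tiny naive Bayes text classifier, and the rule of simply adding the general score to the per-feed score. The "engaged" flag stands in for the signals mentioned above (time spent reading, scrolling, clicking a link).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class InterestModel:
    """Tiny naive Bayes text classifier with add-one smoothing (illustrative)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.post_counts = {True: 0, False: 0}

    def train(self, text, engaged):
        self.word_counts[engaged].update(tokenize(text))
        self.post_counts[engaged] += 1

    def log_odds(self, text):
        """Positive means 'probably interesting'; 0.0 until both classes are seen."""
        if not all(self.post_counts.values()):
            return 0.0
        vocab = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        score = math.log(self.post_counts[True] / self.post_counts[False])
        for word in tokenize(text):
            p_yes = (self.word_counts[True][word] + 1) / (totals[True] + vocab)
            p_no = (self.word_counts[False][word] + 1) / (totals[False] + vocab)
            score += math.log(p_yes) - math.log(p_no)
        return score


class SmartReader:
    """Keeps one general model plus one model per feed (hypothetical design)."""

    def __init__(self):
        self.general = InterestModel()
        self.per_feed = defaultdict(InterestModel)

    def record(self, feed, text, engaged):
        """Learn from how a post was used: engaged=True if it was actually read."""
        self.general.train(text, engaged)
        self.per_feed[feed].train(text, engaged)

    def score(self, feed, text):
        total = self.general.log_odds(text)
        if feed in self.per_feed:
            total += self.per_feed[feed].log_odds(text)
        return total

    def read_first(self, posts, threshold=0.0):
        """Split (feed, text) posts into 'read first' and 'the rest'; delete nothing."""
        first = [p for p in posts if self.score(*p) > threshold]
        rest = [p for p in posts if self.score(*p) <= threshold]
        return first, rest
```

Calling `record()` whenever a post is read or skipped, and `read_first()` when new items arrive, gives the "virtual folder" behaviour: predicted-to-be-interesting posts are surfaced, and everything else stays available for later.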

    I assign this to be my pet project in case I win the lottery next week and go into early retirement. Alternatively, if someone saw this implemented anywhere – let me know!

    Update: a related follow-up post on a new filtering product I started using.

    Building Blocks of Creativity

    Long ago I read an interesting book that tried to teach how to actually engineer creativity. One of the simple methods it proposed was – take an existing device, and strip it of a main characteristic. A TV set without a screen, for example. Not good for anything, you say? Well, if you’re a real soap opera freak, frustrated that they always run when you’re driving back from work, you could imagine installing this in your car and listening to your TV while driving, rather than watching…

    So let’s take an iPhone and strip it of its… phone. What do you get besides an eye? Siftables. Got this shared from Oren:

    Seeing this, my mind immediately runs to how my kids could use it. This is surely a creative Human-Computer Interface, but does that automatically make the applications creative? See the one with the kid injecting characters into a scene played on TV. That’s great, but it’s really limited to the scenarios programmed into this app: the sun can rise, the tractor enters, the dog says hello to the cat – ok, got it. Now what?

    My kids and I actually have a non-Siftables version of this, where we took some game that includes plastic blocks with various images on them, and turned it into a storytelling game. Each player stacks a bunch of these blocks and tries to tell a continuous story by picking a block and fitting it into a story he’s improvising as he goes along. That’s a real creative challenge, and it is so because you have nobody to rely on but your own imagination.

    Another example is the Lego themed sets, where there’s really just one way to assemble them right, and imagination is out of the equation. As an educational tool, standard plain old Lego blocks are far superior. The fewer rules and options, the more creatively challenged we are, and the more a Siftables app follows that principle, the more educational it may actually become.

    In any case, Siftables are a great idea, and will surely be a great challenge to the creativity of programmers of Siftables apps…

    Clustering Search (yet again)

    Microsoft is rolling an internal test for a search experience upgrade on Live (codenamed Kumo) that clusters search results by aspect. See internal memo and screenshots covered by Kara Swisher.

    As usual, the immediate reaction is – regardless of the actual change, how feasible is it to assume you could make users switch from their Google habit? But let’s try to put that aside and look at the actual change.

    Search results are grouped into clusters based on the aspects of this particular search query. This idea is far from being new, and was attempted in the past by both Vivisimo (at Clusty.com) and by Ask.com. One difference, though, is that Microsoft pushes the aspects further into the experience, by showing a long page of results with several top results from each aspect (similar to Google’s push with spelling mistakes).

    At least judging by the (possibly engineered) sample results, the clustering works better than previous attempts. Most search engines take the “related queries” twist on this, while Kumo includes related queries as a separate widget:

    Clusty.com’s resulting clusters, on the other hand, are far from useful for a serious searcher with enquire/purchase intent.

    At least based on these screenshots, it seems like Microsoft succeeded in distilling interesting aspects better, while maintaining useful labels (e.g. “reviews”). Of course, it’s possible this is all done as a “toy”, limited example, e.g. using some predefined ontology. But together with other efforts, such as the “Cashback” push and the excellent product search (including reviews aggregation and sentiment analysis), it seems like Microsoft may be in the process of positioning Live as the search engine for ecommerce. Surely a good niche to be in…


    Why Search Innovation is Dead

    We like to think of web search as quite a polished tool. I still find myself amazed at the ease with which difficult questions can be answered just by googling them. Is there really much to go from here?

    "Hasn't Google solved search?" by asmythie/Flickr

    Brynn Evans has a great post on why social search won’t topple Google anytime soon. In it, she shares some yet-to-be-published results showing that difficulty in forming the query is a major cause of failed searches. That resonated well with some citations I’m collecting right now for my thesis (on concept-based information retrieval). It also reminded me of a post Marissa Mayer of Google wrote some months ago, titled “The Future of Search”. One of the main items on that future of hers was natural language search, or as she put it:

    This notion brings up yet another way that “modes” of search will change – voice and natural language search. You should be able to talk to a search engine in your voice. You should also be able to ask questions verbally or by typing them in as natural language expressions. You shouldn’t have to break everything down into keywords.

    Mayer gives some examples of questions that were difficult to query or formulate by keywords. But clearly she has the question in her head, so why not just type it in? After all, Google does attempt to answer questions. Mayer (and Brynn too) mentions the lack of context as one reason. Some questions, if phrased naively, refer to the user’s location, activities or other context. It’s a reasonable, though somewhat exaggerated point. Users aren’t really that naive or lazy: if instead of using search they called up a friend, they wouldn’t ask “can you tell me the name of that bird flying out there?”. The same info they would provide verbally, they can also provide to a natural-language search engine, if properly guided.

    The more significant reason in my eyes revolves around habits. Search is usually a means, rather than a goal. So we don’t want to think where and how to search, we just want to type something quickly into that good old search box and fire it away. It’s no wonder that the search engine most bent on sending you away asap has the most loyal users coming back for more. That same engine even has a button that hardly anyone uses, and that supposedly costs them over $100M a year in revenue, which sends users away even faster. So changing this habit is a tough task for any newcomer.

    But these habits go deeper than that. Academic researchers have long studied natural-language search and concept-based search. A combination of effective keyword-based search, together with a more elaborate approach that kicks in when the query is a tough one, could have gained momentum, and some attempts were made at commercial products (most notably Ask, Vivisimo and Powerset). They all failed. Users are so used to the “exact keyword match” paradigm, the total control it provides them with, and its logic (together with its shortcomings) that a switch is nearly impossible, unless Google drives such a change.

    Until that happens, we’ll have to continue limiting innovations to small tweaks over the authorities…