Tag Archives: Usability

ChatGPT will kill Google Queries, not Results

It has been a very long time – too long – since Search was last disrupted. So it was only fitting for the hype cycle to reach disproportionate levels when both Microsoft and Google embraced Large Language Models – ChatGPT and Bard, respectively – in their search experience. “Experience” is the key word here, since LLMs have been part of the search backend for quite some time now, only behind the scenes.

By now, we’ve all heard the arguments against these chat-like responses as direct search results. The models are trained to return a single, well-articulated piece of text that purports to provide an answer, not just a results list. Current models were trained to produce a confident response with less emphasis on accuracy, which clearly shows when they are put to actual use. Sure, it’s fun to get a random recipe idea, but getting the wrong information about a medical condition is a totally different story.

So we are likely to see more effort invested in providing explainability and credibility, and in training the models to project the appropriate confidence based on sources and domain. The end result may be an actual answer for some queries, and for others more of a summary of “what’s out there”, but in all cases there will likely be a reference to the sources, letting searchers decide whether they trust the response, or still need to drill into classic links to validate it.

This raises the question – is this truly a step function compared to what we already have today?

A week ago, I was on the hunt for a restaurant to go to with a friend visiting from abroad. That friend had a very specific desire – to dine at a fish restaurant that also serves hummus. Simple enough, isn’t it? Asking Google for “fish restaurants in tel aviv that also serve hummus” quickly showed how very much it is not. Google simply failed to understand me. I got plenty of suggestions, some serving fish and some serving hummus, but with no guarantee of serving both. I had to painstakingly check them out one by one, and most of them served either one or the other. I kept refining the query over and over, as my frustration grew.

With the hype still fresh on my mind, I headed over to ChatGPT:

Great. That’s not much help, is it? I asked for a fish restaurant and got a hummus restaurant. For such a lack of understanding, I could have stuck with Google. Let’s give it one last try before giving up…

That, right there, was my ‘Aha’ moment.

This result, and the validation it included, was precisely what I was looking for. ChatGPT’s ability to take both pieces of context and combine them in a way that reflected back to me what information it was providing made all the difference.

This difference is not obvious. Almost all of the examples in those launch events would have worked just as well with keywords. The primary examples in Google’s Bard announcement post (beyond the “James Webb” fiasco) were “is the piano or guitar easier to learn, and how much practice does each need?” and “what are the best constellations to look for when stargazing?“. But run any of these as a regular Google query, and you will get a decent result snippet from a trusted source, as well as a list of very relevant links. At least there you know where the answer is coming from, and can decide whether to trust it or not!

Left: Bard results from announcement post. Right: current Google results for the same query

In fact, Bing’s announcement post included better examples, ones that would work, but would not be optimal as classic search queries, such as “My anniversary is coming up in September, help me plan a trip somewhere fun in Europe, leaving from London” (“leaving from London” is not handled well in a search query), or “Will the Ikea Klippan loveseat fit into my 2019 Honda Odyssey?” (plenty of related search results, but not for this exact Ikea piece).

The strength of the new language models is their ability to understand a much larger context. When Google started applying BERT to query understanding, it was a significant step in the right direction, moving further away from what their VP of Search described as “keyword-ese”: writing queries that are not natural, but that searchers imagine will convey the right meaning. A query he used there was “brazil traveler to usa need a visa”, which previously returned results for US travelers to Brazil – a perfect example of how looking only at keywords (the “Bag of Words” approach) fails when the entire context is not examined.
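The failure mode of a pure bag-of-words view is easy to demonstrate: the visa query and its reversed-intent twin contain exactly the same tokens, so any order-insensitive representation treats them as identical. A minimal toy sketch (not Google's actual pipeline, just an illustration of the representation):

```python
from collections import Counter

def bag_of_words(query: str) -> Counter:
    """Order-insensitive token counts -- the classic 'keyword-ese' view."""
    return Counter(query.lower().split())

q1 = "brazil traveler to usa need a visa"
q2 = "usa traveler to brazil need a visa"

# Opposite intents, yet identical bags of words:
print(bag_of_words(q1) == bag_of_words(q2))  # True
```

Any model that only sees these counts cannot tell who is traveling where; recovering that requires attending to word order and context, which is exactly what BERT-style models add.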

I am a veteran search user; I am still cognizant of these constraints when I formulate a search query. That is why I find myself puzzled when my younger daughter enters a free-form question into Google rather than translating it into carefully selected keywords, as I do. Of course, that should be the natural interface; it just doesn’t work well enough. That is not just a technical limitation – human language is difficult. It is complex, ambiguous, and above all, highly dependent on context.

New language models can enable the query understanding modules in search engines to better understand these more complex intents. First, they will do a much better job at capturing keyword context. Then, they will provide reflection; the restaurant example demonstrates how simply reflecting the intent back to users, enabling them to validate that what they get is truly what they meant, goes a long way toward compensating for the mistakes that NLP models will continue to make. And finally, the interactive nature – the ability to reformulate the query in response to this reflection by simply commenting on what should change – will make today’s broken experience feel more like a natural part of a conversation. All of these will finally get us closer to that natural human interface, as the younger cohort of users so rightfully expects.

“Alexa, add voice shopping to my to-do list”

Amazon is promoting voice shopping as part of its deals for Prime Day next week. Shoppers will get a $10 credit just for making their first voice purchase from a list of “Alexa Deals“, items that are already greatly discounted. That’s a major incentive just to push consumers toward something that should already be a great benefit – effortless, simple, zero-click shopping. Why does Amazon have to go through so much trouble to get shoppers to use something that’s supposedly so helpful?

To understand the answer, it’s worthwhile to first understand how valuable voice shopping is for Amazon. In all demos and videos for the various Alexa devices, voice shopping is positioned as the perfect tool for spontaneous, instant purchases, such as “Alexa, I need toilet paper / diapers / milk / dog food / …” That easily explains why you need to be an Amazon Prime subscriber to use voice shopping – getting Prime into every household is a cornerstone of Amazon’s business strategy.

In addition, Alexa orders are fulfilled by 1-click payment, yet another highly valuable Amazon tool. Amazon also guarantees free returns for Alexa purchases, just in case you’re concerned about getting your order wrong. Now, combine all of these together and you can see how voice shopping is built to create a habit of shopping as a frictionless, casual activity. That is probably also why the current offer does not apply to voice shopping from within Amazon’s app, as the long process of launching it and reaching its voice search ruins the spontaneity.

And yet – shoppers are not convinced. In last year’s Prime Day, a similar promotion offered by Amazon drove on average one voice order per second. This may sound like a lot, but ~85K orders are still a tiny fraction of the total ~50M orders consumers placed on Amazon that day. This year Amazon raised the incentive even further, which indicates there is still much convincing to do. Why is that?

Mute Button by Rob Albright @ Flickr (CC)

For starters, Amazon’s Alexa devices were never built to be shopping-only. Usage surveys consistently show that most users prefer to use the Alexa assistant to ask questions, play music, and even set timers, far more than to shop. This does not mean that Amazon has done a bad job – quite the contrary. Voice shopping may not be much of a habit initially, and getting used to voice-controlling other useful skills helps build habit and trust. The problem is, when you focus on non-shopping, you also get judged by it. That’s how Amazon gets headlines such as “Google Assistant is light-years ahead of Amazon’s Alexa“, with popular benchmarks measuring it by search, question answering and conversational AI – fields where Google has historically invested orders of magnitude more than Amazon. The upcoming HomePod by Apple is expected to complicate Amazon’s standing even further, with Apple moving to control the slot of a sophisticated, music-focused, high-end smart home device.

The “How it works” page for the Prime Day Alexa deals hints at other issues customers have with shopping in particular. The explanations aim to reassure that no unintended purchases take place (triggered by your kids, or even your TV), and that if your imperfect voice interaction got you the wrong product, returns are free for all Alexa purchases. These may sound like solved issues, but keep in mind that the negative (and often unjustified) coverage of unintended purchases has sent countless Echo owners to set a passcode on ordering – a major setback for the frictionless, zero-click purchasing Amazon is after.

But most importantly, voice-only search interfaces have not yet advanced enough to support interactions more complex than simple, context-less pattern recognition. It’s no accident that the most common purchase flows Alexa supports revolve around re-ordering, where the item is known and no search actually takes place. This means that using Alexa for shopping may work well only for simple pantry shopping, and only if you already made such purchases in the past. Google, on the other hand, is better positioned than Amazon in this respect, having more sophisticated conversational infrastructure. It even enables external developers to build powerful and context-aware Google Assistant apps using tools such as api.ai (for a quick comparison of these developer platforms, see here).

So what might Amazon be doing to make voice shopping more successful?

Re-ordering items is the perfect beginner use-case, being the equivalent of “known item” searches. Amazon may work on expanding the scope of such cases, identifying additional recurring purchase types that can be optimized. These play well with other recent moves by Amazon, such as around grocery shopping and fulfillment.

Shopping lists are a relatively popular Alexa feature (as well as on Google Home), but based on owner testimonials it seems that most users use these for offline shopping. Amazon is likely working to identify more opportunities for driving online purchases from these lists.

Voice interfaces have focused mainly on a single result, yielding an “I’m Feeling Lucky” interaction. Using data from non-voice interactions, Amazon could build a more interactive script, one that guides users through more complex decisions. An interesting case study here is eBay with its “ShopBot” chatbot, though transitioning to voice-only control remains a UX challenge.
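Such a guided script can be pictured as simple slot filling: the assistant asks one narrowing question at a time until the candidate set is small enough to read aloud. A toy sketch – the product catalog, attribute names, and question order here are all invented for illustration, not Alexa's actual flow:

```python
# Toy slot-filling dialogue: narrow a candidate list one question at a time.
# Products and attributes are invented for illustration.
PRODUCTS = [
    {"name": "Loveseat A", "color": "gray", "price": "budget"},
    {"name": "Loveseat B", "color": "gray", "price": "premium"},
    {"name": "Sofa C", "color": "blue", "price": "budget"},
]

def narrow(candidates, slot, answer):
    """Keep only candidates matching the user's answer for one attribute."""
    return [p for p in candidates if p[slot] == answer]

def guided_search(answers):
    """answers: ordered {slot: value} pairs simulating the user's replies."""
    candidates = PRODUCTS
    for slot, value in answers.items():
        if len(candidates) <= 1:
            break  # small enough to read aloud over voice
        candidates = narrow(candidates, slot, value)
    return [p["name"] for p in candidates]

print(guided_search({"color": "gray", "price": "budget"}))  # ['Loveseat A']
```

The design point is that each turn asks about a single attribute, which keeps every exchange short enough for a voice-only channel.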

And finally – it’s worth noting that in the absence of an item in the purchase history (or if the user declines it), Alexa recommends products from what Amazon calls “Amazon’s Choice“, which are “highly rated, well-priced products” as quoted from this help page. This feature is in fact a powerful business tool, pushing vendors to compete for a lucrative slot. In the more distant future, users may trust Alexa to the point of simply taking its word for it and assuming this is the best product for them. That would place a huge lever in Amazon’s hands in its relationship with brands and vendors, and other retailers as well as brands will very likely fight for similar control, raising the stakes on voice search interfaces even further.

Feeling Lucky Is the Future of Search

If you visit the Google homepage on your desktop, you’ll see a rare, prehistoric specimen – one that most Google users don’t see the point of: the “I’m Feeling Lucky” button.

Google has already removed it from most of its interfaces, and even here it only serves as a teaser for various Google nitwit projects. And yet the way things are going, the “Feeling Lucky” ghost may just come back to life – and with a vengeance.


In the early years, the “I’m Feeling Lucky” button was Google’s way of boldly stating “Our results are so great, you can just skip the result lists and head straight to destination #1”. It was a nice, humorous touch, but one that never really caught on as users’ needs grew more complex and less obvious. In fact, it lost Google quite a lot of money, since skipping the result list also meant users saw fewer and fewer sponsored results – Google’s main income source. But usability testing showed that users really liked seeing the button, so Google kept it there for a while.

But there’s another interface rising up that prides itself on returning the first search result without showing you the list. Did you already guess what it is?


Almost every demo of a new personal-assistant product includes questions being answered by the bot tapping into a search engine. The demos make sure to use simple single-answer cases, like “Who is the governor of California?” That’s extremely neat, and would have been regarded as science fiction not so many decades ago. Impressive work on query parsing and entity extraction from search results has made the quality of query understanding, and of the resulting answers, usually outstanding on this type of query.


However, these are just some of the possible searches we want our bots to run. As we get more and more comfortable with this new interface, we will not want to limit ourselves to one type of query. If you want to be able to get an answer for “Give me a good recipe for sweet potato pie” or “Which Chinese restaurants are open in the area now?”, you need a lot more than a single answer. You need verbosity, you need to be able to refine – which stretches the limits of how we perceive conversational interfaces today.

Part of the problem is that it’s difficult for users to understand the limits of conversational interfaces, especially when bot creators pretend that there are no such limits. Another problem lies in the fact that a natural language interface may simply be a lousy choice for some interaction types, and imposing it on them will only frustrate users.

There is a whole new paradigm of user interaction waiting to be invented, to support non-trivial search and refine through conversation – for all of those many cases where a short exchange and single result will probably not do. We will need to find a way to flip between vocal and visual, manage a seamless thread between devices and screen-based apps, and make digital assistants keep context on a much higher level.

Until then, I guess we’ll continue hoping that we’re feeling lucky.



To Tweet or Not to Tweet (hint: that’s not the question)

I was catching up on my RSS overload the other day, when this side note in a post by Naaman on Social Media Multitaskers caught my attention:

“I find that I now blog thoughts that are too long to fit in a tweet; so feel free to follow my tweets…”

"I am the man. I suffered. I was there." CC by 'Kalense Kid'/Flickr

I’m not much of a media multitasker myself, so I don’t experience this duality first hand, but I can imagine it: you get an interesting thought or experience, and then you wonder – is this major enough to develop into a blog post, in which case I’ll head over here, or is it too light (or too much bother), in which case I flutter my wings over there? Actually, I do experience this, except that in the second case I simply drop the thought (and excuse me for not considering Facebook status updates an option – that’s material for another post…)

This would not be a dilemma at all had blogging platforms evolved to accommodate microblogging, which today is somehow seen as the centralized domain of a single commercial company. You really should be able to hop onto your publishing platform, write that thought down, regardless of length, and fire it off. No need to figure out which channel to use, or whether the intended readers are indeed following you there. Similarly, your friends/readers should not have to subscribe to your feeds on different platforms, but rather consume just one, relying on a powerful set of rules to filter your stream as they see fit.

Posterous is a great (and fast-growing) example of how easy it can be from the blogger’s perspective. Just post it (or rather, email it) and it will get published as needed (e.g. shortened for Twitter). But it does not make things any easier for the consumer, who still needs to decide where best to follow this blogger (does he perhaps write additional posts directly on his blog that won’t show up on his Twitter? or vice versa?), and it reduces the basic filtering capability that may have existed when different post types were distributed across different services.

No need to reinvent the wheel here, blogging platforms are abundant, decentralized and perfectly fit to remain our publishing hub, with their developed CMS and the loose but well-defined social networks. What blogging platforms should do – heck, what Automattic should do to evolve, is:

  1. Conversation – support the realtime conversational nature of short posts, with the right UI and notification mechanisms. The “P2” microblogging-optimized theme released almost two years ago was a good start; sadly, it still followed the “blog or microblog, not both” line of thought. To move forward, Automattic needs to realize that Twitter is not a personality, it’s a state of mind – hence P2 can’t be a permanent theme, it should be a contextual theme.
  2. Publishing – acquire Posterous. As simple as that. These guys earned their fame by understanding the pains of publishing anytime, anywhere; they know a thing or two about usability and persuasion, and they have great buzz. The latter is not a luxury – a buzzed-up acquisition makes it very clear that this is a major strategy for you, far more than if you developed the same changes yourself.
  3. Consuming – that’s the tricky part… how do you embed Twitter and WordPress into the same stream, when each consumer has their own desired blend of it? We don’t want to invent a new technology – RSS is here to stay. We do want better ways of filtering our floods, using better tagging coupled with more clever feed options. How exactly – I do hope there’s an entire team at Automattic working on exactly that…

The Opportunity in RSS Overload

Dare Obasanjo has an interesting post, with a good comments thread, on overflowing feed readers. He’s quoting from a post by Farhad Manjoo on Slate:

You know that sinking feeling you get when you open your e-mail and discover hundreds of messages you need to respond to…

Well, actually Dare’s post is from two weeks ago. The reason I got to read it only now is exactly that…

Yes, I know I don’t really need to ‘respond’ to subscriptions, and the obvious answer is to unsubscribe, or go on a feed (or ‘follow’ edge) social diet. But these binary decisions are not always optimal, as I have plenty of feeds I subscribed to after hitting one or two posts I really liked, even though they were not on that author’s main subject (if one exists at all). Thus I have to skim through many uninteresting (for me!) posts, many of which somehow always end up discussing Twitter. In fact, that’s what most of my feeds look like (including the Twitter part).

We need shades of grey between subscribed and unsubscribed. It would be great to have a feed reader that learns from how you use it. It should be quite clear which posts interested me – ones I took time to read, scrolled through, followed a link from, etc. – and which did not. Now train a classifier on that data, preferably per-feed (in addition to a general one), and get some sense of what I’m really looking for.
Mark All As Read Day - flickr/sidereal

Now, I don’t need this smart reader to delete the uninteresting items – let’s not assume too much about its classification accuracy. Just find the right UI to mark the predicted-to-be-interesting items (or even assign them to a special virtual folder). Then I can read these first, and only if/when I have time – read the rest.
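The learning-reader idea can be sketched in a few lines: treat posts the user engaged with (read fully, followed a link) as positive examples, build a per-feed keyword profile from them, and rank incoming items by how well they match it. A toy sketch with an invented scoring rule – a real reader would train a proper classifier, not this word-overlap heuristic:

```python
from collections import Counter

def build_profile(engaged_posts):
    """Keyword frequencies from posts the user actually read or clicked."""
    profile = Counter()
    for post in engaged_posts:
        profile.update(post.lower().split())
    return profile

def interest_score(profile, post):
    """Fraction of the post's words already seen in the engagement profile."""
    words = post.lower().split()
    return sum(1 for w in words if w in profile) / max(len(words), 1)

engaged = ["concept based information retrieval", "query understanding models"]
profile = build_profile(engaged)

new_items = ["retrieval models for query understanding", "what I had for lunch"]
ranked = sorted(new_items, key=lambda p: interest_score(profile, p), reverse=True)
print(ranked[0])  # the retrieval post ranks first
```

Note the ranking only reorders the stream – nothing is deleted, matching the "read these first" interaction described above.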

I assign this to be my pet project in case I win the lottery next week and go into early retirement. Alternatively, if someone has seen this implemented anywhere – let me know!

Update: a related follow-up post on a new filtering product I started using.

Building Blocks of Creativity

Long ago I read an interesting book that tried to teach how to actually engineer creativity. One of the simple methods it proposed was: take an existing device and strip it of a main characteristic. A TV set without a screen, for example. Not good for anything, you say? Well, if you’re a real soap-opera freak, frustrated that the shows always run while you’re driving back from work, you could imagine installing this in your car and listening to your TV while driving, rather than watching it…

So let’s take an iPhone and strip it of its… phone. What do you get besides an eye? Siftables. Got this shared from Oren:

To me, seeing this immediately makes my mind run to how my kids could use it. This is surely a creative human-computer interface, but does that automatically make the applications creative? See the one with the kid injecting characters into a scene played on TV. That’s great, but it’s really limited to the scenarios programmed into the app: the sun can rise, the tractor enters, the dog says hello to the cat – OK, got it. Now what?

My kids and I actually have a non-Siftables version of this: we took some game that includes plastic blocks with various images on them and turned it into a storytelling game. Each player stacks a bunch of these blocks and tries to tell a continuous story, picking a block and fitting it into a story he’s improvising as he goes along. That’s a real creative challenge, and it is one because you have nothing to rely on but your own imagination.

Another example is the Lego themed sets, where there’s really just one way to assemble them right, and imagination is out of the equation. As an educational tool, standard plain old Lego blocks are far superior. The fewer rules and options, the more creatively challenged we are – and the more a Siftables app follows that principle, the more educational it may actually become.

In any case, Siftables are a great idea, and will surely be a great challenge to the creativity of programmers of Siftables apps…

Clustering Search (yet again)

Microsoft is rolling out an internal test of a search-experience upgrade on Live (codenamed Kumo) that clusters search results by aspect. See the internal memo and screenshots covered by Kara Swisher.

As usual, the immediate reaction is: regardless of the actual change, how feasible is it to assume you could make users switch from their Google habit? But let’s put that aside and look at the actual change.

Search results are grouped into clusters based on the aspects of the particular search query. This idea is far from new, having been attempted in the past by both Vivisimo (at Clusty.com) and by Ask.com. One difference, though, is that Microsoft pushes the aspects deeper into the experience, showing a long page of results with several top results from each aspect (similar to Google’s treatment of spelling mistakes).
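The grouping itself can be sketched crudely: extract a salient term from each result snippet and bucket results that share it. Real systems draw on far richer signals (query logs, ontologies, click data); this toy version, with an invented stopword list and a deliberately naive aspect picker, only illustrates the interaction:

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "for", "of", "and", "in", "best"}

def aspect_of(snippet):
    """Crudely pick the first non-stopword as the snippet's 'aspect'."""
    for word in snippet.lower().split():
        if word not in STOPWORDS:
            return word
    return "other"

def cluster_results(snippets):
    """Bucket result snippets by their extracted aspect label."""
    clusters = defaultdict(list)
    for s in snippets:
        clusters[aspect_of(s)].append(s)
    return dict(clusters)

results = [
    "reviews of the latest digital cameras",
    "reviews and ratings roundup",
    "prices for digital cameras online",
    "prices compared across stores",
]
clusters = cluster_results(results)
print(sorted(clusters))  # ['prices', 'reviews']
```

The useful part for the searcher is the labels ("reviews", "prices"), which is exactly where earlier clustering attempts tended to fall short.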

At least judging by the (possibly engineered) sample results, the clustering works better than previous attempts. Most search engines take the “related queries” twist on this, while Kumo includes related queries as a separate widget:

Clusty.com’s resulting clusters, on the other hand, are far from useful for a serious searcher with enquire/purchase intent.

At least based on these screenshots, it seems Microsoft has succeeded in distilling interesting aspects better, while maintaining useful labels (e.g. “reviews”). Of course, it’s possible this is all a limited, “toy” example, e.g. using some predefined ontology. But together with other efforts, such as the “Cashback” push and the excellent product search (including reviews aggregation and sentiment analysis), it seems Microsoft may be positioning Live as the search engine for ecommerce. Surely a good niche to be in…


Why Search Innovation is Dead

We like to think of web search as quite a polished tool. I still find myself amazed at the ease with which difficult questions can be answered just by googling them. Is there really much further to go from here?

"Hasn't Google solved search?" by asmythie/Flickr

Brynn Evans has a great post on why social search won’t topple Google anytime soon. In it, she shares some yet-to-be-published results identifying difficulty in formulating the query as a major cause of failed searches. That resonated with some citations I’m collecting right now for my thesis (on concept-based information retrieval). It also reminded me of a post Marissa Mayer of Google wrote some months ago, titled “The Future of Search“. One of the main items in that future of hers was natural-language search, or as she put it:

This notion brings up yet another way that “modes” of search will change – voice and natural language search. You should be able to talk to a search engine in your voice. You should also be able to ask questions verbally or by typing them in as natural language expressions. You shouldn’t have to break everything down into keywords.

Mayer gives some examples of questions that were difficult to formulate as keywords. But clearly the searcher has the question in her head, so why not just type it in? After all, Google does attempt to answer questions. Mayer (and Brynn too) mentions the lack of context as one reason: some questions, if phrased naively, refer to the user’s location, activities or other context. It’s a reasonable, though somewhat exaggerated, point. Users aren’t really that naive or lazy; if instead of searching they called up a friend, they wouldn’t ask “can you tell me the name of that bird flying out there?”. The same information they would provide verbally, they can also provide to a natural-language search engine, if properly guided.

The more significant reason, in my eyes, revolves around habits. Search is usually a means rather than a goal, so we don’t want to think about where and how to search – we just want to type something quickly into that good old search box and fire it away. It’s no wonder that the search engine most bent on sending you away ASAP has the most loyal users coming back for more. That same engine even has a button, one that hardly anyone uses and that supposedly costs them over $100M a year in revenue, that sends users away even faster. Changing this habit is a tough task for any newcomer.

But these habits go deeper than that. Academic researchers have long studied natural-language search and concept-based search. A combination of effective keyword-based search with a more elaborate approach that kicks in when the query is a tough one could have gained momentum, and some attempts were made at commercial products (most notably Ask, Vivisimo and Powerset). They all failed. Users are so used to the “exact keyword match” paradigm – the total control it provides them with, and its logic (together with its shortcomings) – that a switch is nearly impossible, unless Google drives such a change.

Until that happens, we’ll have to continue limiting innovations to small tweaks over the authorities…

Social Search, or Search Socially?

An interesting paper in the Computer-Human-Interaction conference CSC08 described social search in terms of the entire searching process, from consulting with friends on what keywords to use, to sharing the search outcome. The research was based on Mechanical Turk interviews asking respondents about their recent search experiences, and concluded with some practical suggestions. After watching the presentation slides, I also exchanged some thoughts with one of the authors, Brynn Evans.

Gmailizing blogs

When I first started using gmail, I was shocked: “What? No folders??…” I couldn’t figure out those funny labels, and searching my emails instead seemed a strange idea. Nowadays, when I have to locate an old email, I pray that it’s in gmail and not in my Outlook (even with Vista’s improved search).

The dilemma between the search and browse paradigms runs through many software user interfaces, and was especially highlighted by Google’s focus on search in its products. In some areas, such as finding web sites, the search paradigm has indisputably won, and the once-king Yahoo! Directory barely has a stub article in Wikipedia. In others, such as news, search is a rarely used service, and a portal-like browse interface rules.

But in reality these are complementary paradigms, rather than competing ones. Browsing is excellent when the data fits a clear and sufficiently granular taxonomy shared by the author and reader; unstructured searching fits all the other cases (and in some, like web search, that’s all there is). Oh, and one more difference: search is A LOT easier. Just stuff all the text into strong index machines, and give the user the ubiquitous search box.
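“Stuff all the text into an index” is, at its core, just an inverted index: a map from each term to the documents containing it, with queries answered by intersecting term lists. A minimal sketch (toy documents invented for illustration):

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Documents containing every query term (AND semantics)."""
    term_sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {
    1: "plugging an mp3 player into a car audio system",
    2: "gmail labels versus outlook folders",
    3: "car audio upgrades on a budget",
}
index = build_index(docs)
print(search(index, "car audio"))  # {1, 3}
```

No taxonomy, no folders – the only upfront work is tokenizing the text, which is why the search paradigm is so much cheaper to maintain than a browse hierarchy.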

With gmail I wouldn’t think twice before moving an email to the archive – I have no doubt I’ll find it when needed, and all the hassle of managing folders is gone. A blog is no different. You have an author communicating a heap of knowledge to readers, and instead of sorting it for future reference into tags and categories (the complete opposite of “…a clear and sufficiently granular taxonomy…“), it should be gmailized – stuffed into an index and searched.

Ah, you say, just embed a blog search box. Sure, but I have dozens of blogs I want to search in. So use some blog search aggregator, you suggest. But I don’t want results from all the blogs out there, just from those I care about. Well, then, I guess you’ll need to build yourself a custom search… or just use Delver. Knowing that in a few years every major search engine will integrate social features, I can carelessly blog about anything my social circle could find useful (say, how to plug an mp3 player into the audio system of an Israeli leasing-level Ford Focus), without bothering to categorize it with the perfect keywords (hint: there aren’t any). In fact, I think I’ll skip categories altogether in this blog, and just use tags for a nifty tag cloud 🙂

(crossposted on the Delver Blog)