Tag Archives: Microsoft

ChatGPT will kill Google Queries, not Results

It has been a very long time – too long – since Search was disrupted. So it was only appropriate for the hype cycle to reach disproportionate levels with both Microsoft and Google embracing Large Language Models, namely ChatGPT and Bard, into their search experience. It should be noted that “experience” is the key word here, since LLMs have been part of the search backend for quite some time now, only behind the scenes.

By now, we’ve all heard the arguments against these chat-like responses as direct search results. The models are trained to return a single, well articulated piece of text, that pretends to provide an answer, not just a results list. Current models were trained to provide a confident response with lower emphasis on accuracy, which clearly shows when they are put to actual usage. Sure, it’s fun to get a random recipe idea, but getting the wrong information about a medical condition is a totally different story.

So we are likely to see more efforts invested in providing explainability and credibility, and in training the model to project the appropriate confidence vased on sources and domain. The end result may be an actual response for some queries, while for others more of a summary of “what’s out there”, but in all cases there will likely be a reference to the sources, letting the searcher decide whether they trust this reponse, or still need to drill into classic links to validate.

This begs the question then – is this truly a step function, versus what we already have today?

A week ago, I was on the hunt for a restaurant to go to, with a friend visiting from abroad. That friend had a very specific desire – to dine at a fish restaurant, that also serves hummus. Simple enough, isn’t it? Asking Google for “fish restaurants in tel aviv that also serve hummus” quickly showed how very much not so. Google simply failed to understand me. I got plenty of suggestions, some serving fish and some serving hummus, but no guarantee to serving both. I had to painstakingly go one by one and check them out, and most of them had either one or the other. I kept refining that query over and over, as my frustration kept growing.

With the hype still fresh on my mind, I headed over to ChatGPT:

Great. That’s not much help, is it? I asked for a fish restaurant and got a hummus restauarant. For such lack of understanding, I could have stuck with Google. Let’s give it one last try before giving up…

That, right there, was my ‘Aha’ moment.

This result, and the validation it included, was precisely what I was looking for. ChatGPT’s ability to take both context pieces and combine them in a way that reflected back to me what information it is providing, totally made all the difference.

This difference is not obvious. Almost all of the examples in those launch events could do great also with keywords. Google’s Bard announcement post primary examples (beyond the “James Webb” fiasco) were “is the piano or guitar easier to learn, and how much practice does each need?” and “what are the best constellations to look for when stargazing?“. But take any of these as a regular Google queries, and you will get a decent result snippet from a trusted source, as well as a list of very relevant links. At least here you know where the answer is coming from, and can decide whether to trust it or not!


Left: Bard results from announcement post. Right: current Google results for the same query

In fact, Bing’s announcement post included better examples, ones that would work, but would not be optimal for classic search results, such as “My anniversary is coming up in September, help me plan a trip somewhere fun in Europe, leaving from London” (“leaving from London” is not handled well in a search query), or “Will the Ikea Klippan loveseat fit into my 2019 Honda Odyssey?” (plenty of related search results, but not for this exact ikea piece).

The strength of new language models is their ability to understand a much larger context. When Google started applying BERT into their query understanding, that was a significant step in the right direction, moving further away from what their VP Search described as “keyword-ese”, writing queries that are not natural, but that searchers imagine will convey the right meaning. A query he used there was “brazil traveler to usa need a visa” which previously gave results for US travelers to Brazil – perfect example for how looking only at keywords (or “Bag of Words” approach) would fail when not examining the entire context.

I am a veteran search user; I still am cognizant of these constraints when I formulate a search query. That is why I find myself puzzled when my younger daughter enters a free-form question into Google rather than translate it to carefully-selected keywords, as I do. Of course, that should be the natural interface, it just doesn’t work well enough. That is not just a technical limitation – human language is difficult. It is complex, ambiguous, and above all, highly dependent on context.

New language models can enable the query understanding modules in search engines to better understand these more complex intents. First, they will do a much better job at getting keywords context. Then, they will provide reflection; the restaurant example demonstrates how simply reflecting the intent back to the users, enabling them to validate that what they get is truly what they meant, goes a long way to help compensate for mistakes that NLP models will continue to make. And finally, the interactive nature, the ability to reformulate the query as a result of this reflection by simply commenting on what should change, will make the broken experience of today feel more like a natural part of a conversation. All of these will finally get us closer to that natural human interface, as the younger cohort of users so rightfully expects.

Marketing the Cloud

watsonIBM made some news a couple of days ago announcing consumers can now use Watson to find the season’s best gifts. A quick browse through the app, which is actually just a wrapper around a small dedicated website, shows nothing of the ordinary – Apple Watch, Televisions, Star Wars, Headphones, Legos… not much supercomputing needed. No wonder coverage turned sour after an initial hype, so what was IBM thinking?

tensorflowRewind the buzz machines one week back. Google stunned tech media, announcing it is open sourcing its core AI framework, TensorFlow. The splashes were high: “massive potential“, “Machine Learning breakthrough“, “game changer“… but after a few days, the critics were out, Quorans talking about the library’s slowness, and even Google-fanboy researchers wondering – what exactly is TensorFlow useful for?

Nevertheless, within 3 days, Microsoft quickly announced its own open source Machine Learning toolkit, DMTK. The Register was quick to mock the move, saying “Google released some of its code last week. Redmond’s (co-incidental?) response is pretty basic: there’s a framework, and two algorithms”…

So what is the nature of all these recent PR-like moves?

marketing-cloud

There is one high-profit business shared by all of these companies: Cloud Computing. Amazon leads the pack in revenue, and uses the cash flow from cloud business to offset losses on its aggressive ecommerce pricing, but also Microsoft and Google are assumed to come next with growing cloud business. Google even goes as far as predicting cloud revenue to surpass ads revenue in five years. It is the gold rush era for the industry.

But first, companies such as Microsoft, Google and IBM will need to convince corporates to hand them their business, rather than to Amazon. Hence they have to create as much “smart” buzz for themselves, so that executives in these organization, already fatigued by the big-data buzzwords, will say: “we must work with them! look, they know their way with all this machine-learning-big-data-artifical-intelligence stuff!!”

So the next time you hear some uber-smart announcement from one of these companies that feels like too much hot air, don’t look for too much strategy; instead, just look up to the cloud.

Microsoft Israel ReCon 2015 (or: got to start blogging more often…)

Yes, two consecutive posts on the same annual event are not a good sign to my virtual activity level… point taken.

MSILSo 2 weeks ago, Microsoft Israel held its second ReCon conference on Recommendations and Personalization, turning its fine 2014 start into a tradition worth waiting for. This time it was more condensed than last year (good move!) and just as interesting. So here are three highlights I found worth reporting about:

Uri Barash of the hosting team gave the first keynote on Cortana integration in Windows 10, talking about the challenges and principles used. Microsoft places a high empasis on the user’s trust, hence Cortana does not use any interests that are not explicitly written in Cortana’s notebook, validated by the user. If indeed correct, that’s somewhat surprising, as it limits the recommendation quality and moreover – the discovery experience for the user, picking up potential interests from the user’s activity. I’d still presume that all these implicit interests are probably used behind the scenes, to optimize the content from explicit interests.

ibm_logoIBM Haifa Research Labs have been doing work for some years now on enterprise social networks, and mining connections and knowledge from such networks. In ReCon this year, Roy Levin presented a paper to be published in SIGIR’15, titled “Islands in the Stream: A Study of Item Recommendation within an Enterprise Social Stream“. In the paper, they discuss a feature for a personalized newsfeed included in IBM’s enterprise social network “IBM Connections”, and provide some background and the personalized ranking logic for the feed items.

They then move on to describe a survey they have made among users of the product, to analyze their opinions on specific items recommended for them in their newsfeed, similar to Facebook’s newsfeed surveys. Through these surveys, the IBM researchers attempted to identify correlations between various feed item factors, such as post and author popularity, post personalization score, how surprising an item may be to a user and how likely a user is to want such serevdipity, etc. The actual findings are in the paper, but what may actually be even more interesting is the deep dissection in the paper of the internal workings of the ranking model.

Outbrain-logoAnother interesting talk was by Roy Sasson, Chief Data Scientist at Outbrain. Roy delivered a fascinating talk about learning from lack of signals. He began with an outline of general measurement pitfalls, demonstrating them on Outbrain widgets when analyzing low numbers of of clicks on recommended items. Was the widget visible to the user? where was it positioned in the page (areas of blindness)? what items were next to the analyzed item? were they clicked? and so on.

Roy then proceeded to talk about what we may actually be able to learn from lack of sharing to social networks. We all know that content that gets shared a lot on social networks is considered viral, driving a lot of discussion and engagement. But what about content that gets practically no sharing at all? and more precisely, what kind of content gets a lot of views, but no sharing? Well, if you hadn’t guessed already, that will likely be content users are very interested to see, but would not admit to it, namely provocative and adult material. So in a way, leveraging this reverse correlation helped Outbrain automatically identify porn and other sensitive material. This was then not used to filter all of this content out – after all, users do want to view it… but it was used to make sure that the recommendation strip includes only 1-2 such items so they don’t take over the widget, making it seem like this is all Outbrain has to offer. Smart use of data indeed.

Microsoft Israel ReCon 2014

Microsoft Israel R&D Center held their first Recommendations Technology conference today, ReCon. With an interesting agenda and a location that’s just across the street from my office, I could not skip this one… here are some impressions from talks I found worth mentioning.

The first keynote speaker was Joseph Sirosh, who leads the Cloud Machine Learning team at Microsoft, recently joining from Amazon. Sirosh may have aimed low, not knowing what his audience will be like, but as a keynote this was quite a disappointing talk, full of simplistic statements and buzzwords. I guess he lost me when he stated quite decisively that the big difference about putting your service on the cloud is that it means it will get better the more people use it. Yeah.

Still, there were also some interesting observations he pointed out, worth mentioning:

  • If you’re running a personalization service, benchmarking against most popular items (i.e. Top sellers for commerce) is the best non-personalized option. Might sound trivial, but when coming from an 8-year Amazon VP, that’s a good validation
  • “You get what you measure”: what you choose to measure is what you’re optimizing, make sure it’s indeed your weakest links and the parts you want to improve
  • Improvement depends on being able to run a large number of experiments, especially when you’re in a good position already (the higher you are, the lower your gains, and the more experiments you’ll need to run to keep gaining)
  • When running these large numbers of experiments, good collaboration and knowledge sharing becomes critical, so different people don’t end up running the same experiments without knowing of each other’s past results

Elad Yom-Tov from Microsoft Research described work his team did on enhancing Collaborative Filtering using browse logs. They experimented with adding user browser logs (visited urls) and search queries to the CF matrix in various ways to help bootstrapping users with little data and to better identify short-term (recent) intent for these users.

An interesting observation they reached was that using the raw search queries as matrix columns worked better than trying to generalize or categorize them, although intuitively one would expect this would reduce the sparsity of such otherwise very long-tail attributes. It seems that the potential gain in reducing sparsity is offset by the loss of specificity and granularity of the original queries.

unique

Another related talk which outlined an interesting way to augment CF was by Haggai Roitman of IBM Research. Haggai suggested the feature of “user uniqueness” –  to what extent the user follows the crowd or deliberately looks for the esoteric choices, as a valuable signal in recommendations. This uniqueness would then determine whether to serve the user with results that are primarily popularity-based (e.g. CF) or personalized (e.g. content-based), or a mix of the two.

The second keynote was by Ronny Lempel of Yahoo! Labs in Haifa. Ronny talked about multi-user devices, in particular smart TVs, and how recommendations should take into account the user that is currently in front of the device (although this information is not readily available). The heuristic his team used was that the audience usually doesn’t change in consecutive programs watched, and so using the last program as context to recommending the next program will help model that unknown audience.

Their results indeed showed a significant improvement in recommendations effectiveness when using this context. Another interesting observation was that using a random item from the history, rather than the last one, actually made the recommendations perform worse than no context at all. That’s an interesting result, as it validates the assumption that approximating the right audience is valuable, and if you make recommendations to the parent watching in the evening based on the children’s watched programs in the afternoon, you are likely to make it worse than no such context at all.

Cortana

The final presentation was by Microsoft’s Hadas Bitran, who presented and demonstrated Windows Phone’s Cortana. Microsoft go out of their way to describe Cortana as friendly and non-creepy, and yet the introductory video from Microsoft Hadas presented somehow managed to include a scary robot (from Halo, I presume), dramatic music, and Cortana saying “Now learning about you”. Yep, not creepy at all.

Hadas did present Cortana’s context-keeping session, which looks pretty cool as questions she asked related to previous questions and answers, were followed through nicely by Cortana (all in a controlled demo, of course). Interestingly, this even seemed to work too well, as after getting Cortana’s list of suggested restaurants Hadas asked Cortana to schedule a spec review, and Cortana insisted again and again to book a table at the restaurant instead… nevertheless, I can say the demo actually made the option of buying a Windows Phone pass through my mind, so it does do the job.

All in all, it was an interesting and well-organized conference, with a good mix of academia and industry, a good match to IBM’s workshops. Let’s have many more of these!

Yahoo Gives Up on Social Search

In an interview that strangely made headlines only in Indian tech blogs, Yahoo Research Labs’ Chief Prabhakar Raghavan declared that Yahoo will not replace its search with Bing. OK, the Yahoo-Microsoft deal is not really off, but the deal details turn out to imply that Yahoo will only use Microsoft search technology as the backend, and keep building its own smart front-end to it that will make use of Yahoo’s content assets. Raghavan says:

“Yahoo will not use Bing. Bing is a branded search engine that Microsoft is building on top of its search back-end and we will build our own search front-end on that same Microsoft back-end. It (using Bing) is not the case, at least as envisioned at the moment”

This actually makes perfect sense. Stop spending tons of resources on crawling and ranking in a futile war with Google, and focus on building the user experience over it, leveraging Yahoo’s advantage – content. Raghavan mentions scenarios that sound a lot like Yahoo shortcuts (that’s really old news) as one example of how to deliver a more complete experience over commodity search results.

The article then goes on to discuss the second focus for Yahoo, social applications, and mentions Microsoft’s tie-up with Facebook for access to social graph. Raghavan is quoted as saying:

“Social networks are not just a place to hang out, but to get things done. It predates the web.. I’m not sure where the sweet spot is, we’re still doing research on it”

Also makes perfect sense. With Google as a common enemy, and Microsoft a Facebook partner, Yahoo may be better positioned to deliver social applications that leverage the de-facto standard of Facebook graph, rather than push its own failed networks.

So why is my post title suggesting what it’s suggesting??


There is one catch in sub-contracting your search results: you are now limited with what you can do in search ranking. The best you can do is re-rank the set of results Microsoft’s technology supplied you with before presenting it to the user. As I’ve pointed out in the past when talking about Delver’s technology, social (graph-based) search is a game that cannot be played by reranking, since it’s a classic long tail problem. So when you can’t interfere with how search results are ranked, you also can’t deliver true social search, as Google recently did. One less social application Yahoo can build…

Google Nails Down Social Search

Google’s Social Search is doing the walk, all the rest are just doing the talk. As soon as I activated the Social Search experiment, my next search yielded a social result. No setting up, showing how I am connected to that result (including friends of friends), showing as part of the standard web results…

google-social-searchContrast this with Microsoft’s poor attempt at “social search” by indexing tweets and status messages and showing them regardless of the actual searcher (example search, you’ve got to be on “United States” locale on bing to see it).

Then also contrast it with Facebook’s announcement back in August of its implementation of searching within friends’ posts – a less grandiose announcement that yet delivered far more social experience than Bing’s. Nevertheless, it’s a very limited experience and far from being a true information source for any serious search need.

So how does Google overcome the main obstaclecollecting your connections?

Google relies on its own sources and on open sources it can obtain by crawling the social graph. That is the true reason why Facebook is not part of Google’s graph (no XFN/FOAF marking on Facebook’s public pages). Google may be counting on Facebook’s inevitable opening up, and with Gmail’s rising popularity it becomes a reasonable alternative even for Facebook users like me.

Sadly, all this great news gave zero credit to Delver, where it all happened first

Google Labs is now Google

Quick, name this search engine!

public-google-labs

No, not Kumo. That’s Google’s recent launch, trying to compete with Twitter search (“Recent results”), to preempt Microsoft (clustering result types), to show a different, though quite ugly UI metaphor (“wonder wheel”), and generally to roll out a whole bunch of features that should have been Google Labs features before making (or not) their way into a public product. So what’s next? buttons next to search results moving them up or down with no opt-out?? Ah, wait, that waste of real estate is already there.

Flash Gordon Gets the Drop on Arch-Enemy Ming the Mericiless - Flickr/pupleslog

Someone is panicking. OPEN FIRE! ALL WEAPONS!!! DISPATCH WAR ROCKET AJAX!!! The same spirit that brought us the failure of knols, is bringing us yet further unnecessary novelty, but this time it’s a cacophony of features, each deserving a long Google Labs quarantine by itself.

I noticed that much of my recent blog posts have to do with Google criticism :-). I wrestle with that, there really ought to be more interesting stuff to blog about in the IR world, and there is also great stuff coming from Google (can you imagine the fantastic similar images feature is still in labs? can Google please apply this to the ridiculously useless “similar pages” link in main web search results??), but I truly think we see a trend. Google is dropping the ball, losing the clear and spotless logic we have seen in the past, and the sensible slow graduation of disruptive features from Google Labs. Sadly, though, it’s not clear if anyone is there, ready to pick that ball…

Clustering Search (yet again)

Microsoft is rolling an internal test for a search experience upgrade on Live (codenamed Kumo) that clusters search results by aspect. See internal memo and screenshots covered by Kara Swisher.

As usual, the immediate reaction is – regardless of the actual change, how feasible is it to assume you could make users switch from their Google habit? but let’s try to put that aside and do look at the actual change.

Search results are grouped into clusters based on the aspects of this particular search query. This idea is far from being new, and was attempted in the past by both Vivisimo (at Clusty.com) and by Ask.com. One difference, though, is that Microsoft pushes the aspects further into the experience, by showing a long page of results with several top results from each aspect (similar to Google’s push with spelling mistakes).

At least judging by the (possibly engineered) sample results, the clustering works better than previous attempts. Most search engines take the “related queries” twist on this, while Kumo includes related queries as a separate widget:

kumo-comparisonClusty.com’s  resulting clusters, on the other hand, are far from useful for a serious searcher with enquire/purchase intent.

At least based on these screenshots, it seems like Microsoft succeeded in distilling interesting aspects better, while maintaining useful labels (e.g. “reviews”). Of course, it’s possible this is all done as a “toy”, limited example, e.g. using some predefined ontology. But together with other efforts, such as the “Cashback” push and the excellent product search (including reviews aggregation and sentiment analysis), it seems like Microsoft may be in the process of  positioning Live as the search engine for ecommerce. Surely a good niche to be in…

live-products1