Feeling Lucky Is the Future of Search

If you visit the Google homepage on your desktop, you’ll see a rare, prehistoric specimen – one that most Google users don’t see the point of: the “I’m Feeling Lucky” button.

Google has already removed it from most of its interfaces, and even here it only serves as a teaser for various Google nitwit projects. And yet the way things are going, the “Feeling Lucky” ghost may just come back to life – and with a vengeance.


In the early years, the “I’m Feeling Lucky” button was Google’s way of boldly stating “Our results are so great, you can just skip the result lists and head straight to destination #1”. It was a nice, humorous touch, but one that never really caught on as users’ needs grew more complex and less obvious. In fact, it lost Google quite a lot of money, since skipping the result list also meant users saw fewer and fewer sponsored results – Google’s main income source. But usability testing showed that users really liked seeing the button, so Google kept it there for a while.

But there’s another interface rising up that prides itself on returning the first search result without showing you the list. Did you already guess what it is?


Almost every demo of a new personal assistant product will include questions being answered by the bot tapping into a search engine. The demos will make sure to use simple single-answer cases, like “Who is the governor of California?” That’s extremely neat, and was regarded as science fiction not so many decades ago. Amazing work on query parsing and entity extraction from search results has led to great results on this type of query, and the quality of the query understanding, and resulting answers, is usually outstanding.


However, these are just some of the possible searches we want our bots to run. As we get more and more comfortable with this new interface, we will not want to limit ourselves to one type of query. If you want to be able to get an answer for “Give me a good recipe for sweet potato pie” or “Which Chinese restaurants are open in the area now?”, you need a lot more than a single answer. You need verbosity, you need to be able to refine – which stretches the limits of how we perceive conversational interfaces today.

Part of the problem is that it’s difficult for users to understand the limits of conversational interfaces, especially when bot creators pretend that there are no such limits. Another problem lies in the fact that a natural language interface may simply be a lousy choice for some interaction types, and imposing it on them will only frustrate users.

There is a whole new paradigm of user interaction waiting to be invented, to support non-trivial search and refine through conversation – for all of those many cases where a short exchange and single result will probably not do. We will need to find a way to flip between vocal and visual, manage a seamless thread between devices and screen-based apps, and make digital assistants keep context on a much higher level.

Until then, I guess we’ll continue hoping that we’re feeling lucky.



Learning to Play

Ever since I took my first course in Artificial Intelligence, I have been fascinated by the idea of AI in its classical meaning – teaching machines to perform tasks deemed by us humans as requiring intelligence.

Recently, I gave a talk at my company on some of the intriguing instances of one of these tasks – learning to play (and win!) games. I often found the human stories behind the scenes even more fascinating than the algorithms themselves, and that was my focus in this talk. It was really fun both to assemble and to deliver, so I wanted to capture these stories in this blog post, to accompany the embedded slides below.


So let’s get started!

a humble start

Game playing is a fantastic AI task, one that researchers were always excited about. Just like a toddler being taught to swing a baseball bat by an excited parent, the algorithm gets clear rules, a measurable goal and training input. But above all, testing the result involves the fun act of playing against the opponent you yourself have created, just like a proud parent. What a great way to do AI research!

As we go way back in the AI time machine, the first known implementation of an AI game was in 1950. Josef Kates was a young Jewish Austrian engineer, whose family fled the Nazis’ rise to power and ended up in Canada. Kates worked on radar and vacuum tube design at a company named Rogers Majestic, and later developed his own patented tube, which he called the Additron. While waiting for the patent to be registered, he wanted to demonstrate the power of his invention at a local technology fair, so he built a machine that could play Tic-Tac-Toe, calling it “Bertie the Brain”.

Comedian Danny Kaye pleased after “beating” Bertie the Brain during the fair

“Bertie the Brain” was a huge success at the fair. Kates made sure to adjust its level of difficulty to allow players to occasionally beat it, and visitors lined up to play. Nevertheless, at the end of the fair it was dismantled and forgotten. Unfortunately for Kates, the Additron took a very long time to go through patenting, and by the time it was approved technology had already moved on toward transistors.

The algorithms pioneered and used in those early days were based on the Minimax method – constructing a tree of all possible moves by the player and opponent, and evaluating the proximity to a winning position. In each move, the algorithm would assume best play with the computer playing the move with MAXimal value and the opponent playing its own maximum, which is the computer’s MINimal value. Thus, the algorithm could calculate into the future as much as time allowed.

With only 765 unique board positions in Tic-Tac-Toe, the game was small enough that all positions and moves could be calculated in advance, making Bertie unbeatable. AI researchers call this situation a “Solved” game. In fact, perfect game play will always end in a draw, and if you watched the 1983 movie “WarGames” with Matthew Broderick, you’ll recall how this fact saved the world from nuclear annihilation…
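To make the Minimax idea concrete, here is a minimal Python sketch for Tic-Tac-Toe. It is only an illustration, not Bertie’s actual logic: the board encoding (a 9-character string) and the function names `winner`, `minimax` and `best_move` are my own assumptions.

```python
from functools import lru_cache

# All eight winning lines on a board stored as a 9-character string, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Value of the position for X: +1 X wins, 0 draw, -1 O wins.

    `player` is the side to move; X maximizes the value, O minimizes it.
    """
    won = winner(board)
    if won == "X":
        return 1
    if won == "O":
        return -1
    if " " not in board:
        return 0  # board full, nobody won
    other = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if player == "X" else min(values)


def best_move(board, player):
    """Index of the empty cell with the best minimax value for `player`."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]

    def score(i):
        return minimax(board[:i] + player + board[i + 1:], other)

    return max(moves, key=score) if player == "X" else min(moves, key=score)
```

Calling `minimax(" " * 9, "X")` on the empty board evaluates to 0, which is the draw under perfect play mentioned above. The game is small enough that the full tree can be searched with nothing more than a cache.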

advance to world-class wins

So if Tic-Tac-Toe is too simple, how about a more complex game such as checkers?

Checkers has, well, slightly more board positions: at 5 x 10^20 board positions, it was a much more challenging AI task. The best-known checkers program, even if not the first, was the one written by Arthur Samuel at IBM. Samuel’s checkers was considered a real classic, and for several decades it was considered the best that could be achieved. It still used Minimax, but expanded its repository of board positions from actual games played, often against itself, thus becoming a true learning algorithm. However, it never got to the level of beating master human players.


In 1989, a group of researchers – led by Jonathan Schaeffer from the University of Alberta – set out to use advances in computing and break that glass ceiling with a new program called Chinook. I had the privilege of attending a fascinating talk by Schaeffer at the Technion 10 years ago, and the blog post I wrote subsequently summarizes the full story. That story has fascinating twists and touching human tributes in it, but it ends with machines being the clear winners – and with AI researchers declaring the game of checkers as solved as well.

The obvious next challenge in our journey would be what’s considered the ultimate game of intelligence – chess. Using the same board as checkers, but with more complex moves, chess has a game tree commonly estimated at around 10^120 possible games (the Shannon number) – far beyond even the square of the number of checkers positions. A famous chess-playing machine was The Turk, designed and constructed in Austria by Wolfgang von Kempelen as early as 1770. The Turk was a wonder of its age, beating experienced chess players and even Napoleon Bonaparte. It was a hoax, of course, cleverly hiding a human sitting inside it, but the huge interest it created was a symbol of the great intelligence attributed to playing the game.

The huge search space in which Minimax had to be applied for chess made early programs extremely weak against humans. Even with the introduction of minimax tree-pruning methods such as Alpha-Beta pruning, it seemed like no algorithmic tuning would produce a breakthrough. As the decades passed, though, more powerful computers enabled faster computations and larger space to hold billions of possible board positions. This culminated in the famous 1996 duel between IBM’s Deep Blue chess-playing computer and the world champion at the time, Garry Kasparov. Despite losing the opening game to the supercomputer, Kasparov won the match comfortably, 4-2. IBM went on to further improve Deep Blue, by then capable of evaluating 200 million positions per second, and invited Kasparov to a re-match the following year. Kasparov won the first game easily, and was so confident as a result that he lost the next game, a loss he blamed on cheating by IBM. The match ended 3.5-2.5 to Deep Blue, a sensational first match win for a machine over a reigning world champion.
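For readers curious about what Alpha-Beta pruning actually does, here is a small Python sketch on a toy game tree. The nested-list tree format and the `visited` bookkeeping are my own assumptions for illustration; real chess engines are of course far more elaborate.

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"),
              maximizing=True, visited=None):
    """Minimax value of a nested-list game tree, with Alpha-Beta pruning.

    Leaves are numbers scored from the maximizer's point of view.
    `visited` (optional list) records every leaf that was actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer above will never allow this branch
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer above already has something better
    return value


# Toy tree: the maximizer picks a branch, then the minimizer picks a leaf.
tree = [[3, 5], [2, 9]]
seen = []
result = alphabeta(tree, visited=seen)
```

Here `result` is 3 and `seen` is `[3, 5, 2]`: once the second branch shows a 2, it can no longer beat the 3 already secured, so the 9 is never looked at. The answer is the same as plain Minimax, only with fewer positions evaluated.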

from brute force to TRUE learning

The shared practice that connected all the work we saw so far – from Bertie the Brain to Deep Blue – was to feed huge amounts of knowledge to the software, so that it could out-do the human player by sheer computing power and board positions stored in its vast memory. This enabled algorithms such as Minimax to process enormous numbers of positions, apply the human-defined heuristics to them and find the winning moves.

Let’s recall the toddler from the start of our journey. Is this how humans learn? Would we truly consider this artificial intelligence?

If we want to emulate true intelligence, what we’d really like to build are algorithms that learn by themselves. They will watch examples and learn from them; they will build their own heuristics; they will infer the domain knowledge rather than have it fed into them.

In 2014, a small London startup named DeepMind Technologies, founded less than three years earlier, was acquired by Google for the staggering sum of $600 million before it had released even one product to the market. In fact, reporters struggled to explain what DeepMind was doing at all.

The hints at what attracted Google to DeepMind lie in a paper its team published in December 2013. The paper, presented at NIPS 2013, was titled “Playing Atari with Deep Reinforcement Learning”. It was about playing games, but unlike ever before. This was about a generic system, learning to play games without being given any knowledge, nothing but a screen and the score-keeping part in it. You could equate it to a human who had never played Pac-Man, taking the controls and just hitting them in all directions, watching the score and gradually figuring out how to play it like a pro and then doing the same for many other games, all using the same method. Sounds human? This was the technology Google was after.

Watching DeepMind play Atari Breakout (seen in this video) is like magic. The algorithm starts out moving randomly, hitting the ball only once in many attempts. After an hour of training, it starts playing at an impressive pro level. Then it even learns the classic trick that any Breakout player eventually masters – tunneling the ball to the top so that it hits bricks off with little effort. The beauty of it all was that the exact same system mastered several other games with no custom optimizations – only the raw screen input and an indication of where the score is, nothing else. This was no Minimax running, no feeding of grandmaster moves books or human-crafted heuristic functions. It was generic deep-learning neural networks, using reinforcement learning that would look at a series of moves and their score outcome, and uncover the winning patterns all by itself. Pure magic.
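To give a feel for the reinforcement learning part, here is a toy Python sketch of tabular Q-learning. To be clear, this is not DeepMind’s system: there is no neural network and no screen input, and the five-cell corridor “game”, the constants and all function names are my own assumptions. What it shares with the real thing is the principle: try moves at random, watch the score, and gradually learn which moves pay off.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # step left or step right
GAMMA = 0.9           # discount factor for future reward
ALPHA = 0.5           # learning rate


def step(state, action):
    """Move within the corridor; reward 1.0 only when the last cell is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    """Learn Q-values from purely random play (Q-learning is off-policy)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            future = 0.0 if done else GAMMA * max(q[(nxt, a)] for a in ACTIONS)
            # Nudge the estimate toward reward plus discounted best future value.
            q[(state, action)] += ALPHA * (reward + future - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """Best learned action for each non-terminal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

After training, `greedy_policy(train())` picks “step right” in every cell, even though nobody told the program where the reward was. DeepMind’s contribution was, roughly speaking, replacing the lookup table with a deep network that reads raw pixels.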

AI Building games

For the last part of the talk, I deviated to a related topic. For this part, I was walking through a wonderful series of blog posts I stumbled upon called “Machine Learning is Fun!”, where the author, Adam Geitgey, walks through basic concepts in Machine Learning. In part two, he describes how Recurrent Neural Networks can be trained to learn and generate patterns. The simplest example we all know and appreciate (or sometimes not…) is the predictive text feature of mobile keyboards, where the system attempts to predict what word we are trying to type – the cause of so many great texting gaffes.

Moving to more elaborate examples, Geitgey fed an RNN implementation with a Hemingway book (“The Sun Also Rises”), trained it recurrently on the book’s text, and then had it spit out texts of its own that would match the book. It starts out with incomprehensible strings of text, but gradually takes the form of words and sentences, to the point that the sentences almost make sense and retain Hemingway’s typically curt dialogue style.

Geitgey then takes this system and applies it to none other than Super Mario Maker. This is a version of Super Mario that allows players to build levels of their own. He transforms game levels into text streams and feeds these into the learning system. Again here, at first the system spits out nonsense. But then it gradually learns the basic rules and eventually generates actual playable levels. I’m no expert on Super Mario so I couldn’t tell, but I showed it to my son and he said it’s a great level that he would be happy to play. That’s intelligent enough for me!
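How might a game level become a text stream? Here is one possible Python sketch. The tile characters and the column-by-column layout are my own assumptions for illustration, not necessarily Geitgey’s exact encoding; the idea is simply that a left-to-right level reads naturally as one line of text per column.

```python
def level_to_text(rows):
    """Encode a level (list of equal-length row strings) as one line per column."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    # zip(*rows) walks the grid column by column, left to right.
    return "\n".join("".join(column) for column in zip(*rows))


def text_to_level(text):
    """Decode column-per-line text back into the list of row strings."""
    columns = text.split("\n")
    return ["".join(row) for row in zip(*columns)]


# Hypothetical tiles: '-' empty sky, '?' question block, '=' ground, '#' pipe.
level = ["--?-",
         "----",
         "=#=="]
encoded = level_to_text(level)
```

A text generator trained on many such streams emits new column lines, and `text_to_level` turns them back into a grid that a level editor could load.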



So Long, and Thanks for All the Links


Prismatic is shutting down its app.

I’ve been fascinated by algorithmic approaches to information overload for quite some time now. It seemed like one of those places where the Web changed everything, and now we need technology to kick in and make our lives so much easier.

Prismatic was one of the more promising attempts at that I’ve seen, and I’ve been a user ever since its launch back in 2012. Every time I opened it, it never failed to find me real gems, especially given the tiny setup it required when I first signed up. Prismatic included explicit feedback controls, but it seemed to excel in using my implicit feedback, which is not trivial at all for a mobile product.

Flipboard is likely the best alternative out there right now, and its excellent onboarding experience helped me get started quickly with a detailed list of topics to follow. With reasonable ad-powered revenue, which Prismatic seemed to shun for whatever reason, it is also less likely to shut down anytime soon. Prismatic still does a much better job than Flipboard in surfacing high-quality, long-tail, non-mainstream sources; let’s hope Flipboard continues improving to get there.

It seems, though, that news personalization is not such a strong selling point. Recently, Apple moved from a pure personalized play for its Apple News app to also add curated top stories, as its view counts disappointed publishers. In my own experience, even the supposed personalized feed was mostly made up of 3-4 mainstream sources anyway. Let’s hope that this is not where information overload is leading us back to. Democratizing news and getting a balanced and diverse range of opinions and sources is a huge social step forward, that the Web and Social Media have given us. Let’s not go backwards.

Marketing the Cloud

IBM made some news a couple of days ago announcing consumers can now use Watson to find the season’s best gifts. A quick browse through the app, which is actually just a wrapper around a small dedicated website, shows nothing out of the ordinary – Apple Watch, Televisions, Star Wars, Headphones, Legos… not much supercomputing needed. No wonder coverage turned sour after the initial hype. So what was IBM thinking?

Rewind the buzz machines one week back. Google stunned tech media, announcing it is open sourcing its core AI framework, TensorFlow. The splashes were high: “massive potential”, “Machine Learning breakthrough”, “game changer”… but after a few days, the critics were out, with Quorans talking about the library’s slowness, and even Google-fanboy researchers wondering – what exactly is TensorFlow useful for?

Nevertheless, within 3 days, Microsoft quickly announced its own open source Machine Learning toolkit, DMTK. The Register was quick to mock the move, saying “Google released some of its code last week. Redmond’s (co-incidental?) response is pretty basic: there’s a framework, and two algorithms”…

So what is the nature of all these recent PR-like moves?


There is one high-profit business shared by all of these companies: Cloud Computing. Amazon leads the pack in revenue, and uses the cash flow from its cloud business to offset losses on its aggressive ecommerce pricing, while Microsoft and Google are assumed to come next with growing cloud businesses. Google even goes as far as predicting cloud revenue to surpass ads revenue in five years. It is the gold rush era for the industry.

But first, companies such as Microsoft, Google and IBM will need to convince corporates to hand them their business, rather than to Amazon. Hence they have to create as much “smart” buzz for themselves as they can, so that executives in these organizations, already fatigued by the big-data buzzwords, will say: “we must work with them! look, they know their way with all this machine-learning-big-data-artificial-intelligence stuff!!”

So the next time you hear some uber-smart announcement from one of these companies that feels like too much hot air, don’t look for too much strategy; instead, just look up to the cloud.

Thoughts on Plus – Revisited

Two weeks ago, Google decided to decouple Google+ from the rest of the Google products, and to not require a G+ login when using those other products (e.g. YouTube), in effect starting to gradually put it out of its misery. Mashable published an excellent analysis of the entire history of the project, and of the hubris demonstrated by Vic Gundotra, the Google exec who led it.

Bradley Horowitz, who conceived Google+ along with Gundotra and is now the one to oversee the transition, laid out the official Google story in a G+ blog post. He talked of the double mission Google assigned to the project – become a unifying platform, as well as a product on its own. A heavy burden to carry, as in many cases these two missions will surely conflict with each other and mess up the user experience, as they did. Horowitz also explains what G+ should have focused on, and now will: “…helping millions of users around the world connect around the interest they love…”

Well, unfortunately Horowitz seems to not be a regular reader of Alteregozi 🙂 Had he read this post, exactly 4 years ago right here, perhaps G+ would have had more of a differentiation, and a chance.

Microsoft Israel ReCon 2015 (or: got to start blogging more often…)

Yes, two consecutive posts on the same annual event are not a good sign for my virtual activity level… point taken.

So 2 weeks ago, Microsoft Israel held its second ReCon conference on Recommendations and Personalization, turning its fine 2014 start into a tradition worth waiting for. This time it was more condensed than last year (good move!) and just as interesting. So here are three highlights I found worth reporting about:

Uri Barash of the hosting team gave the first keynote on Cortana integration in Windows 10, talking about the challenges and principles used. Microsoft places a high emphasis on the user’s trust, hence Cortana does not use any interests that are not explicitly written in Cortana’s notebook, validated by the user. If indeed correct, that’s somewhat surprising, as it limits the recommendation quality and, moreover, the discovery experience for the user, which relies on picking up potential interests from the user’s activity. I’d still presume that all these implicit interests are probably used behind the scenes, to optimize the content from explicit interests.

IBM Haifa Research Labs have been doing work for some years now on enterprise social networks, and mining connections and knowledge from such networks. At ReCon this year, Roy Levin presented a paper to be published in SIGIR’15, titled “Islands in the Stream: A Study of Item Recommendation within an Enterprise Social Stream”. In the paper, they discuss a feature for a personalized newsfeed included in IBM’s enterprise social network “IBM Connections”, and provide some background and the personalized ranking logic for the feed items.

They then move on to describe a survey they conducted among users of the product, to analyze their opinions on specific items recommended for them in their newsfeed, similar to Facebook’s newsfeed surveys. Through these surveys, the IBM researchers attempted to identify correlations between various feed item factors, such as post and author popularity, post personalization score, how surprising an item may be to a user and how likely a user is to want such serendipity, etc. The actual findings are in the paper, but what may actually be even more interesting is the deep dissection in the paper of the internal workings of the ranking model.

Another interesting talk was by Roy Sasson, Chief Data Scientist at Outbrain. Roy delivered a fascinating talk about learning from lack of signals. He began with an outline of general measurement pitfalls, demonstrating them on Outbrain widgets when analyzing low numbers of clicks on recommended items. Was the widget visible to the user? Where was it positioned on the page (areas of blindness)? What items were next to the analyzed item? Were they clicked? And so on.

Roy then proceeded to talk about what we may actually be able to learn from lack of sharing to social networks. We all know that content that gets shared a lot on social networks is considered viral, driving a lot of discussion and engagement. But what about content that gets practically no sharing at all? And more precisely, what kind of content gets a lot of views, but no sharing? Well, if you hadn’t guessed already, that will likely be content users are very interested to see, but would not admit to it, namely provocative and adult material. So in a way, leveraging this reverse correlation helped Outbrain automatically identify porn and other sensitive material. This was then not used to filter all of this content out – after all, users do want to view it… but it was used to make sure that the recommendation strip includes only 1-2 such items so they don’t take over the widget, making it seem like this is all Outbrain has to offer. Smart use of data indeed.
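Here is how I imagine the “many views, no shares” idea could look in code. This Python sketch is entirely my own guess at the mechanics: the `Item` fields, the thresholds and the function names are hypothetical and were not part of Roy’s talk.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    views: int
    shares: int
    score: float  # hypothetical relevance score from the recommender


def is_sensitive(item, min_views=1000, max_share_rate=0.001):
    """Flag items that are viewed a lot but almost never shared.

    Both thresholds are made-up values for illustration.
    """
    return item.views >= min_views and item.shares / item.views <= max_share_rate


def build_widget(candidates, slots=6, max_sensitive=2):
    """Fill the widget by score, letting in at most `max_sensitive` flagged items."""
    widget, sensitive_used = [], 0
    for item in sorted(candidates, key=lambda i: i.score, reverse=True):
        if len(widget) == slots:
            break
        if is_sensitive(item):
            if sensitive_used == max_sensitive:
                continue  # cap reached: skip it rather than filter everything out
            sensitive_used += 1
        widget.append(item)
    return widget
```

The point of the cap, as opposed to a filter, is the one Roy made: the sensitive items still appear because users do click them, but they cannot crowd out everything else in the strip.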

Microsoft Israel ReCon 2014

Microsoft Israel R&D Center held their first Recommendations Technology conference today, ReCon. With an interesting agenda and a location that’s just across the street from my office, I could not skip this one… here are some impressions from talks I found worth mentioning.

The first keynote speaker was Joseph Sirosh, who leads the Cloud Machine Learning team at Microsoft, recently joining from Amazon. Sirosh may have aimed low, not knowing what his audience will be like, but as a keynote this was quite a disappointing talk, full of simplistic statements and buzzwords. I guess he lost me when he stated quite decisively that the big difference about putting your service on the cloud is that it means it will get better the more people use it. Yeah.

Still, there were also some interesting observations he pointed out, worth mentioning:

  • If you’re running a personalization service, benchmarking against most popular items (i.e. Top sellers for commerce) is the best non-personalized option. Might sound trivial, but when coming from an 8-year Amazon VP, that’s a good validation
  • “You get what you measure”: what you choose to measure is what you’re optimizing, make sure it’s indeed your weakest links and the parts you want to improve
  • Improvement depends on being able to run a large number of experiments, especially when you’re in a good position already (the higher you are, the lower your gains, and the more experiments you’ll need to run to keep gaining)
  • When running these large numbers of experiments, good collaboration and knowledge sharing becomes critical, so different people don’t end up running the same experiments without knowing of each other’s past results

Elad Yom-Tov from Microsoft Research described work his team did on enhancing Collaborative Filtering using browse logs. They experimented with adding user browser logs (visited urls) and search queries to the CF matrix in various ways to help bootstrapping users with little data and to better identify short-term (recent) intent for these users.

An interesting observation they reached was that using the raw search queries as matrix columns worked better than trying to generalize or categorize them, although intuitively one would expect this would reduce the sparsity of such otherwise very long-tail attributes. It seems that the potential gain in reducing sparsity is offset by the loss of specificity and granularity of the original queries.
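A rough Python sketch of what “raw queries as matrix columns” could mean in practice. The sparse-dictionary representation, the cosine similarity and all names here are my own assumptions, not the team’s actual implementation.

```python
import math


def user_vector(items, queries):
    """Sparse user row: one column per item, plus one per raw query string."""
    vec = {("item", i): 1.0 for i in items}
    vec.update({("query", q): 1.0 for q in queries})
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A brand-new user with no item history, only one search query.
newcomer = user_vector([], ["sweet potato pie recipe"])
baker = user_vector(["pie dish"], ["sweet potato pie recipe"])
gamer = user_vector(["game console"], ["best racing games"])
```

With item columns alone the newcomer would match nobody; with the query columns added, `cosine(newcomer, baker)` is positive while `cosine(newcomer, gamer)` stays at zero, which is the bootstrapping effect described above. Keeping the query string verbatim preserves its specificity, at the price of very sparse columns.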


Another related talk which outlined an interesting way to augment CF was by Haggai Roitman of IBM Research. Haggai suggested the feature of “user uniqueness” –  to what extent the user follows the crowd or deliberately looks for the esoteric choices, as a valuable signal in recommendations. This uniqueness would then determine whether to serve the user with results that are primarily popularity-based (e.g. CF) or personalized (e.g. content-based), or a mix of the two.
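One way to picture the “user uniqueness” signal is the Python sketch below. The definition of uniqueness (one minus the average popularity of the user’s past choices) and the linear blend are my own assumptions for illustration, not necessarily what Haggai’s team used.

```python
def uniqueness(user_items, popularity):
    """1 minus the mean popularity of the user's items (popularity in 0..1).

    A user with no history gets a neutral 0.5.
    """
    if not user_items:
        return 0.5
    return 1.0 - sum(popularity[i] for i in user_items) / len(user_items)


def blended_score(item, user_items, popularity, content_score):
    """Mix popularity-based and content-based scores by the user's uniqueness."""
    u = uniqueness(user_items, popularity)
    return (1.0 - u) * popularity[item] + u * content_score[item]


# Hypothetical numbers: share of users who picked each item, and a
# content-based match score for the current user.
popularity = {"hit": 0.9, "niche": 0.1}
content_score = {"hit": 0.2, "niche": 0.8}
```

A user whose history is all hits gets a low uniqueness, so the popular item wins the ranking; a user with esoteric taste gets a high uniqueness, and the content-based match takes over.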

The second keynote was by Ronny Lempel of Yahoo! Labs in Haifa. Ronny talked about multi-user devices, in particular smart TVs, and how recommendations should take into account the user that is currently in front of the device (although this information is not readily available). The heuristic his team used was that the audience usually doesn’t change in consecutive programs watched, and so using the last program as context to recommending the next program will help model that unknown audience.

Their results indeed showed a significant improvement in recommendations effectiveness when using this context. Another interesting observation was that using a random item from the history, rather than the last one, actually made the recommendations perform worse than no context at all. That’s an interesting result, as it validates the assumption that approximating the right audience is valuable, and if you make recommendations to the parent watching in the evening based on the children’s watched programs in the afternoon, you are likely to make it worse than no such context at all.
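The “last program as context” heuristic can be sketched in a few lines of Python. This is a deliberately naive version based on counting consecutive pairs; the session format and function names are my own assumptions, and the Yahoo! team’s actual model was surely more sophisticated.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count how often each program is watched right after another one."""
    counts = defaultdict(Counter)
    for programs in sessions:
        for previous, following in zip(programs, programs[1:]):
            counts[previous][following] += 1
    return counts


def recommend_next(counts, last_program, k=1):
    """Top-k programs most often watched right after `last_program`."""
    return [program for program, _ in counts[last_program].most_common(k)]


# Hypothetical viewing sessions from one shared living-room TV.
sessions = [
    ["cartoon_a", "cartoon_b"],
    ["cartoon_a", "cartoon_b"],
    ["cartoon_a", "news"],
    ["news", "drama"],
]
counts = fit_transitions(sessions)
```

Conditioning on the last program approximates who is sitting in front of the TV right now. Picking a random program from the household’s history instead would happily recommend cartoons to the parent watching the evening news.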


The final presentation was by Microsoft’s Hadas Bitran, who presented and demonstrated Windows Phone’s Cortana. Microsoft go out of their way to describe Cortana as friendly and non-creepy, and yet the introductory Microsoft video that Hadas presented somehow managed to include a scary robot (from Halo, I presume), dramatic music, and Cortana saying “Now learning about you”. Yep, not creepy at all.

Hadas did present Cortana’s context-keeping session, which looks pretty cool, as questions she asked that related to previous questions and answers were followed through nicely by Cortana (all in a controlled demo, of course). Interestingly, this even seemed to work too well, as after getting Cortana’s list of suggested restaurants Hadas asked Cortana to schedule a spec review, and Cortana insisted again and again on booking a table at the restaurant instead… nevertheless, I can say the demo actually made the option of buying a Windows Phone pass through my mind, so it does do the job.

All in all, it was an interesting and well-organized conference, with a good mix of academia and industry, a good match to IBM’s workshops. Let’s have many more of these!