Learning to Play

Ever since I took my first course in Artificial Intelligence, I have been fascinated by the idea of AI in its classical meaning – teaching machines to perform tasks that we humans deem to require intelligence.

Last week I gave a talk at my company on some of the intriguing instances of one of these tasks – learning to play (and win!) games. I often found the human stories behind the scenes even more fascinating than the algorithms themselves, and in this talk that was my focus. It was really fun both to assemble and to deliver, so I wanted to capture these stories in this blog post, to accompany the embedded slides below.

So let’s get started!

a humble start

Game playing is a fantastic AI task, one that researchers have always been excited about. Just like a toddler being taught to swing a baseball bat by an excited parent, the algorithm gets clear rules, a measurable goal, and training input. But above all, testing the result involves the fun act of playing against an opponent you created yourself, just like a proud parent. What a great way to do AI research!

As we go way back in the AI time machine, the first known implementation of an AI game dates as far back as 1950. Josef Kates was a young Jewish Austrian engineer whose family fled the Nazis’ rise to power and ended up in Canada. Kates worked on radar and vacuum tube design at a company named “Rogers Majestic”, and later developed his own patented tube, which he called the “Additron”. While waiting for the patent to be registered, he wanted to demonstrate the power of his invention at a local technology fair, and built a machine that could play Tic-Tac-Toe, which he named “Bertie the Brain”.

Comedian Danny Kaye pleased after “beating” Bertie the Brain during the fair

The machine was a huge success at the fair. Kates made sure to adjust its level of difficulty to allow players to occasionally beat it, and visitors lined up to play. Nevertheless, at the end of the fair it was dismantled and forgotten. Unfortunately for Kates, the Additron took a very long time to go through patenting, and by the time it was approved, technology had already moved on to transistors.

The algorithms pioneered and used in those early days were based on the Minimax method – constructing a tree of all possible moves by the player and its opponent, and evaluating each position’s proximity to a win. The algorithm assumes best play on both sides: the computer picks the move with the MAXimal value, while the opponent picks the move that is best for it, i.e. the computer’s MINimal value. This way the algorithm can look as far into the future as time permits.
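
To make this concrete, here is a minimal Minimax sketch in Python for an abstract two-player game. The `game` interface (`legal_moves`, `apply_move`, `evaluate`) is a hypothetical placeholder that any real game implementation would have to supply:

```python
def minimax(state, depth, maximizing, game):
    """Return the best achievable evaluation for the side to move.

    `game` is assumed to expose three hypothetical callbacks:
      legal_moves(state)       -> iterable of legal moves
      apply_move(state, move)  -> the resulting state
      evaluate(state)          -> heuristic score (positive favors the computer)
    """
    moves = list(game.legal_moves(state))
    if depth == 0 or not moves:           # search horizon reached, or game over
        return game.evaluate(state)
    if maximizing:                        # computer's turn: take the MAXimal value
        return max(minimax(game.apply_move(state, m), depth - 1, False, game)
                   for m in moves)
    else:                                 # opponent's turn: take the MINimal value
        return min(minimax(game.apply_move(state, m), depth - 1, True, game)
                   for m in moves)
```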

With only 765 unique board positions in Tic-Tac-Toe, the game was small enough that all positions and moves could be calculated in advance, making Bertie unbeatable. AI researchers call such a game “solved”. In fact, perfect play by both sides will always end in a draw, and if you watched the movie “WarGames” with Matthew Broderick, you’ll recall how this fact saved the world from nuclear annihilation…

advance to world-class wins

So if Tic-Tac-Toe is too simple, how about a more complex game, such as Checkers?

Checkers has, well, slightly more board positions; at about 5 × 10^20 of them, it was a much more challenging AI task. The most well-known checkers program, even if not the first, was the one written by Arthur Samuel at IBM. Samuel’s checkers player was a real classic, and for several decades it was considered the best that could be achieved. It still used Minimax, but it expanded its repository of board positions from actual games played, often against itself, thus becoming a true learning algorithm. However, it never reached the level of beating master human players.


In 1989, a group of researchers led by Jonathan Schaeffer from the University of Alberta set out to use advances in computing to break that glass ceiling, with a new program called Chinook. I had the privilege of attending a fascinating talk by Schaeffer at the Technion 10 years ago, and the blog post I wrote as a result summarizes the full story. That story has fascinating twists and touching human tributes in it, but it ends with machines being the clear winners, and AI researchers declaring the game of checkers solved as well.

The obvious next challenge in our journey is what’s considered the ultimate game of intelligence – Chess. Played on the same board as Checkers but with far more complex moves, chess has a game tree of roughly 10^120 possible games – a number that dwarfs even the checkers state space. A well-known chess-playing machine was “The Turk“, designed and constructed in Austria by Wolfgang von Kempelen as early as 1770. The Turk was a wonder of its time, beating experienced chess players and, on one occasion, Napoleon Bonaparte. It was a hoax, of course, cleverly hiding a human sitting inside it, but the huge interest it created was a symbol of the great intelligence attributed to playing the game.

However, the huge search space in which Minimax had to be applied made early chess programs extremely weak against humans. Even with the introduction of minimax tree pruning methods such as Alpha-Beta pruning, it seemed like no algorithmic tuning would bring a breakthrough. As the decades passed, more powerful computers enabled faster computations and larger memory to hold billions of possible board positions. This culminated in the famous duel in 1996 between IBM’s Deep Blue chess-playing computer and the world champion at the time, Garry Kasparov. Despite losing the first game – the first ever won by a machine against a reigning world champion under tournament conditions – Kasparov won the match comfortably, 4-2. IBM went on to further improve Deep Blue, now capable of evaluating 200 million positions per second, and invited Kasparov to a re-match in 1997. Kasparov won the first game easily, and was so confident as a result that he lost the next game, a loss he blamed on cheating by IBM. The match ended 3.5-2.5 for Deep Blue, a sensational first match win for a machine over a reigning world champion.
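
And here is a sketch of the same kind of search with Alpha-Beta cut-offs added, using the same hypothetical `game` interface as the Minimax snippet above; branches that provably cannot change the final decision are skipped, which lets the search go much deeper in the same time budget:

```python
def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Minimax with Alpha-Beta pruning; alpha/beta bound the achievable values."""
    moves = list(game.legal_moves(state))
    if depth == 0 or not moves:
        return game.evaluate(state)
    if maximizing:
        value = float("-inf")
        for m in moves:
            value = max(value, alphabeta(game.apply_move(state, m),
                                         depth - 1, alpha, beta, False, game))
            alpha = max(alpha, value)
            if alpha >= beta:   # the opponent already has a better option: prune
                break
        return value
    else:
        value = float("inf")
        for m in moves:
            value = min(value, alphabeta(game.apply_move(state, m),
                                         depth - 1, alpha, beta, True, game))
            beta = min(beta, value)
            if alpha >= beta:   # we already have a better option elsewhere: prune
                break
        return value
```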

from brute force to TRUE learning

The shared practice connecting all the work we’ve seen so far, from Bertie the Brain to Deep Blue, was to feed a very large amount of knowledge into the software, so that it could outdo the human player by sheer computing power and the board positions stored in its huge memory. This enabled algorithms such as Minimax to process enormous numbers of positions, apply human-defined heuristics to them, and find the winning moves.

Let’s recall the toddler from the start of our journey. Is this how humans learn? Would we truly consider this artificial intelligence?

If we want to emulate true intelligence, what we’d really like to build are algorithms that learn by themselves. They will watch examples and learn from them; they will build their own heuristics; they will infer the domain knowledge rather than have it stuffed into them.

In 2014, a small London start-up named DeepMind Technologies, founded only a few years earlier, was acquired by Google for a staggering sum of $600M, before it had released even one product to the market. In fact, reporters struggled to explain what DeepMind was doing at all.

The hints to what attracted Google to DeepMind lie in a paper its team published in December 2013. The paper, presented at NIPS 2013, was titled “Playing Atari with Deep Reinforcement Learning“. It was about playing games, but unlike ever before: this was a generic system, learning to play games without being given any knowledge – nothing but the screen and the location of the score on it. You could equate it to a human who never played Pac-Man taking the controls, hitting them in all directions, watching the score, and gradually figuring out how to play like a pro; and then doing the same for many other games, all using the same method. Sounds human? This was the technology Google was after.

Watching DeepMind play Atari Breakout (see this video) is like magic. The algorithm starts out moving at random, barely hitting the ball once in many attempts. After an hour of training, it plays at an impressive pro level. Then it even learns the classic trick that any seasoned Breakout player eventually masters – tunneling the ball to the top so that it knocks bricks off with little effort. The beauty of it was that the exact same system mastered several other games with no custom optimizations – only the raw screen input and an indication of where the score is; nothing else. There was no Minimax running, no feeding of grandmaster move books or human-crafted heuristic functions. It was a generic deep neural network, trained with reinforcement learning to look at series of moves and their score outcomes, and to uncover the winning patterns all by itself. Pure magic.
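
DeepMind’s actual system (DQN) trains a deep convolutional network on raw pixels, which is far too much for a blog snippet; as a rough illustration of just the reinforcement-learning core, here is a tabular Q-learning sketch against a made-up environment interface (`reset`, `step`, `actions` are placeholder names of mine, not any real library’s API):

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning sketch. `env` is a hypothetical interface with:
    reset() -> state, step(action) -> (next_state, reward, done), and a
    list `env.actions`. DQN replaces the Q table with a neural network."""
    Q = defaultdict(float)                    # Q[(state, action)] -> value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly exploit what we know, sometimes explore
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # move Q toward the observed reward plus discounted future value
            target = reward if done else reward + gamma * max(
                Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```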

AI building games

For the last part of the talk, I deviated to a related, though different, topic. Here I walked through a wonderful series of blog posts I stumbled upon called “Machine Learning is Fun!”, in which Adam Geitgey walks through basic concepts in Machine Learning. In part 2, he describes how Recurrent Neural Networks can be trained to learn and generate patterns. The simplest example we all know and appreciate (or sometimes maybe not…) is the predictive typing feature of mobile keyboards, where the system attempts to predict what word we are trying to type – the great cause of so many texting gaffes.

Moving to more elaborate examples, Geitgey fed an RNN implementation with a Hemingway book (“The Sun Also Rises”), trained it recurrently on the book’s text, and then had it generate text of its own to match the book. The output starts as incomprehensible strings of characters, but gradually takes the form of words and sentences, to the point that the sentences almost make sense and carry Hemingway’s typical curt dialog style.
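
This is not Geitgey’s actual code, but a minimal sketch of the character-level RNN idea in PyTorch (layer sizes and names are arbitrary choices of mine): the model learns to predict the next character, and generation feeds each sampled character back in as the next input.

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Minimal character-level language model: embed -> LSTM -> next-char logits."""
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state          # logits over the next character

def sample(model, start_ix, length, temperature=1.0):
    """Generate `length` characters, feeding each prediction back as input."""
    x, state, out = torch.tensor([[start_ix]]), None, []
    for _ in range(length):
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=0)
        ix = torch.multinomial(probs, 1).item()  # sample, don't just argmax
        out.append(ix)
        x = torch.tensor([[ix]])
    return out
```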

Geitgey then takes this system and applies it to none other than Super Mario Maker, a version of Super Mario that allows players to build levels of their own. He transforms game levels into text streams and feeds these into the learning system. Here too, the system at first spits out nonsense; it gradually learns the basic rules, and eventually generates actual playable levels. I’m no expert on Super Mario so I couldn’t tell, but I showed it to my son, and he said it’s a great level and that he would be happy to play it. That’s intelligent enough for me!


So Long, and Thanks for All the Links

Prismatic is shutting down its app.

I’ve been fascinated by algorithmic approaches to information overload for quite some time now. It seemed like one of those areas where the Web changed everything, and where we now need technology to kick in and make our lives much easier.

Prismatic was one of the more promising attempts I’ve seen, and I had been a user ever since its launch back in 2012. Every time I opened it, it never failed to find me real gems, especially given the tiny setup it required when I first signed up. Prismatic included explicit feedback controls, but it seemed to excel at using my implicit feedback, which is not trivial at all for a mobile product.

Flipboard is likely the best alternative out there right now, and its excellent onboarding experience helped me get started quickly with a detailed list of topics to follow. With reasonable ad-powered revenue, which Prismatic seemed to shun for whatever reason, it is also less likely to shut down anytime soon. Prismatic still does a much better job than Flipboard at surfacing high-quality, long-tail, non-mainstream sources; let’s hope Flipboard continues improving to get there.

It seems, though, that news personalization is not such a strong selling point. Recently, Apple moved its Apple News app from a purely personalized play to also include curated top stories, as its view counts disappointed publishers. In my own experience, even the supposedly personalized feed was mostly made up of 3-4 mainstream sources anyway. Let’s hope this is not where information overload is leading us back to. Democratizing news and getting a balanced, diverse range of opinions and sources is a huge social step forward that the Web and social media have given us. Let’s not go backwards.

Marketing the Cloud

IBM made some news a couple of days ago, announcing that consumers can now use Watson to find the season’s best gifts. A quick browse through the app, which is actually just a wrapper around a small dedicated website, shows nothing out of the ordinary – Apple Watch, televisions, Star Wars, headphones, Legos… not much supercomputing needed. No wonder coverage turned sour after the initial hype, so what was IBM thinking?

Rewind the buzz machines one week back. Google stunned tech media by announcing it was open sourcing its core AI framework, TensorFlow. The splashes were high: “massive potential“, “Machine Learning breakthrough“, “game changer“… but after a few days the critics were out, with Quorans talking about the library’s slowness, and even Google-fanboy researchers wondering – what exactly is TensorFlow useful for?

Nevertheless, within 3 days Microsoft announced its own open source Machine Learning toolkit, DMTK. The Register was quick to mock the move, saying “Google released some of its code last week. Redmond’s (co-incidental?) response is pretty basic: there’s a framework, and two algorithms”…

So what is the nature of all these recent PR-like moves?


There is one high-profit business shared by all of these companies: Cloud Computing. Amazon leads the pack in revenue, and uses the cash flow from its cloud business to offset losses on its aggressive ecommerce pricing, while Microsoft and Google are assumed to come next, with growing cloud businesses of their own. Google even goes as far as predicting its cloud revenue will surpass its ads revenue in five years. It is the gold rush era for the industry.

But first, companies such as Microsoft, Google and IBM need to convince corporations to hand their business to them rather than to Amazon. Hence they have to create as much “smart” buzz for themselves as they can, so that executives in these organizations, already fatigued by the big-data buzzwords, will say: “we must work with them! look, they know their way around all this machine-learning-big-data-artificial-intelligence stuff!!”

So the next time you hear some uber-smart announcement from one of these companies that feels like too much hot air, don’t look for too much strategy; instead, just look up to the cloud.

Thoughts on Plus – Revisited

Two weeks ago, Google decided to decouple Google+ from the rest of its products, and to no longer require a G+ login when using them (e.g. YouTube), in effect starting to gradually relieve it of its misery. Mashable published an excellent analysis of the entire history of the project, and of the hubris demonstrated by Vic Gundotra, the Google exec who led it.

Bradley Horowitz, who conceived Google+ along with Gundotra and is now the one to oversee the transition, laid out the official Google story in a G+ blog post. He talked of the dual mission Google assigned to the project – to become a unifying platform, as well as a product in its own right. A heavy burden to carry, as in many cases these two missions will surely conflict with each other and mess up the user experience, as indeed they did. Horowitz also explains what G+ should have focused on, and now will: “…helping millions of users around the world connect around the interest they love…”

Well, unfortunately Horowitz seems not to be a regular reader of Alteregozi 🙂 Had he read this post, written exactly 4 years ago right here, perhaps G+ would have had more of a differentiation, and a chance.

Microsoft Israel ReCon 2015 (or: got to start blogging more often…)

Yes, two consecutive posts on the same annual event are not a good sign of my virtual activity level… point taken.

So 2 weeks ago, Microsoft Israel held its second ReCon conference on Recommendations and Personalization, turning its fine 2014 start into a tradition worth waiting for. This time it was more condensed than last year (good move!) and just as interesting. Here are three highlights I found worth reporting on:

Uri Barash of the hosting team gave the first keynote, on Cortana integration in Windows 10, talking about the challenges and principles involved. Microsoft places a high emphasis on the user’s trust, hence Cortana does not use any interests that are not explicitly written in Cortana’s notebook and validated by the user. If indeed correct, that’s somewhat surprising, as it limits the recommendation quality and, moreover, the discovery experience for the user – picking up potential interests from the user’s activity. I’d still presume that all these implicit interests are used behind the scenes, to optimize the content derived from explicit interests.

IBM Haifa Research Labs have been working for some years now on enterprise social networks, and on mining connections and knowledge from such networks. At ReCon this year, Roy Levin presented a paper to be published in SIGIR’15, titled “Islands in the Stream: A Study of Item Recommendation within an Enterprise Social Stream“. In the paper, they discuss a personalized newsfeed feature included in IBM’s enterprise social network “IBM Connections”, and provide some background and the personalized ranking logic for the feed items.

They then move on to describe a survey they conducted among users of the product, to analyze their opinions on specific items recommended for them in their newsfeed, similar to Facebook’s newsfeed surveys. Through these surveys, the IBM researchers attempted to identify correlations between various feed-item factors, such as post and author popularity, post personalization score, how surprising an item may be to a user and how likely a user is to want such serendipity, etc. The actual findings are in the paper, but what may be even more interesting is the paper’s deep dissection of the internal workings of the ranking model.

Another interesting talk was by Roy Sasson, Chief Data Scientist at Outbrain, who delivered a fascinating talk about learning from lack of signals. He began with an outline of general measurement pitfalls, demonstrating them on Outbrain widgets when analyzing low numbers of clicks on recommended items. Was the widget visible to the user? Where was it positioned on the page (areas of blindness)? What items were next to the analyzed item? Were they clicked? And so on.

Roy then proceeded to talk about what we may actually learn from a lack of sharing to social networks. We all know that content that gets shared a lot on social networks is considered viral, driving a lot of discussion and engagement. But what about content that gets practically no sharing at all? And more precisely, what kind of content gets a lot of views but no sharing? Well, if you haven’t guessed already, that is likely to be content users are very interested in seeing but would not admit to it, namely provocative and adult material. So in a way, leveraging this reverse correlation helped Outbrain automatically identify porn and other sensitive material. This was not used to filter all of this content out – after all, users do want to view it… but it was used to make sure the recommendation strip includes only 1-2 such items, so they don’t take over the widget and make it seem like this is all Outbrain has to offer. Smart use of data indeed.
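
As a toy illustration of that reverse correlation (the thresholds and field names here are made up for the sketch, not Outbrain’s actual values):

```python
def flag_sensitive(items, min_views=10_000, max_share_rate=1e-4):
    """Flag items with many views but near-zero social sharing as
    potentially sensitive. `items` are dicts with 'views' and 'shares';
    thresholds are illustrative assumptions only."""
    flagged = []
    for item in items:
        if item["views"] >= min_views:
            share_rate = item["shares"] / item["views"]
            if share_rate <= max_share_rate:   # heavily viewed, never shared
                flagged.append(item)
    return flagged

# A recommendation strip could then cap flagged items at 1-2 slots,
# rather than filtering them out entirely.
```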

Microsoft Israel ReCon 2014

Microsoft Israel R&D Center held their first Recommendations Technology conference today, ReCon. With an interesting agenda and a location that’s just across the street from my office, I could not skip this one… here are some impressions from talks I found worth mentioning.

The first keynote speaker was Joseph Sirosh, who leads the Cloud Machine Learning team at Microsoft, having recently joined from Amazon. Sirosh may have aimed low, not knowing what his audience would be like, but as a keynote this was quite a disappointing talk, full of simplistic statements and buzzwords. I guess he lost me when he stated quite decisively that the big difference about putting your service on the cloud is that it will get better the more people use it. Yeah.

Still, there were also some interesting observations he pointed out, worth mentioning:

  • If you’re running a personalization service, benchmarking against most popular items (e.g. top sellers for commerce) is the best non-personalized option. Might sound trivial, but coming from an 8-year Amazon VP, that’s good validation
  • “You get what you measure”: what you choose to measure is what you’re optimizing; make sure it’s indeed your weakest links and the parts you want to improve
  • Improvement depends on being able to run a large number of experiments, especially when you’re in a good position already (the higher you are, the lower your gains, and the more experiments you’ll need to run to keep gaining)
  • When running these large numbers of experiments, good collaboration and knowledge sharing becomes critical, so different people don’t end up running the same experiments without knowing of each other’s past results

Elad Yom-Tov from Microsoft Research described work his team did on enhancing Collaborative Filtering using browse logs. They experimented with adding user browse logs (visited URLs) and search queries to the CF matrix in various ways, to help bootstrap users with little data and to better identify the short-term (recent) intent of these users.

An interesting observation they reached was that using the raw search queries as matrix columns worked better than trying to generalize or categorize them, although intuitively one would expect generalization to reduce the sparsity of such otherwise very long-tail attributes. It seems that the potential gain in reducing sparsity is offset by the loss of the specificity and granularity of the original queries.
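
A minimal sketch of what “raw queries as columns” could look like – my own illustration, not the paper’s code; a real system would keep everything sparse and weighted:

```python
from scipy.sparse import lil_matrix

def build_matrix(users, items, interactions, queries):
    """Build a user x (item + query) matrix.

    interactions: list of (user, item) pairs from the CF data
    queries:      list of (user, raw_query) pairs from search logs
    Each distinct raw query becomes its own column, unmerged and
    uncategorized, right next to the item columns.
    """
    query_vocab = sorted({q for _, q in queries})
    cols = {c: j for j, c in enumerate(list(items) + query_vocab)}
    rows = {u: i for i, u in enumerate(users)}
    M = lil_matrix((len(users), len(cols)))
    for u, it in interactions:
        M[rows[u], cols[it]] = 1.0
    for u, q in queries:
        M[rows[u], cols[q]] = 1.0
    return M.tocsr()   # compressed format for downstream factorization
```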


Another related talk, outlining an interesting way to augment CF, was by Haggai Roitman of IBM Research. Haggai suggested “user uniqueness” – the extent to which the user follows the crowd or deliberately looks for esoteric choices – as a valuable signal in recommendations. This uniqueness would then determine whether to serve the user results that are primarily popularity-based (e.g. CF) or personalized (e.g. content-based), or a mix of the two.
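
A sketch of that mixing idea (the linear blend is my own illustrative choice, not the paper’s actual formula):

```python
def blend_recommendations(popular_scores, personal_scores, uniqueness):
    """Blend two score dicts per item. `uniqueness` in [0, 1] measures how
    far the user strays from the crowd: the more unique the user, the more
    weight the personalized (e.g. content-based) scores receive."""
    all_items = set(popular_scores) | set(personal_scores)
    return {item: (1 - uniqueness) * popular_scores.get(item, 0.0)
                  + uniqueness * personal_scores.get(item, 0.0)
            for item in all_items}
```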

The second keynote was by Ronny Lempel of Yahoo! Labs in Haifa. Ronny talked about multi-user devices, in particular smart TVs, and how recommendations should take into account the user currently in front of the device (even though this information is not readily available). The heuristic his team used was that the audience usually doesn’t change between consecutively watched programs, so using the last program as context for recommending the next one helps model that unknown audience.

Their results indeed showed a significant improvement in recommendation effectiveness when using this context. Another interesting observation was that using a random item from the history, rather than the last one, actually made the recommendations perform worse than no context at all. That’s an interesting result, as it validates the assumption that approximating the right audience is valuable: if you make recommendations to the parent watching in the evening based on the children’s programs watched in the afternoon, you are likely to do worse than with no context at all.
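
A toy sketch of that heuristic (not Yahoo’s actual model): count which programs tend to follow which on the same device, then recommend based only on the last program watched.

```python
from collections import Counter, defaultdict

def build_transitions(watch_logs):
    """watch_logs: per-device chronological lists of program IDs.
    Counts how often program B directly follows program A on a device."""
    follows = defaultdict(Counter)
    for history in watch_logs:
        for prev, nxt in zip(history, history[1:]):
            follows[prev][nxt] += 1
    return follows

def recommend(follows, last_program, k=5):
    """Return the k programs most often watched right after `last_program`;
    the last program serves as a proxy for the unknown current audience."""
    return [p for p, _ in follows[last_program].most_common(k)]
```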


The final presentation was by Microsoft’s Hadas Bitran, who presented and demonstrated Windows Phone’s Cortana. Microsoft goes out of its way to describe Cortana as friendly and non-creepy, and yet the introductory video from Microsoft that Hadas presented somehow managed to include a scary robot (from Halo, I presume), dramatic music, and Cortana saying “Now learning about you”. Yep, not creepy at all.

Hadas did present Cortana’s context-keeping abilities, which looked pretty cool: questions that related to previous questions and answers were followed through nicely by Cortana (all in a controlled demo, of course). Interestingly, this even seemed to work too well – after getting Cortana’s list of suggested restaurants, Hadas asked Cortana to schedule a spec review, and Cortana insisted again and again on booking a table at the restaurant instead… Nevertheless, I can say the demo actually made the option of buying a Windows Phone pass through my mind, so it does do the job.

All in all, it was an interesting and well-organized conference, with a good mix of academia and industry, a good match to IBM’s workshops. Let’s have many more of these!

The Great Managers’ Balancing Act

With so many approaches to management – and to managing software development in particular – there are plenty of authors writing about it, and I don’t intend to join that fray. Personally, I enjoy the “What” much more than the “How”, but recently this piece of insight dawned on me.

To be helpful, a good middle manager does two things:

  1. Up: Make decisions and be held accountable for their outcome.
  2. Down: Remove obstacles from his team’s path.

Where it gets interesting is where #1 and #2 collide, and how the manager deals with it. Great managers find the right balance. Mediocre managers can only handle the collision by sacrificing one at the expense of the other.

For example: a certain middle manager gets a directive handed down from above while the team is already at full capacity. Rather than trading off another highly prioritized task and facing a tough conversation with higher management, he prefers to push the requirement down to his team, asking them to “make an extra effort.” He even considers it his decision, so he feels he lives up to #1. But sadly for his team, not only did he not remove obstacles, he just added more.

Alternatively, such managers try to execute #2 and help their team by making the tough decisions that remove an obstacle. But because they do not realize they’re the ones held accountable for these decisions, they prefer not to communicate them upward, to keep their political standing, thus violating #1. This eventually results in the team losing credibility and being considered weak on execution, despite all their hard work.

Of course, how to successfully balance #1 and #2 and still keep your job and sanity as a manager is a separate topic, one I’ll leave to the management experts to discuss…

"life is a great balancing act..."