Learning to Play

Ever since I took my first course in Artificial Intelligence, I have been fascinated by the idea of AI in its classical meaning – teaching machines to perform tasks deemed by us humans as requiring intelligence.

Recently, I gave a talk at my company on some of the intriguing instances of one of these tasks – learning to play (and win!) games. I often found the human stories behind the scenes even more fascinating than the algorithms themselves, and that was my focus in this talk. It was really fun both to assemble and to deliver, so I wanted to capture these stories in this blog post, to accompany the embedded slides below.


So let’s get started!

a humble start

Game playing is a fantastic AI task, one that researchers have always been excited about. Just like a toddler being taught to swing a baseball bat by an excited parent, the algorithm gets clear rules, a measurable goal and training input. But above all, testing the result involves the fun act of playing against the opponent you yourself have created, just like a proud parent. What a great way to do AI research!

Going way back in the AI time machine, the first known implementation of an AI game dates to 1950. Josef Kates was a young Jewish Austrian engineer whose family fled the Nazis’ rise to power and ended up in Canada. Kates worked on radar and vacuum tube design at a company named Rogers Majestic, and later developed his own patented tube, which he called the Additron. While waiting for the patent to be registered, he wanted to demonstrate the power of his invention at a local technology fair, so he built a machine that could play Tic-Tac-Toe, calling it “Bertie the Brain”.

Comedian Danny Kaye pleased after “beating” Bertie the Brain during the fair

“Bertie the Brain” was a huge success at the fair. Kates made sure to adjust its level of difficulty to allow players to occasionally beat it, and visitors lined up to play. Nevertheless, at the end of the fair it was dismantled and forgotten. Unfortunately for Kates, the Additron took a very long time to go through patenting, and by the time it was approved, technology had already moved on toward transistors.

The algorithms pioneered and used in those early days were based on the Minimax method – constructing a tree of all possible moves by the player and the opponent, and evaluating how close each resulting position is to a win. At each move, the algorithm assumes best play by both sides: the computer picks the move with the MAXimal value, while the opponent picks the move that is best for itself, which is the one with the MINimal value for the computer. Thus, the algorithm could look as far into the future as time allowed.

With only 765 unique board positions in Tic-Tac-Toe, the game is small enough that all positions and moves can be calculated in advance, which could make Bertie unbeatable. AI researchers call this situation a “Solved” game. In fact, perfect play will always end in a draw, and if you watched the 1983 movie “WarGames” with Matthew Broderick, you’ll recall how this fact saved the world from nuclear annihilation…
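To make the Minimax idea concrete, here is a minimal sketch in Python that searches the whole Tic-Tac-Toe tree. It is an illustration only, not a reconstruction of how Bertie worked; the board encoding (a 9-character string) and the names `winner`, `minimax` and `best_move` are my own assumptions.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board whose cells are indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)  # each position is evaluated only once
def minimax(board, player):
    """Value of `board` from X's point of view, with `player` to move.

    +1 means X wins, 0 means a draw, -1 means O wins, assuming best play.
    """
    won = winner(board)
    if won is not None:
        return 1 if won == 'X' else -1
    if ' ' not in board:
        return 0  # board is full with no winner: a draw
    other = 'O' if player == 'X' else 'X'
    values = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == ' ']
    # X plays the MAXimal value, O plays the MINimal one.
    return max(values) if player == 'X' else min(values)


def best_move(board, player):
    """Pick the empty cell whose resulting position is best for `player`."""
    other = 'O' if player == 'X' else 'X'
    moves = [i for i, cell in enumerate(board) if cell == ' ']

    def score(i):
        return minimax(board[:i] + player + board[i + 1:], other)

    return max(moves, key=score) if player == 'X' else min(moves, key=score)


EMPTY = ' ' * 9
print(minimax(EMPTY, 'X'))                # 0: perfect play from the empty board is a draw
print(best_move('XX OO' + ' ' * 4, 'X'))  # 2: X completes the top row
```

Because the game is so small, the search reaches every final position, so no heuristic evaluation is needed. For larger games the same recursion has to stop at some depth and estimate the value of the position instead.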

advance to world-class wins

So if Tic-Tac-Toe is too simple, how about a more complex game such as checkers?

Checkers has, well, slightly more board positions: at 5 × 10^20, it was a much more challenging AI task. The best-known checkers program, even if not the first, was the one written by Arthur Samuel at IBM. Samuel’s checkers program was considered a real classic, and for several decades it was regarded as the best that could be achieved. It still used Minimax, but expanded its repository of board positions from actual games played, often against itself, thus becoming a true learning algorithm. However, it never reached the level of beating master human players.


In 1989, a group of researchers – led by Jonathan Schaeffer from the University of Alberta – set out to use advances in computing and break that glass ceiling with a new program called Chinook. I had the privilege of attending a fascinating talk by Schaeffer at the Technion 10 years ago, and the blog post I wrote subsequently summarizes the full story. That story has fascinating twists and touching human tributes in it, but it ends with machines being the clear winners – and with AI researchers declaring the game of checkers as solved as well.

The obvious next challenge in our journey would be what’s considered the ultimate game of intelligence – chess. Using the same board as checkers, but with more complex moves, chess has a game tree commonly estimated at around 10^120 possible games – far more than even the number of checkers positions, squared. A famous chess-playing machine was The Turk, designed and constructed in Austria by Wolfgang von Kempelen as early as 1770. The Turk was a wonder of its age, beating experienced chess players and even Napoleon Bonaparte. It was a hoax, of course, cleverly hiding a human sitting inside it, but the huge interest it created was a symbol of the great intelligence attributed to playing the game.

The huge search space in which Minimax had to be applied for chess made early programs extremely weak against humans. Even with the introduction of minimax tree-pruning methods such as Alpha-Beta pruning, it seemed like no algorithmic tuning would produce a breakthrough. As the decades passed, though, more powerful computers enabled faster computations and more memory to hold billions of possible board positions. This culminated in the famous 1996 duel between IBM’s Deep Blue chess-playing computer – already capable of evaluating 200 million positions per second – and the world champion at the time, Garry Kasparov. Despite losing the first game to the supercomputer, Kasparov won the match convincingly, 4-2. IBM went on to further improve Deep Blue and invited Kasparov to a rematch the following year. Kasparov won the first game easily, and was so confident as a result that he lost the next game, a loss he blamed on cheating by IBM. The match ended 3.5-2.5 to Deep Blue, a sensational first win for a machine over a reigning world champion.
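Alpha-Beta pruning is easier to appreciate on a tiny example. The sketch below is a toy of my own, not Deep Blue's code: the game tree is a hypothetical nested list whose leaves are position values, and a counter records how many leaves each search had to evaluate.

```python
def plain_minimax(node, maximizing, counter):
    """Exhaustive Minimax over a nested-list tree; leaves are integer values."""
    if isinstance(node, int):
        counter[0] += 1  # one more leaf position evaluated
        return node
    values = [plain_minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, counter,
              alpha=float('-inf'), beta=float('inf')):
    """Same result as plain_minimax, but skips branches that cannot matter.

    alpha is the best value the maximizer can already guarantee,
    beta is the best value the minimizer can already guarantee.
    """
    if isinstance(node, int):
        counter[0] += 1
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer above will never allow this branch
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer above already has something better
    return value


# A made-up two-level tree: the computer moves first, then the opponent.
tree = [[3, 5, 6], [2, 9, 1], [4, 8, 7]]

full, pruned = [0], [0]
print(plain_minimax(tree, True, full), full[0])  # 4 9
print(alphabeta(tree, True, pruned), pruned[0])  # 4 7
```

Both searches agree on the value, but the pruned one stops looking at the middle branch as soon as it sees the 2, since the computer already has a branch worth 3. On a tree this small the saving is two leaves; on deep trees the skipped branches are whole subtrees.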

from brute force to TRUE learning

The shared practice that connected all the work we saw so far – from Bertie the Brain to Deep Blue – was to feed huge amounts of knowledge to the software, so that it could out-do the human player by sheer computing power and board positions stored in its vast memory. This enabled algorithms such as Minimax to process enormous numbers of positions, apply the human-defined heuristics to them and find the winning moves.

Let’s recall the toddler from the start of our journey. Is this how humans learn? Would we truly consider this artificial intelligence?

If we want to emulate true intelligence, what we’d really like to build are algorithms that learn by themselves. They will watch examples and learn from them; they will build their own heuristics; they will infer the domain knowledge rather than have it fed into them.

In 2014, a small London startup named DeepMind Technologies, founded less than three years earlier, was acquired by Google for the staggering sum of $600 million before it had released even one product to the market. In fact, reporters struggled to explain what DeepMind was doing at all.

The hints at what attracted Google to DeepMind lie in a paper its team published in December 2013. The paper, presented at NIPS 2013, was titled “Playing Atari with Deep Reinforcement Learning”. It was about playing games, but unlike ever before. This was about a generic system, learning to play games without being given any knowledge, nothing but a screen and the score-keeping part in it. You could equate it to a human who had never played Pac-Man, taking the controls and just hitting them in all directions, watching the score and gradually figuring out how to play it like a pro, and then doing the same for many other games, all using the same method. Sounds human? This was the technology Google was after.

Watching DeepMind play Atari Breakout (seen in this video) is like magic. The algorithm starts out moving randomly, hitting the ball only once in many attempts. After an hour of training, it starts playing at an impressive pro level. Then it even learns the classic trick that any Breakout player eventually masters – tunneling the ball to the top so that it knocks bricks off with little effort. The beauty of it all was that the exact same system mastered several other games with no custom optimizations – only the raw screen input and an indication of where the score is, nothing else. This was no Minimax running, no feeding of books of grandmaster moves or human-crafted heuristic functions. It was a generic deep-learning neural network, using reinforcement learning to look at a series of moves and their score outcome, and uncover the winning patterns all by itself. Pure magic.
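The learning loop behind this can be sketched in a few lines. DeepMind's paper builds on Q-learning, with a deep neural network reading raw pixels; the toy below is my own drastic simplification that swaps the network for a small lookup table and the Atari screen for a made-up five-cell corridor in which only reaching the rightmost cell scores a point. All names and parameter values are assumptions for illustration.

```python
import random

N_STATES = 5            # corridor cells 0..4 (hypothetical toy world)
GOAL = N_STATES - 1     # reaching the last cell ends the episode
LEFT, RIGHT = 0, 1


def step(state, action):
    """Move one cell; the left wall blocks, and only reaching GOAL scores."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def greedy(q_row, rng):
    """Best-looking action so far, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: learn action values from the score alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Mostly follow what has been learned, sometimes explore at random.
            if rng.random() < epsilon:
                action = rng.choice([LEFT, RIGHT])
            else:
                action = greedy(q[state], rng)
            nxt, reward = step(state, action)
            # Nudge the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = train()
policy = ['right' if q[s][RIGHT] > q[s][LEFT] else 'left' for s in range(GOAL)]
print(policy)  # ['right', 'right', 'right', 'right']
```

The agent is never told that moving right is good. It starts out wandering, the single reward at the end gradually propagates backward through the table, and the learned preference ends up pointing toward the goal from every cell. The real system replaces the table with a network because a game screen has far too many possible states to list.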

AI Building games

For the last part of the talk, I deviated to a related topic. Here I walked through a wonderful series of blog posts I stumbled upon called “Machine Learning is Fun!”, in which the author, Adam Geitgey, explains basic concepts in Machine Learning. In part two, he describes how Recurrent Neural Networks can be trained to learn and generate patterns. The simplest example we all know and appreciate (or sometimes not…) is the predictive text feature of mobile keyboards, where the system attempts to predict what word we are trying to type – the cause of so many great texting gaffes.

Moving to more elaborate examples, Geitgey fed an RNN implementation with a Hemingway book (“The Sun Also Rises”), trained it repeatedly on the book’s text, and then had it generate text of its own in the style of the book. The output starts out as incomprehensible strings of characters, but gradually takes the form of words and sentences, to the point that the sentences almost make sense and retain Hemingway’s typically curt dialogue style.
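The core idea of learning which character tends to come next can be shown with something much simpler than an RNN. The sketch below is not Geitgey's code and not a neural network: it is a character-level Markov chain, a far cruder stand-in that only remembers which characters followed each short context in the training text. The function names and the sample sentence are made up for illustration.

```python
import random
from collections import defaultdict


def train_model(text, order=3):
    """Map every `order`-character context to the characters that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate(model, seed_text, length, rng):
    """Extend seed_text one character at a time by sampling a seen follower."""
    order = len(seed_text)
    out = seed_text
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:
            break  # this context never appeared with a follower in training
        out += rng.choice(followers)
    return out


sample = "the cat sat on the mat and the cat ate the rat. "
model = train_model(sample, order=3)
print(generate(model, "the", 40, random.Random(0)))
```

Every short stretch of the output is copied from somewhere in the training text, which is why such a model produces plausible-looking fragments but no long-range sense. An RNN keeps a learned internal state rather than a fixed window, which is what lets it pick up longer patterns such as dialogue style.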

Geitgey then takes this system and applies it to none other than Super Mario Maker, a version of Super Mario that allows players to build levels of their own. He transforms game levels into text streams and feeds these into the learning system. Here too, the system spits out nonsense at first. But then it gradually learns the basic rules and eventually generates actual playable levels. I’m no expert on Super Mario so I couldn’t tell, but I showed it to my son and he said it’s a great level that he would be happy to play. That’s intelligent enough for me!


