Yesterday I watched a Go game between one of the best human players, Lee Sedol, and the strongest computer program, AlphaGo, which has been developed by Google DeepMind. It was a thrilling fight, broadcast on YouTube from Seoul, and the game was very close initially.
The American commentator Michael Redmond, who is the only Western professional player to reach 9 dan, thought that AlphaGo was somewhat in the lead in the early stages. This changed when Lee Sedol played a brilliant move that caught AlphaGo by surprise (it only realized its mistake later on). The only problem was that Lee Sedol was also very low on time, and his clock almost expired on several occasions, but every time he played the move just one or two seconds before it ran out (there is a one-minute increment after each move). AlphaGo still had plenty of time left, so the pressure on the human player was immense, but he kept his cool. AlphaGo was struggling and even played some really strange and bad moves. In the later stages of the game (it started at 11pm and ended at 4:45am, though we also switched to daylight saving time during the game) it became clear that Lee Sedol was going to win. And eventually AlphaGo resigned: a victory for humankind!
This may be a good time to mention that Lee Sedol lost the previous three games against an essentially flawless AlphaGo and, in fact, the South Korean player had already lost the series before yesterday's win (it is a best-of-five match). AlphaGo's victory is considered a significant milestone in machine learning and AI. Computers have been superior in chess for almost 20 years: Garry Kasparov lost against IBM's supercomputer Deep Blue in 1997, and nowadays chess programs running on a phone can beat the best of us (including Magnus Carlsen). But the complicated board game Go is a different story, and it has been considered the holy grail of AI. Before AlphaGo there was a one-million-dollar prize available for beating a professional player with a computer program. A big difference between chess and Go is the so-called branching factor: the number of options available for each move. In chess this number is about 35 on average, whereas it is 250 on average for Go (to put this in perspective: the number of possible Go games far exceeds the number of atoms in the observable universe). This poses a huge problem for computers because it takes a lot of time to explore possible moves to any reasonable depth.
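To get a feel for how fast the branching factor blows up, here is a quick back-of-the-envelope calculation. The average game lengths used below (roughly 80 moves for chess, 150 for Go) are common ballpark figures, not numbers from this article:

```python
# Rough size of the game tree: branching_factor ** average_game_length.
# Ballpark averages: ~80 moves per chess game, ~150 per Go game.
chess_tree = 35 ** 80     # a 124-digit number
go_tree = 250 ** 150      # a 360-digit number

# The number of atoms in the observable universe is often quoted as ~10^80.
atoms_in_universe = 10 ** 80

print(f"chess game tree ~ 10^{len(str(chess_tree)) - 1}")
print(f"go game tree    ~ 10^{len(str(go_tree)) - 1}")
print(go_tree > atoms_in_universe)  # True, by a few hundred orders of magnitude
```

Even the chess tree dwarfs the atom count, but Go's tree is hundreds of orders of magnitude larger still, which is why brute-force search alone was never going to work.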
So how did the London startup DeepMind (which was acquired by Google in 2014) achieve this spectacular feat? The main program (described in DeepMind's Nature paper), which runs on a distributed network of computers (there is also a single-computer version which is actually not that much weaker), uses a policy network, a value network, and a Monte Carlo (randomized) tree search component. The policy network can be thought of as a heat map of likely moves, i.e., the hotter the area, the more time should be spent exploring that move. The value network assesses a board position and estimates the probability of winning. The tree search plays through the moves using the other two components as drivers. AlphaGo was trained on Go games from human players, but also played many games against itself to improve. Importantly, Go is a reinforcement learning problem, meaning that it is not immediately clear whether a move is good or bad, as this only becomes clear at the end of the game. This makes updating the parameters of the policy and value networks, which are deep learning neural networks, very challenging. Deep learning neural networks, by the way, are essentially neural networks (popular in the eighties and nineties) that use many more layers. Each layer is a higher level of abstraction, e.g., a deep learning neural network trained on images would treat edges and lines as low-level features, and circles, squares, or even human faces as much higher-level features represented in the higher layers. (We can relate to that, because a big part of our job at Othot is to find more predictive features.)
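To make the interplay of these three components more concrete, here is a highly simplified sketch in Python. It uses a trivial take-1-or-2-stones toy game instead of Go, and stub functions in place of the real policy and value networks; all names here are made up for illustration, and only the PUCT-style selection rule mirrors the general shape described in the paper:

```python
import math

# Toy game standing in for Go: take 1 or 2 stones from a pile;
# whoever takes the last stone wins.
def legal_moves(state):
    return [m for m in (1, 2) if m <= state]

def apply_move(state, move):
    return state - move

def policy_net(state):
    # Stub policy network: uniform prior over legal moves. The real
    # policy network is a deep net trained on human games and self-play.
    moves = legal_moves(state)
    return {m: 1.0 / len(moves) for m in moves}

def value_net(state):
    # Stub value network with no opinion. The real one estimates the
    # probability of winning from the current position.
    return 0.0

class Node:
    def __init__(self, state, prior):
        self.state, self.prior = state, prior
        self.children = {}             # move -> Node
        self.visits, self.value_sum = 0, 0.0

    def q(self):  # average value, from the perspective of the player to move here
        return self.value_sum / self.visits if self.visits else 0.0

def select(node, c_puct=1.5):
    # PUCT-style selection: exploit known-good moves (Q) while a
    # prior-weighted bonus (U) steers exploration toward likely moves.
    total = sum(ch.visits for ch in node.children.values())
    def score(move):
        ch = node.children[move]
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return -ch.q() + u             # child Q is the opponent's view, so negate
    return max(node.children, key=score)

def mcts(root_state, simulations=500):
    root = Node(root_state, prior=1.0)
    for _ in range(simulations):
        node, path = root, [root]
        # 1. Selection: walk down the tree guided by priors and values.
        while node.children:
            node = node.children[select(node)]
            path.append(node)
        # 2. Expansion and evaluation.
        if node.state == 0:
            value = -1.0               # the player to move here has already lost
        else:
            for move, p in policy_net(node.state).items():
                node.children[move] = Node(apply_move(node.state, move), p)
            value = value_net(node.state)
        # 3. Backup: flip the sign at every ply (zero-sum game).
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = -value
    # Like AlphaGo, play the most-visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```

Even with these do-nothing network stubs, `mcts(4)` discovers that taking one stone (leaving a losing pile of three) is the winning move. In AlphaGo the stubs are replaced by trained deep networks and the toy game by Go, but the search loop has this same basic shape.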
These are exciting times we live in. Of course, there are areas where there is still a lot of progress to be made. For example, it may take some more time before we see a Sidney Crosby robot on skates, though Boston Dynamics has shown some very interesting videos. The bottom line is that machines and software can help us with increasingly sophisticated tasks. While at Othot we are not working on Go-playing software, we do use state-of-the-art machine learning algorithms to solve problems for our customers!
By Mark Voortman | March 15, 2016