07 [Original audio] Chapter 2 | Understanding AI / AI and the Art of Playing Go


Part 1: No data, no problem


Hello, listeners of Himalaya, and welcome to this episode in our series “AI and Us”. In the previous episode I explained how AI really works. I introduced you to supervised learning, in which the machine uses carefully labelled training data to improve through trial and error. And I explained that if one only has unlabelled data, unsupervised learning can still offer insights. But what is the right strategy when there is no data at all? That is the topic of this episode. Later on, we will take a closer look at the ancient game of Go.

 

But let’s start with learning from data. It might sound strange at first to say that a machine can learn even if there is no data at all, because learning is always based on experience, which in turn is contained in data. But there is a strategy around this. It is ingenious, it can work amazingly well, and it is currently all the buzz in AI. It is called reinforcement learning.

 

The basic idea is very simple and straightforward: if you have no data to learn from, then make your own data to learn from.

 

Now let me explain how this works in practice. Let’s suppose a machine should learn how to play the simple game of tic-tac-toe, in which two players take turns placing marks on a 3x3 grid. The winner is whoever first gets three marks in a horizontal, vertical, or diagonal row. It’s a popular game among young kids in the West. If a machine had to learn how to play the game well without training data, it could simply generate the training data by playing against itself.


Initially, it will not know which individual move is good or bad; it only knows the eventual outcome of the game. But there is a sufficiently clear link between each move and the eventual outcome that the machine, based on the training data it generated, will be able to improve its game over time. After sufficient training it will play masterfully and no longer lose a single game. It has learned to play tic-tac-toe without direct feedback on each move from anybody outside. It learned the best way to play through trial and error.
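
The self-play idea can be sketched in a few dozen lines of Python. This is only a minimal illustration of the principle, not DeepMind's actual method: it learns a value for each board position from self-played games using simple Monte Carlo updates, and all names and parameters (learning rate, exploration rate, episode count) are my own choices for the sketch.

```python
import random
from collections import defaultdict

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, 'draw' if full, else None."""
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if all(board) else None

V = defaultdict(float)   # estimated value of each board position, from X's view

def choose(board, player, epsilon):
    """Epsilon-greedy move: sometimes explore at random, else pick the best-valued move."""
    moves = [i for i, cell in enumerate(board) if not cell]
    if random.random() < epsilon:
        return random.choice(moves)
    def value_after(m):
        after = board[:]
        after[m] = player
        return V[tuple(after)]
    best = max if player == 'X' else min    # X maximizes V, O minimizes it
    return best(moves, key=value_after)

def self_play_episode(epsilon=0.2, alpha=0.2):
    """Play one game against itself and update V toward the final outcome."""
    board, player, history = [''] * 9, 'X', []
    result = None
    while result is None:
        board[choose(board, player, epsilon)] = player
        history.append(tuple(board))
        result = winner(board)
        player = 'O' if player == 'X' else 'X'
    reward = {'X': 1.0, 'O': -1.0, 'draw': 0.0}[result]
    for state in history:                   # nudge every visited position toward the outcome
        V[state] += alpha * (reward - V[state])

random.seed(0)
for _ in range(30000):                      # generate our own training data by self-play
    self_play_episode()

def play_vs_random():
    """Trained, greedy X against a purely random O; returns the result."""
    board, player = [''] * 9, 'X'
    while True:
        if player == 'X':
            m = choose(board, 'X', epsilon=0.0)
        else:
            m = random.choice([i for i, c in enumerate(board) if not c])
        board[m] = player
        result = winner(board)
        if result:
            return result
        player = 'O' if player == 'X' else 'X'

results = [play_vs_random() for _ in range(500)]
print('X wins:', results.count('X'), 'losses:', results.count('O'),
      'draws:', results.count('draw'))
```

After training, the greedy player should win or draw the vast majority of its games against a random opponent, even though nobody ever told it which moves were good: exactly the self-generated-data learning described above.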

 

Now, you may think that tic-tac-toe is a simple and silly game, one that even young kids learn to play so well after a few hours that they never lose, and that a machine learning to play it isn’t very impressive. And you are right. But the principle can be expanded.

 

Part 2: DeepMind’s Deep Mind


A British start-up company called DeepMind is considered to be among the very best in the world in reinforcement learning. DeepMind’s reinforcement learning approach was quite novel about a decade ago, but the company successfully applied it to a number of challenging domains. It impressed many AI experts, including those at Google, which eventually bought DeepMind. Today, DeepMind’s methods are widespread and popular in AI, and they have shown their inherent power.

 

But let’s first look at DeepMind’s method through another example. You may have heard or read about one of the very first commercially available video games, released about fifty years ago. It was called Pong, and it showed two vertical lines, one on each side of the screen, that could be moved up and down by the players.


Between these lines a dot would move, a bit like a table tennis ball, and it would bounce back if it hit a vertical line. The goal was to keep the ball bouncing back to your opponent’s side, much like a table tennis player tries to return the ball. It was a simple game. To play it well, human players needed good hand-eye coordination and a sense of the best position at which to meet the dot.

 

DeepMind took this ancient video game, but instead of human players, it let its reinforcement learning machine try. Importantly, the machine had not been told the rules. It only knew when the game was over (when the dot slipped past a vertical line instead of being returned). It then simply began to play, game after game.


Initially it had no idea how it could stop the dot with the vertical line. The machine had no concept of table tennis, no clue about physics or the laws of motion. All the analogies from reality that make it easy and almost intuitive for humans to play Pong meant nothing to the machine. For the machine, Pong was initially a black box, a game with secret rules that made no sense. That made it very challenging for the machine to play Pong well: it had no clear strategy for improving its play.

 

Of course, the machine could simply learn through trial and error. Eventually, after having tried all positions and all variations, it might have calculated the optimal solution for every move in every circumstance. But even for a simple game like Pong, that would take a lot of calculation and effort. The secret of DeepMind’s reinforcement learning is that it mixes trial and error with more deliberate mechanisms to come up with new moves to try out and evaluate.
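
That mix of random trial-and-error ("exploration") and deliberate choice of the best-known move ("exploitation") is commonly implemented with a so-called epsilon-greedy rule. The sketch below is a generic illustration of that rule, not DeepMind's code; the action names and value estimates are made up for the example.

```python
import random

def epsilon_greedy(action_values, epsilon):
    """With probability epsilon try a random action (trial and error);
    otherwise deliberately pick the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.choice(list(action_values))       # explore
    return max(action_values, key=action_values.get)    # exploit

# Hypothetical value estimates for a Pong paddle's three possible actions:
random.seed(42)
q = {'up': 0.1, 'stay': 0.0, 'down': 0.7}
picks = [epsilon_greedy(q, epsilon=0.1) for _ in range(1000)]
print('chose "down"', picks.count('down'), 'times out of 1000')
```

Most of the time the machine plays its current best guess, but the occasional random move lets it stumble onto better strategies than pure exploitation ever would.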


That way, DeepMind’s reinforcement learning machine was able to learn how to play Pong really well without any initial understanding of the game and without any initial training data. It learned entirely on its own, through play. In the end, it was so good it won every game against humans, even though humans know the game’s rules, have useful analogies to apply, and lots of practice.

 

Now, Pong isn’t a particularly difficult game to play – at least for us humans. The bigger question is whether DeepMind’s reinforcement learning approach can also succeed in far more challenging contexts. Which brings us to the game of Go.

 

Part 3: Go!


I am sure you are all familiar with Go, the complex and beautiful board game that originated in China but is now played around the world. It has relatively simple rules, but the number of possible games is astronomical: a typical game allows roughly 10^360 possible sequences of moves, far more than there are atoms in the observable universe!
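
The 10^360 figure can be sanity-checked with a quick back-of-the-envelope calculation. The branching factor and game length below are rough averages often quoted for Go, used here purely for illustration:

```python
import math

branching = 250   # rough average number of legal moves per turn in Go
depth = 150       # rough length of a typical game, in moves

# branching^depth games; take log10 to get the order of magnitude
exponent = depth * math.log10(branching)
print(f"roughly 10^{exponent:.0f} possible games")
```

With about 250 choices at each of about 150 turns, the count of possible games lands right around that astronomical order of magnitude.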


The game is far more difficult to play well than chess. Playing it well is thought to require strategic thinking, intuition, foresight, and lots of practice from its human players. Because of this combination of simple rules and high complexity, and because of the non-obvious link between a single move and the eventual winner (or loser), for a long time no computer, no machine, had made significant progress in playing Go well.

 

This all changed in October 2017, when DeepMind published an article in the renowned scientific journal Nature. In it they described AlphaGo Zero, an AI system based on reinforcement learning that played Go better than any human. In fact, after only 30 days of learning, AlphaGo Zero exceeded 5,000 Elo points.


Most importantly, AlphaGo Zero learned to play Go, and to excel at it, without any human training data. It learned purely by playing against itself. This is amazing. It shows how, in situations in which no data whatsoever is initially available, a machine can learn over time to navigate even a very complex situation and to make decisions that lead to eventual success.

 

But there is more. After watching AlphaGo Zero, professional Go players were astonished by the moves the machine made. They were unexpected and did not reflect the most popular human strategies for winning at Go. The machine that had learned from itself played Go differently, in a less predictable and perhaps even more creative way. It puzzled and fascinated the Go community. As a result, many top Go players began to rethink some of their strategies and approaches to the game, essentially learning from the machine how to better play the game they seemed to know so well.

 

After AlphaGo Zero’s success, some top international players even suggested that at the very top level, the days of humans playing against machines and standing a chance to win were over. The future, they said, belonged to machines playing Go against other machines. No more humans needed, except to watch, applaud the eventual winner, and marvel at its abilities. I am not sure they are right, but AlphaGo Zero certainly demonstrates the enormous potential of the strategy of reinforcement learning.


Reinforcement learning is not just great in cases with no initial training data. It also demonstrates that in such cases the machine can discover insights and develop approaches and strategies that are quite different from our human ones. This has the potential to provide us with novel and powerful insights that are truly valuable.


Part 4: Go Beyond!


Going beyond the game of Go, reinforcement learning has now been successfully employed in many different domains and circumstances. Here are a few interesting examples. Alibaba, for instance, uses it to improve its auctions for display advertising.


Reinforcement learning has been used to optimize the duration of red traffic lights at intersections – which is both really helpful for traffic management and really impressive, because the machine reached its conclusions without any pre-existing training data. And DeepMind used it to analyse the energy efficiency of Google’s data centres and to suggest ways to lower energy consumption. By up to how much, do you think? Leave your guess in the comments section. Here’s a hint: it’s impressive!

 

This all shows that reinforcement learning can help in a wide variety of circumstances, not just to improve efficiency and enhance profits, but to help us make better decisions in society and for the environment. It serves not just a company’s bottom line; it can be useful for humanity as a whole by offering us a new avenue for improving human decision making – which, as you may recall, is what counts!

 

But as I conclude this episode on reinforcement learning, let us pause for a moment and ponder. In the last episode, we looked at supervised and unsupervised learning as two ways to learn from data. The first, supervised learning, requires carefully labelled training data; it is essentially learning from human-generated data. The second, unsupervised learning, can use training data that is not meticulously labelled. It points humans in the right direction; it offers us clues, but not understanding. In this episode, I explained reinforcement learning, which initially does not need any training data at all, labelled or unlabelled, because it produces its own.

 

These methods, and many more derived from them, can be cleverly combined and mixed together. But they all share one important element that we must not forget. At the end of the day, all machine learning is driven by data. The data may be labelled or unlabelled, it may be human-based or generated by machines, but at the end, all such machine learning depends on data. Without data, there can be no such AI.

 

The power of AI lies in helping us understand the world. Until now, we have done that by asking questions. Now, with AI, we are getting even better at asking the right questions. How? That I will explain in the next episode. See you then!
