Part 1: Learning and Knowing are Two Different Things
Hello listeners of Himalaya, and welcome to this episode of “AI and Us”. I have already explained that AI learns from data. This time, we take a close look at how. So, please come along.
When you hear the word “learning”, what comes to your mind? One possibility is school or apprenticeship. These are situations in which you acquire knowledge and expertise because your teacher or boss tells you about it. The knowledge is provided in ready-made pieces. It still requires effort to remember what you have been told or shown, but exactly what knowledge you are acquiring is clear – and right in front of you, so to speak.
A different kind of learning is when you have to solve a riddle and are not given the answer. Think of a puzzle with a thousand pieces and you do not know how they fit together; or the famous Rubik’s Cube, the colourful cube that you can twist and turn to move smaller cubes around; or think about a crossword puzzle or Sudoku.
In each of these cases, you may know the basic idea behind the puzzle, but to solve it, you will have to try out moves and see whether this gets you closer to the solution. It is a time-consuming process of experimentation, of trial and error, but if you work long and hard enough at it, eventually you stand a good chance of solving it. Children do this kind of learning all the time. Just think of how they learn to ride a bicycle. Their parents can tell them to hold steady and pedal, but successfully keeping the balance takes lots of attempts; it takes practice.
In a way, it is the same with AI. In the old days, as I mentioned in an earlier episode, AI meant telling the computer what to do - feeding ready-made knowledge into the machine. It’s like parents telling their children exactly what to do. That approach failed, because life is too complex and multi-faceted to capture in clear and unambiguous rules.
Modern AI is different. It is not told the solution; it acquires the solution through “learning” – that is, through experimentation, through trial and error. In one popular version of machine learning, the system starts with no concrete understanding. It has a kind of random, empty model of the world. It is then fed data, which produces results. These results are then compared to the known correct results. When the machine’s results are wrong, the machine adjusts the model and tries again.
If the adjustment does not improve the results, the machine readjusts and tries again. If it does improve them, the machine takes in more data, calculates, and compares against the correct results once more. It really is a process of trial and error. Eventually the model is no longer empty and random but represents the data that has been fed into the machine. Eventually, the machine improves. In general, this takes time, and lots and lots and lots of data. And the results improve as more training data is used. But the improvements aren’t linear. As the machine gets better and better, it requires more and more data to improve even further.
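If you like to see things in code, here is a minimal sketch of that loop in Python. It is only an illustration with made-up toy data, not how real systems are built: the “model” is a single number, and the machine keeps a small random adjustment only when it brings the results closer to the correct answers.

```python
import random

# Toy "training data": inputs x and the correct results y (here, y = 2 * x).
data = [(x, 2 * x) for x in range(1, 11)]

def error(weight):
    # How far the model's guesses are from the correct answers.
    return sum((weight * x - y) ** 2 for x, y in data)

weight = random.uniform(-10, 10)   # start with a random, "empty" model
best_error = error(weight)

for _ in range(10000):
    candidate = weight + random.uniform(-0.1, 0.1)   # try a small adjustment
    candidate_error = error(candidate)
    if candidate_error < best_error:                 # keep it only if results improve
        weight, best_error = candidate, candidate_error

print(weight)   # after enough trial and error, this ends up close to 2
```

Real machine learning systems adjust millions of such numbers at once and use far more efficient update rules, but the basic rhythm of guess, compare, and adjust is the same.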
Take the case of FaceNet, a system that can identify human faces in digital images. It learns through training data. After working with 2.6 million training images, the machine correctly identified human faces in more than three out of four images. When fed a hundred times as much training data – that is, 260 million training images – the system had improved from 76 percent to 86 percent. It is an impressive improvement, but it also took an enormous amount of additional training data!
Experts around the world have been working really, really hard to reduce the amount of training data necessary to produce good results. In 2016, researchers at IBM offered an encouraging case in point. They trained a machine learning system to identify skin cancer in high-resolution images of human skin. Each skin image contained lots of data values – think of it like a high-resolution digital photo. The researchers used 900 images as training data, then compared their machine to human dermatologists with more than a decade of experience.
The machine was better than the average dermatologist in correctly spotting skin cancer. But the 900 images contained lots of data, and 900 images is far more than an average dermatologist can actively remember having seen. So, on the one hand, it’s a very significant achievement. The system got very good with relatively few images. On the other hand, the training data still far exceeds what humans need to be quite good at identifying skin cancer. In short: this kind of machine learning offers great opportunities when there is sufficient training data available.
This is a very important point! So, let me highlight this. The advantage of machine learning isn’t that the machine learns quickly from very few data points, but that it learns continuously and continues to improve, albeit slowly, and that it is very dependable in its accuracy because it does not forget or get distracted.
Part 2: No Data, no Insight?
Machine learning works very well when there is enough data available to train the system. That is far easier in the age of Big Data than before. In many areas, we are now collecting enormous amounts of data. Every three years or so, the number of medical and life science research papers doubles. Over a decade, that is more than three doublings, or roughly a tenfold increase. So if a doctor graduates at the age of 25, by the time she is 35 more than three quarters of the research will have been published after she completed her studies.
Just let me contrast this with my own past. When I grew up in beautiful but rural Austria, there was no Internet and there were only two television channels, and if I wanted to look something up in the encyclopaedia, I had to go to the library. Our family doctor had been treating patients for decades with the same medication.
And what is true for access to knowledge and information is also true more generally for data. Across the globe, the amount of digital data we collect doubles about every two years. It’s a staggering development. And it is the ideal basis for data-driven machine learning.
But there are still areas where we do not have enough training data; where collecting enough data is too costly, too cumbersome, or too impractical. Are these AI’s blind spots? Are these the areas where we cannot hope for artificial intelligence to help us?
Not necessarily. There are two innovative strategies that enable data-driven machine learning even when we initially do not have sufficient training data at our disposal. Let me explain each of the two strategies in turn.
The first strategy can be used when there is enough data available, but the data is not labelled or annotated, so the machine cannot tell whether its answers are right or not. Remember the example about skin cancer? Suppose we have lots of skin images and some of them depict skin cancer, but we do not know which ones. Or take the example of the machine recognizing faces in images: suppose we have millions of images, but the machine does not know which image contains a face and which does not. For such cases this first strategy is ideal. Experts call it “unsupervised” learning, contrasting it with supervised learning, where the machine is told for each piece of training data what the correct answer is.
There are many situations suited to such unsupervised learning. Just think about it: we do collect a lot of data, but it is so much that we frequently don’t spend the time and effort to label it correctly. Every day many hundreds of millions of photos are shared online, but few of these photos are labelled to say what is in the image, for example people or cats or beaches or hills. Such unlabelled images are pretty useless for supervised machine learning.
Unsupervised machine learning can tap into this vast treasure trove of unlabelled data. So with unsupervised learning, piles of data that we could not use before suddenly become potentially valuable and useful. It’s quite exciting.
But, I am sure you’ll ask, what can a machine learn by itself from such data if it does not receive any feedback; if it does not know whether its learning is right or not? Trial and error works because we realize our errors, can correct them, and try again. But if the machine does not know when it makes a mistake, how can it learn?
You are right. With unsupervised learning the machine cannot learn in the usual way of being told what is right and what is wrong. Instead, unsupervised learning looks for unusual patterns in the data. Much of reality is pretty predictable. And this predictability shows up in the data; it contains similar patterns. When the data is suddenly very different, something unexpected is happening. These are the patterns unsupervised learning can reveal; the strategy simply builds on the infrequency of surprises.
Unsupervised machine learning is quite powerful. It almost literally sees things in the data that we humans would routinely miss. Unsupervised machine learning is very well suited for complex patterns that are out of the ordinary, but not immediately visible. In this very important sense, unsupervised machine learning can expand and extend human understanding. It can point us in directions we did not know about, point at possible causal linkages we have yet to explore.
But unsupervised machine learning also has a significant weakness. It can identify unexpected patterns in the data, and do so quite well, but it does not know what to do with them or how to interpret them. It’s “seeing” them without “knowing” what they mean. As it sees the patterns, it can even become quite good at predicting their frequency, the likelihood that they will happen again, and perhaps even when and where. That in itself is very useful. But it is often just the first step, followed by some very human intervention.
Just take, as an example, sensor data from a jet engine. Suppose unsupervised machine learning discovers a very unusual pattern emerging in the data, even though the jet engine seems to run fine. This could surely be a very useful early sign of a problem. But to confirm that it actually is a problem, and to find out what the problem is, additional examination is needed. A pure analysis of the data by the machine doing the unsupervised learning will not suffice. As I said, it sees, but does not understand.
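To make the idea concrete, here is a minimal sketch in Python, assuming a handful of invented readings from a single hypothetical sensor. It simply flags readings that deviate strongly from the typical pattern; real systems learn what “normal” looks like from vast amounts of data across many sensors.

```python
# Invented readings from a single (hypothetical) engine sensor.
readings = [101.2, 100.8, 101.5, 100.9, 101.1, 100.7, 101.3, 107.9, 101.0]

# Describe the "usual" pattern: the average reading and how much readings normally vary.
mean = sum(readings) / len(readings)
spread = (sum((r - mean) ** 2 for r in readings) / len(readings)) ** 0.5

# Surprises are rare, so a reading far outside the usual range stands out.
for i, r in enumerate(readings):
    if abs(r - mean) > 2 * spread:
        print(f"Reading {i} looks unusual: {r}")   # flags the 107.9 value
```

Note that the sketch only points at the unusual reading; deciding whether it signals a real fault is left to a human, just as described above.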
Unsupervised learning works very well in alerting humans to problems. It helps foresee problems in a jet engine and other machinery, and thereby focus human labour on the most likely problem spots. Experts call this predictive maintenance, because ideally repairs are done before faults happen.
Unsupervised learning is also useful in fraud detection, for instance with forged bills, forged documents, and fraudulent credit card transactions – because the machine can discover unexpected patterns early on. The problem with fraud is that criminals are always devising new ways to commit their crimes. Unsupervised learning discovers novel and unexpected patterns swiftly and alerts human investigators to them.
But unsupervised learning still requires a lot of data to do its analysis and learn from, even if that training data does not need to be labelled. That limits its application. Unsupervised learning won’t work if there simply is a lack of data.
For such cases, there is a different machine learning strategy. It is called reinforcement learning, and I will tell you all about it in the next episode.