01 [Original Audio] Pandemic Special Episode | Seeing Hidden Patterns

Part 1: Introduction

Brief: Introduce the importance of AI by describing how severe SARS-CoV-2 is and how difficult it has been to combat

00:20/19:33 

The SARS-CoV-2 virus caught the world by surprise. Within months, millions were infected. Even with the best medical care, the death rate among the infected was ten to a hundred times higher than for the seasonal flu (which already accounts for many tens of thousands of deaths every year).


Public health officials and experts called it the biggest health crisis the world has seen since the severe influenza pandemic of the early 20th century. As the virus spread, doctors and nurses had little help to offer. There was no treatment and no vaccine. Worse, even the world’s best experts knew precious little about the virus and the illness it causes.


That’s a terrible situation. Everyone, from doctors to government officials to the average person on the street, has been facing important decisions: how to care for patients (and how to protect themselves from infection), how to contain the virus, and how to behave in their daily lives.


When we humans make decisions, the best basis is facts that can inform them. But with the outbreak of a new virus, we did not have those facts; we lacked the appropriate data. And that meant our decisions lacked precision.


Part 2: Combining Theory with the SARS-CoV-2 Virus

Brief: How AI works specifically to combat the virus

02:03/19:33 

Fortunately, we had data-driven AI, in which the machine learns from massive amounts of training data. Such learning can discover and build upon subtle patterns in the data that would escape even a very discriminating observer. The machine may need a lot of data to learn, but once it has learned, it can be amazingly good at spotting those patterns.


So let’s see how these technologies have helped us understand the SARS-CoV-2 virus. It starts with the virus itself. Viruses contain genetic information that they insert into a living cell. That information then directs the cell to produce new viruses; in other words, the virus hijacks the cell to make copies of itself. That’s how the virus proliferates. So the way the virus functions is encoded in the genetic information inside it, which in this virus is stored as RNA. RNA molecules are tiny.


The meaning of these molecules was discovered less than a hundred years ago. Reading out (sequencing) the information in RNA began only in the 1970s, and really took off only in the late 1990s, thanks in significant part to breakthroughs in digital technology and big data. Researchers can capture only small sections of this RNA at a time, but fast, powerful computers doing sophisticated pattern analysis can assemble and stitch these small sections together, not by hand but by machine.
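To make the idea of machine stitching concrete, here is a toy greedy overlap assembler in Python. It is a minimal sketch under simplifying assumptions (error-free reads, no repeated stretches of sequence), not the method the sequencing teams actually used; production assemblers rely on more robust techniques such as de Bruijn graphs. The function names and the short reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (0 is returned).
    """
    start = 0
    while True:
        # Look for the first min_len letters of b somewhere in a
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check whether everything from that point to the end of a is a prefix of b
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of reads with the longest overlap until one sequence remains."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No overlaps left: simply concatenate what remains
            return "".join(reads)
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads[0]


# Hypothetical short RNA reads, given in shuffled order
reads = ["CAAGGCU", "ACGUUGC", "GGCUUAC", "UUGCAAG"]
print(greedy_assemble(reads))  # ACGUUGCAAGGCUUAC
```

Always merging the pair with the longest overlap is easy to follow, but on real genomes, repeats and sequencing errors make the "best" overlap ambiguous, which is why real assemblers are far more elaborate.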


By mid-January 2020, only a few weeks after the new SARS-CoV-2 virus had been identified, researchers in Shanghai and Beijing, together with colleagues in Australia, had already sequenced the virus’s complete genetic information. It was an amazing accomplishment, and it laid the foundation for all of our subsequent understanding of the new virus. Their achievement was possible because they could capture the data swiftly and then stitch it together with sophisticated pattern-matching software.


Importantly, though, the researchers did not keep the information to themselves but shared it online with the scientific community. Everyone could download the virus’s genetic information and begin to understand it, again using every analytics and AI tool available. Suddenly, a global army of researchers was studying the virus.


Since then, the virus has been sequenced many times, revealing tiny mutations that act as distinct markers and help public health agencies around the world understand how the virus has spread from one country to the next. In the early weeks of the pandemic, some pundits suggested that the virus was human-made, perhaps even deliberately; some shocking conspiracy theories still claim that is the case.
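The marker idea can be sketched by comparing aligned sequences against a reference and listing the positions where they differ. This is a toy illustration with hypothetical 16-letter sequences and sample names; real surveillance works on full genomes of roughly 30,000 letters and uses dedicated phylogenetic software. Samples that share a marker are likely to belong to the same lineage.

```python
def mutations(reference: str, sample: str) -> set[tuple[int, str, str]]:
    """Return (position, reference_letter, sample_letter) for every mismatch.

    Assumes both sequences are already aligned to the same length; positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return {(pos, ref, alt) for pos, (ref, alt) in enumerate(zip(reference, sample)) if ref != alt}


# Hypothetical aligned sequences
reference = "ACGUUGCAAGGCUUAC"
samples = {
    "sample_A": "ACGUUGCAAGGCUUAC",  # identical to the reference
    "sample_B": "ACGUCGCAAGGCUUAC",  # one mutation
    "sample_C": "ACGUCGCAAGGCUAAC",  # sample_B's mutation plus one more
}

for name, seq in samples.items():
    print(name, sorted(mutations(reference, seq)))

# Markers shared by B and C suggest that they belong to the same lineage
shared = mutations(reference, samples["sample_B"]) & mutations(reference, samples["sample_C"])
print(sorted(shared))  # [(4, 'U', 'C')]
```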


But that is patently wrong. As early as mid-March 2020, a team of genetic researchers in the US, the UK and Australia published findings, based on data-driven pattern analysis of the virus’s genome, concluding that the virus is natural and most likely made its way from animals to humans.


Two data-derived reasons in particular are put forward as evidence. The first is that genetic engineering in labs leaves specific patterns in a virus’s genetic information, like a fingerprint, and no such pattern can be found in SARS-CoV-2. The second is that a closely related virus affecting pangolins has a genetic pattern that could readily have mutated into the SARS-CoV-2 pattern through natural selection: a data pattern, or fingerprint if you like, that points to natural causes.


Within little more than a month, researchers around the world had identified the most promising ways to create an effective vaccine and developed prototypes for testing. Normally it takes years to reach such a milestone, but by coming together, humanity achieved this first success in a small fraction of the time. In significant part, this is a success of data-driven AI used in bioengineering, a topic we will look at more closely in the third chapter of this course.


And within weeks, other teams around the world, including at the WHO, again building on AI-based data analysis, created the first test kits for infections. These tests can identify the virus’s genetic information in a human sample. Later versions greatly increased the number of tests that could be done per day, and this, too, was achieved using data analysis and machine-aided pattern matching. The next step was antibody tests, which detect whether an individual has been infected even if she is no longer sick.


And all around the world, online research platforms sprang up offering huge amounts of research data on coronaviruses in general and SARS-CoV-2 in particular. Some of these platforms, like ResearchGate, also use sophisticated AI software to make the huge body of existing research on coronaviruses more easily accessible to researchers. Their goal is to give experts the tools to find the proverbial needle in the haystack of research publications.


While all of this work on the virus itself was going on, the virus spread, and tens of thousands of people fell sick. Without a vaccine, quick decisions were needed on how to slow the spread. That’s when doctors and statisticians in China, and later elsewhere around the world, began to collect epidemiological data: How many people get infected, and where? How many fall sick? How many die?


When you do not know exactly who is infected and who isn’t (because you have no reliable test kit yet), you need to work with proxies: data about similar dynamics and behaviors. And that is precisely what happened: researchers and policy makers used data about general mobility and human interaction in certain infection hotspots to understand how easily the virus could spread from one person to the next.


Travel and commuter data were used, and so were transaction and location data. In these early phases of the pandemic, using proxies to get a first, albeit crude, sense of the spread was essential.


In Europe, too, mobile phone companies provided public health officials and governments with anonymized data on mobile users logging in and out of mobile network cells, as a proxy for general population mobility. And in early April 2020, Apple and Google joined in by publishing automated analyses of location data from their maps applications, showing how people’s movements changed over time in dozens of countries.


This was useful because it gave a first approximation of the effectiveness of lockdown and stay-at-home policies: countries like Austria were able to reduce the spread of the virus drastically thanks to a lockdown that was widely adhered to, whereas countries like Great Britain, with a less rigorous lockdown policy, fared much worse. And Apple updated its data analysis regularly, providing an important timeline of change.
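As a rough illustration of how such mobility reports can express change, the sketch below compares hypothetical daily counts of map requests against the average of a pre-lockdown baseline period. The numbers, the four-day baseline and the function name are assumptions for illustration, not Apple’s or Google’s actual methodology.

```python
from statistics import mean


def percent_change_from_baseline(counts: list[float], baseline_days: int) -> list[float]:
    """Express each day's count as a percentage change from the mean of the first `baseline_days` days."""
    baseline = mean(counts[:baseline_days])
    return [round(100 * (c - baseline) / baseline, 1) for c in counts]


# Hypothetical daily map requests for one city; a lockdown begins on day 5
daily_requests = [1000, 980, 1020, 1000, 700, 400, 250, 240]
print(percent_change_from_baseline(daily_requests, baseline_days=4))
# [0.0, -2.0, 2.0, 0.0, -30.0, -60.0, -75.0, -76.0]
```

Comparing such curves across countries is what makes differences in lockdown adherence visible, even though map requests are only a proxy for actual movement.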


It may sound simple, but it is an outstanding achievement that rests almost entirely on the availability of smartphones. Just think about it: the first iPhone appeared in the summer of 2007, more than ten years ago, but third-party apps only became available for download a year later, in the summer of 2008. So if the virus had hit us fifteen years ago, we would not even have had the device to run the app that generates the data that yields the analysis of public mobility. We were blind then, and now we can see, at least a bit.


Essentially, of course, what Apple and Google did with map request data is the very idea that Google engineers had tried in 2008, when they used search requests sent to Google to predict the spread of the seasonal flu (another virus). The idea at Google was to automatically test hundreds of millions of mathematical models against both historical search data and official flu data.


The system worked remarkably well (although the model required frequent updating), but it could not be used for COVID-19: the new virus was spreading during flu season and caused similar symptoms, so search data could not tell the two illnesses apart. Hence, using mobility, transaction and location data made a lot of sense.
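The core of the 2008 approach, checking which search series track official flu figures, can be sketched by ranking candidate query series by their Pearson correlation with official case counts. This is a deliberate simplification: the query terms and weekly numbers below are hypothetical, and the real system evaluated vast numbers of model combinations rather than single series.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical weekly figures
official_flu_cases = [120, 180, 300, 450, 380, 220]
query_volumes = {
    "fever remedy": [60, 90, 150, 225, 190, 110],
    "sore throat": [40, 60, 90, 150, 130, 70],
    "basketball scores": [300, 310, 290, 305, 295, 300],
}

# Rank candidate queries by how closely they track the official figures
ranked = sorted(query_volumes, key=lambda q: pearson(query_volumes[q], official_flu_cases), reverse=True)
for q in ranked:
    print(f"{q}: r = {pearson(query_volumes[q], official_flu_cases):.3f}")
print(ranked)  # ['fever remedy', 'sore throat', 'basketball scores']
```

Correlation alone can also reward search terms that merely share a seasonal rhythm with the flu, which is one reason such models need frequent checking and updating.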


A different approach was taken by a US company that had sold hundreds of thousands of digital thermometers that transfer their data via Bluetooth to a smartphone app. Knowing that COVID-19 often causes fever, the company analyzed hundreds of thousands of temperature measurements from across the US, gathered through the app, to see how the virus was spreading. Of course, not every fever reading was caused by COVID-19, but in the absence of direct data, it gave health officials a good sense of the spread.
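The thermometer approach can be sketched as grouping readings by region and flagging regions whose share of fever readings is well above what is normally expected. The 38.0 °C cut-off, the 1.5× flagging factor, the regions and all readings are hypothetical assumptions, not the company’s actual method.

```python
from collections import defaultdict

FEVER_THRESHOLD_C = 38.0  # assumed cut-off for counting a reading as fever


def fever_share_by_region(readings: list[tuple[str, float]]) -> dict[str, float]:
    """Share of readings at or above the fever threshold, per region."""
    totals: dict[str, int] = defaultdict(int)
    fevers: dict[str, int] = defaultdict(int)
    for region, temp_c in readings:
        totals[region] += 1
        if temp_c >= FEVER_THRESHOLD_C:
            fevers[region] += 1
    return {region: fevers[region] / totals[region] for region in totals}


# Hypothetical anonymized readings (region, temperature in °C)
readings = [
    ("Region A", 36.8), ("Region A", 38.4), ("Region A", 37.0), ("Region A", 36.9),
    ("Region B", 38.6), ("Region B", 39.1), ("Region B", 37.2), ("Region B", 38.2),
]
expected_share = {"Region A": 0.25, "Region B": 0.25}  # hypothetical seasonal baseline

shares = fever_share_by_region(readings)
# Flag regions whose fever share is well above the usual level (1.5x is an assumed factor)
unusual = [region for region, share in shares.items() if share > 1.5 * expected_share[region]]
print(unusual)  # ['Region B']
```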


Part 3: Theory

Brief: Two key factors, collecting data and learning from patterns in the data, help combat the virus; explain how this works technically

13:33/19:33 

So, in the very first weeks and months of the outbreak, the most important task was to better understand the virus: how it works and how it spreads. To that end, experts around the world harnessed their knowledge and technology. But unlike in the past, this time they had powerful new tools at their disposal to collect data and often to learn automatically from the patterns in that data.


Termed big data and artificial intelligence, or AI for short, these new technological tools have greatly aided humanity’s understanding of the virus. There are two crucial elements to their success: the ability to collect a lot more data with less effort, and the ability to learn from that data, often automatically by machine.


Our ability to collect a lot more data comes from better sensor technology and cheaper storage technology. Not all of the new, affordable sensors generate high-quality data, but that doesn’t matter if so much more data is available from a variety of sources. In such cases, an abundance of data can trump quality.


Moreover, many of these sensors are built into widespread technical devices. Just think about it: a typical modern car has dozens of sensors that capture everything from speed and location to fuel consumption and braking habits. A typical smartphone collects data about its location, ambient light, vibration, acceleration, even battery temperature.


This data enables the car to operate and the smartphone to work properly, but it can also be used for a very wide variety of other purposes. For instance, the vibration and acceleration sensors in smartphones have been used to track earthquakes as well as uneven subway tracks. And fitness devices and smartwatches have detected irregular heartbeats.


Importantly, we have also discovered that if we cannot collect the data directly, we can often find a substitute, a so-called proxy. For instance, if one can’t measure the unevenness of subway tracks directly (because doing so is costly and disrupts service), a widely used app on subway riders’ smartphones will do: it captures location and vibration, and although the data from an individual smartphone may not be good enough, once it is aggregated across thousands or tens of thousands of smartphones, it gains in detail and accuracy.
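A short simulation illustrates why aggregation helps: if each phone’s reading equals the true value plus independent, unbiased random noise, the average of n readings has a typical error that shrinks roughly in proportion to 1/√n. The "true bump" value, the noise level and the function name are hypothetical. Note that a systematic bias shared by all phones would not average away.

```python
import random
from statistics import mean


def aggregate_readings(true_value: float, noise_sd: float, n_phones: int, seed: int = 0) -> float:
    """Average n simulated phone readings, each = true value + independent Gaussian noise."""
    rng = random.Random(seed)
    readings = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n_phones)]
    return mean(readings)


TRUE_BUMP = 2.0  # hypothetical vibration level caused by an uneven stretch of track
NOISE_SD = 1.0   # hypothetical noise of a single, cheap phone sensor

for n in (1, 10, 1_000, 10_000):
    estimate = aggregate_readings(TRUE_BUMP, NOISE_SD, n)
    typical_error = NOISE_SD / n ** 0.5  # standard error of the mean
    print(f"{n:>6} phones: estimate {estimate:.3f} (typical error about {typical_error:.3f})")
```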


The second element is the analysis of data. Data used to be analyzed by hand, using elementary statistics. Only about a century ago did statistics advance significantly thanks to more sophisticated mathematics, and only a few decades later thanks to the calculating power of modern computers. Today, extremely sophisticated data analytics packages run on affordable laptops.


Part 4: Improving Decision-Making

Brief: The significance and shortcomings of AI only become apparent when we use it well: to make better decisions

17:24/19:33 

Many health authorities around the world also made the limited data they had available online. This created an opportunity not only for statisticians and data analysts but also for AI and machine learning experts to use the data to train their systems. Pandemic models emerged that predicted the spread of the virus as well as the likely rates of patients needing intensive care, a vital piece of predictive information for health agencies that have to decide where and when to allocate their limited resources.
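One classic family of pandemic models is the SIR compartmental model, which splits a population into Susceptible, Infected and Recovered groups. The sketch below is a simple day-by-day version; the episode does not say which models health agencies used, and the population, the transmission rate beta, the recovery rate gamma and the intensive-care share are all hypothetical.

```python
def simulate_sir(population: int, initial_infected: int, beta: float, gamma: float, days: int):
    """Day-by-day SIR model; returns a list of (susceptible, infected, recovered) per day.

    beta: average number of transmission-causing contacts per infected person per day.
    gamma: fraction of infected people who recover (stop being infectious) per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: beta / gamma = 3 new infections per case at the start
history = simulate_sir(population=1_000_000, initial_infected=10, beta=0.3, gamma=0.1, days=300)
peak_day, (_, peak_infected, _) = max(enumerate(history, start=1), key=lambda item: item[1][1])

ICU_SHARE = 0.02  # hypothetical share of active infections needing intensive care
print(f"Peak on day {peak_day}: about {peak_infected:,.0f} active infections, "
      f"roughly {ICU_SHARE * peak_infected:,.0f} needing intensive care at once")
```

In this toy model, lowering beta (for instance to mimic a lockdown that cuts contacts) delays and flattens the peak, which is the kind of what-if question such models help agencies explore when planning intensive-care capacity.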


Of course, not all nations fared equally well. Some governments, for instance that of the United States, initially disregarded data and its analysis. By the time they changed course, important time had been lost, allowing the pandemic to claim more lives than elsewhere.


That, dear listeners, brings us right back to the beginning of this episode, and to the reason why we are using data-driven AI in the first place. The COVID-19 pandemic impressively demonstrates how, at almost every stage of learning about the virus and its spread, big data and AI were hugely important. But it also reminds us that the real goal of learning from data is not to pave the way for the age of the machine but, as I will explain in great detail in later chapters, to help us, to help humanity, make better decisions in time.
