03 [Original Audio] Pandemic Special | Big Data and AI-based Approaches to Vaccines


Part 1: Introduction

Brief: The Race is On, but it’s a Marathon, not a Sprint

00:00/19:42 

Hello listeners of Himalaya, and welcome to this third episode tracking the role of Big Data and AI in fighting the SARS-COV2 virus pandemic. This time we are looking at vaccines, and at how Big Data and AI have greatly aided the work of experts around the globe. Let’s first take a look back at the early days of the pandemic. Do you remember those first weeks?

 

On January 10, 2020, only days after the first virus clusters were identified, the complete genetic sequence of the new SARS-COV2 virus was published. By January 13, only three days later, the first vaccine candidate had been produced by Moderna, a US biomedical startup. It was the beginning of a frantic global race to produce vaccine candidates and get them ready for testing.


In early March 2020 Moderna’s vaccine candidate began human trials, just as Chinese company CanSino announced that it, too, would start human trials with its vaccine candidate. By May 2020, the World Health Organization listed over seventy vaccine candidates around the globe that had been completed and were at various stages of development and testing.

 

This is an amazing accomplishment for humanity. The idea of vaccination, of helping the human body fight an infectious disease by prompting an immune system response, has been around for centuries. About a thousand years ago, people in China deliberately infected themselves with a small dose of smallpox to weather an epidemic.


But even by the end of the 20th century, when humanity had conquered smallpox and polio, rubella and measles, and much worse, the development of vaccines was still a complicated and costly endeavor. It often took a decade, if not longer. When, less than twenty years ago, a measles vaccine made it through testing in four years, it was seen as an absolute world record, something simply unthinkable not long before.

 

Even four years, however, for a vaccine to combat the SARS-COV2 virus would almost certainly translate into tremendous suffering for humanity. I am just thinking of what the first couple of months of the global pandemic have meant for me. For weeks, my family and I were quarantined at home. We were not allowed to leave our village. 


We love to go to a restaurant for a family meal, but all restaurants were closed. It also meant a huge reduction in the social contacts that I treasure. I was able to use videoconferencing to teach my students and meet with them, but it is not the same as sitting across from each other. And I wasn’t able to travel and see my colleagues and friends in China, for instance, which I miss very much.


And of course, there is the economic side as well. Overall, it hasn’t been unbearable, but these couple of months surely were difficult. You probably feel the same way. So waiting four years for a vaccine to become available would take a huge human toll on all of us.

 

Unfortunately, developing a vaccine is a complex undertaking. First, one needs to understand the enemy and uncover its weaknesses. Second, one has to devise a way to exploit a weakness to stop the virus. Third, one has to get that mechanism to where it matters: into the human immune system. And fourth, of course, one has to ensure through ample and comprehensive testing that the vaccine is safe, actually prompts the destruction of the virus and does not lead to other illnesses.


Vaccines, like all medication, have to go through various stages of testing. Each testing stage takes time: it needs to be prepared, one has to wait for human subjects to develop an immune reaction, and the results have to be measured, analyzed and documented. After all, once a vaccine is approved, potentially hundreds of millions, if not billions, of humans will be inoculated with it.

 

Because there is fairly little that can be done to speed up human testing, most of the time savings, if any, must come earlier in vaccine development, in the first three steps I just outlined: understanding the enemy, exploiting its weakness, and anchoring that mechanism in the human immune system. And it is exactly in these three steps that researchers around the world have benefitted greatly from the rise of Big Data and AI.

 

Part 2: Knowing What There Is to Know - Quickly

Brief: Big Data and AI are Crucial at Every Step of Vaccine Development

05:36/19:42 

As I mentioned in an earlier podcast in this series, the first task for researchers is to get information about the enemy – the virus. This begins by imaging the virus to see its unique shape and how it interacts with human cells. The SARS-COV2 virus that causes the current pandemic is tiny, only about 100 nanometers in size. A nanometer is just a millionth of a millimeter!


Because viruses are so small, one needs not only a special electron microscope; the massive data that the microscope generates must also be analyzed, interpreted and visualized. It’s applied Big Data, of sorts. One also needs to sequence the genetic information of the virus. That is not just a task for biochemists, but for computer scientists as well. It is because we now use AI to assemble genetic sequence information that researchers could sequence the virus so quickly in early January 2020.
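
To make the word "assemble" a little more concrete: a sequencing machine does not read a genome in one piece, it produces many short, overlapping fragments that software must stitch back together. What follows is a minimal sketch of that stitching idea only, using a simple greedy overlap rule. The toy sequence, the function names and the `min_len` parameter are all assumptions made for illustration; real assemblers are far more sophisticated, and this is not the method used for SARS-COV2.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# A made-up 15-letter "genome", cut into three overlapping fragments.
genome = "ATGGCGTACGTTAGC"
reads = [genome[0:8], genome[4:12], genome[8:15]]

assembled = greedy_assemble(reads)
print(assembled)
```

With only three fragments this is instant. The difficulty in practice comes from having millions of fragments, reading errors and repeated stretches, which is where heavy computation enters.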

 

For comparison, can you imagine how long it takes a computer to assemble the genetic sequence of a person’s genome, roughly three billion base pairs of information? Just twenty years ago it would take a decade – ten years for just one person’s DNA!


By using graphics processing units and repurposing them for AI and Big Data work, chipmaker NVIDIA was recently able to reduce the time needed to less than 20 minutes. That’s a gigantic speed increase, and it translates into precious time gained when fighting a pandemic.
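
How gigantic? A quick back-of-the-envelope calculation, taking the two figures from this episode at face value (a decade then, 20 minutes now) and assuming 365-day years, gives the order of magnitude:

```python
# Figures as quoted in the episode; the 365-day year is a simplifying assumption.
YEARS_BEFORE = 10
MINUTES_NOW = 20

minutes_before = YEARS_BEFORE * 365 * 24 * 60
speedup = minutes_before / MINUTES_NOW

print(f"{minutes_before:,} minutes then vs. {MINUTES_NOW} minutes now")
print(f"speed-up factor: about {speedup:,.0f}x")
```

The two efforts are not strictly the same task, so the factor is best read as an illustration of scale rather than as a benchmark.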

 

Just as important as knowing your enemy is knowing what other researchers have already discovered. The current virus is a novel one, but it belongs to the family of coronaviruses, which are well studied. Many thousands of research papers on them (and on some of the mechanisms they employ) have been produced.


In the first months of the pandemic, about 5000 new papers on SARS-COV2-related matters were made public every week, translating into more than 50,000 papers just in the first half of 2020. Nobody, not even the most knowledgeable researcher in the field, can profess to have read and studied all of them.

 

That’s where Big Data and AI come into play, again. A coalition of leading research groups has produced the COVID-19 Open Research Dataset (CORD-19). It includes over 100,000 scholarly articles relevant to the pandemic. The Allen Institute for AI has harnessed AI tools to make this dataset semantically and visually searchable, which delivers far better results than an ordinary text search.


It’s similar to the difference between searching for a book by words in the title and browsing through the actual stacks of books in a huge library. When you can only search for a word, you get the titles with that word in them, but you will miss all the books that are about the same subject but perhaps use other words.


In contrast, when you are browsing in a library and find a book, you can also easily look to the left and right of the book and discover books you otherwise would never have found. Semantic search works the same way – it helps us find related research even if the words used in the research are different.
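
The library analogy can be turned into a small sketch. Below, each title in a made-up mini-library gets a short list of numbers describing what it is about, and a search compares meanings rather than words. The titles, the three "topic" dimensions and the hand-assigned numbers are all assumptions for illustration; real systems derive such vectors automatically from the full text, and this is not a description of how the CORD-19 search tools are built.

```python
import math

# Hypothetical mini-library: title -> hand-made "meaning" vector.
# Assumed dimensions: [vaccination, protein structure, economics]
LIBRARY = {
    "Vaccine trials for coronaviruses": [0.9, 0.2, 0.0],
    "Immunization strategies against SARS": [0.8, 0.1, 0.3],
    "Spike protein structure prediction": [0.2, 0.9, 0.0],
    "Economic impact of lockdowns": [0.0, 0.0, 1.0],
}


def keyword_search(word: str) -> list[str]:
    """Return titles that literally contain the word."""
    return [title for title in LIBRARY if word.lower() in title.lower()]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: close to 1.0 means 'about the same thing'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_search(query_vec: list[float], top_k: int = 2) -> list[str]:
    """Return the top_k titles whose meaning is closest to the query."""
    ranked = sorted(LIBRARY, key=lambda t: cosine(query_vec, LIBRARY[t]), reverse=True)
    return ranked[:top_k]


keyword_hits = keyword_search("vaccine")
semantic_hits = semantic_search([1.0, 0.0, 0.0])  # a query "about vaccination"

print("keyword :", keyword_hits)
print("semantic:", semantic_hits)
```

The keyword search finds only the title that contains the word "vaccine", while the semantic search also surfaces the immunization title, the neighbouring book on the shelf.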

 

Another example is the online platform Kaggle, which offers researchers and data scientists around the world not only access to this dataset, but also concrete cash prizes for solving small but important problems.


And this is not the only initiative: world-leading journals, such as Science and Nature, have opened their treasure troves, and so have international research platforms, such as ResearchGate, which offers almost 40,000 research items (usually recent papers) that are relevant and easily searchable using semantic graphs and other AI-driven search tools.


The hope is that researchers armed with Big Data and AI can tease out of these huge corpora of knowledge the morsels of truly valuable insight that speed up vaccine development.

 

Once the enemy is known, along with the general mechanisms to attack it, the next step is to take these general mechanisms and make them work against this particular virus. This is the domain of protein folding. Through gene sequencing we know the genetic information that makes up DNA and RNA, the information repositories of humans and viruses. That’s really helpful.


But what the information does not contain is how the proteins that make up most cells are built using this genetic information. It may sound trivial, but it isn’t at all. A typical human cell contains over a billion protein molecules. Proteins are also the basic building blocks of viruses. Every single one of these proteins is a large, complex molecule, and it is folded in a very particular way. And the bigger such a protein is, the more different ways it can theoretically be folded.


Think of just having a sheet of paper, and wanting to fold a bird, a dragon, or a boat – how would you do it? And that’s only one sheet of paper, not billions of proteins! If one knows only the sequence of building blocks of a large protein but not its exact three-dimensional structure, it would literally take longer than the life of the universe to try out each possible configuration to find the right one. Yet, in nature proteins fold naturally in just a fraction of a second.
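
The "longer than the life of the universe" claim can be checked with rough arithmetic. The sketch below assumes a protein of 100 building blocks, three possible local shapes per building block, and an imaginary machine that tests ten trillion configurations per second. All three numbers are illustrative assumptions chosen to be generous; the age of the universe is taken as roughly 13.8 billion years.

```python
# All values below are illustrative assumptions, not measurements.
RESIDUES = 100                    # building blocks in a smallish protein
SHAPES_PER_RESIDUE = 3            # possible local shapes per building block
TRIES_PER_SECOND = 10**13         # configurations tested per second
AGE_OF_UNIVERSE_YEARS = 1.38e10   # roughly 13.8 billion years

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Every building block can take each shape independently of the others.
configurations = SHAPES_PER_RESIDUE ** RESIDUES
years_needed = configurations / TRIES_PER_SECOND / SECONDS_PER_YEAR
universe_lifetimes = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"possible configurations: {configurations:.2e}")
print(f"years to try them all:   {years_needed:.2e}")
print(f"in universe lifetimes:   {universe_lifetimes:.2e}")
```

Even under these generous assumptions, trying every fold one by one is hopeless, which is why a method that predicts the fold instead of searching for it matters so much.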

 

So in order to understand a virus fully, we need to know not only the genetic information in it, but also how the proteins built from that genetic information are folded. There are three different strategies to achieve this. The first is through experiments, but that takes a lot of time. And time is precisely what we are missing. The second strategy is to tap into existing digital protein data.


One such data bank holds well over 150,000 protein structures that researchers can tap into. That saves time. But novel proteins were found in the SARS-COV2 virus, for which no entry in the database exists. And so that leads us to the third strategy, where AI comes in big. One of the world leaders in machine learning, British company DeepMind, unveiled AlphaFold in early 2020. It is a deep learning system that can predict how an unknown protein is going to fold.


It does this by automatically examining, and learning from, a very large set of existing protein structures. That way, the AI deduces the hidden (and complex) folding principles. This can help vaccine researchers around the world to more quickly identify structures of the virus that can be used to recognize it and to attack it. It’s like having the capability to produce an X-ray from only a scant description of the patient rather than from the patient herself. It has the potential to save a lot of time and effort.

 

AlphaFold is the result of seven years of research and development. It is not perfect, and, while arguably very good, it is not the only such tool available to researchers. In March 2020 it was open sourced and made freely available to everyone.

 

At every twist and turn of developing a vaccine, big data analysis and AI have proven crucial to speeding up the process. If humanity has a chance to beat its record for vaccine development and do it in under four years, it is because we have these capabilities at our fingertips.

 

Part 3: Beyond Vaccines

Brief: We need a vaccine, but we also need medication

15:14/19:42 

But even once we have found a vaccine, tested it successfully and found it to be effective and safe for humans, we will still have to produce it in large quantities and inoculate billions of people. That will not only take time and require a huge effort; the vaccine will also not work for everyone.


Each vaccine has a small number of misfires: cases in which it does not work as planned, causing anything from a severe negative reaction to no immunization. In the long run, therefore, we will need more than one vaccine and, should the virus mutate substantially, we will also have to keep pace with it by developing improved or alternative vaccines and continuing to inoculate.

 

So the virus will not go away fast; it will stay with us for the foreseeable future. This means that people throughout the world will continue to get sick and suffer from COVID19, although hopefully far fewer than in the early weeks and months of the pandemic. To help these patients, we need not only an effective vaccine (or multiple ones), but also an effective treatment for the disease the virus is causing. So in addition to the race for vaccines, there is a parallel race going on to find a drug that fights COVID19.

 

Because of the extreme time pressure, scientists looking for suitable drugs against COVID19 are also focusing on substances and ingredients that are already known rather than crafting something completely new. Fortunately, here too, there are large data sets of millions of known compounds, and, equally fortunately, there are AI tools that help researchers go through this huge pile of compounds to find the most likely candidates for a drug.


For example, the Argonne National Laboratory in the US uses AI technology to screen a billion – one thousand million – possible drug combinations every 24 hours. It is this kind of Big Data and AI-driven approach to generative chemistry that will likely yield the first truly effective COVID19 drug.

 

Part 4: Summary

17:56/19:42 

The global SARS-COV2 virus pandemic has deeply challenged and truly taxed humanity – our societies, and our economy. It has already killed hundreds of thousands of people around the world in the first couple of months, and deeply affected the health of many millions more. It is an enemy of a kind humanity has not faced in a long time.

 

In the past three episodes I described how the power of Big Data and AI helps us to conquer the virus – every single step of the way. But the power of Big Data and AI isn’t limited to conquering the current pandemic. It goes far beyond that and will change every aspect of our economy and our daily lives.

 

If we use this power the right way, Big Data and AI will let us lead better lives. But importantly, it will not be the beginning of the age of the machine. Instead, Big Data and AI are enormously powerful tools that help humanity work together more peacefully and prosperously. We can make it happen - together. So if you enjoyed these three episodes and are curious about our future in the age of AI, come along with me for a journey of exploration, astonishment and learning.
