Session 9 - Keynote Lecture - AI in Medicine 2023
Video Transcription
I would like to welcome our next speaker and our keynote speaker, Scott Penberthy, who works in the office of the CTO at Google Cloud, and he will tell us where we are in terms of the future, maybe in two years, actually.

So I have 75 minutes for this, is that right? I hear there's a little lag in the clicker. So, when you asked me to come: I work for Google, I'm in the CTO office, and I've been doing AI since, I guess, the late '80s, early '90s, back when it didn't work. More recently, some big things happened about 10 years ago. And if you think about 10 years from now, my God, there are probably 200 to 300 papers a day in this space; I barely read two. So I'm a student of AI, like most of us. If you're trying to think 10 years out, how do you focus? Because it seems like every day there's another amazing AI example. A great science fiction author once said: the future is already here, it's just not evenly distributed.

And since we have exponential change, look back 10 years, to 2013. That's a year after CRISPR, and a year after we figured out how to actually do perception in machines. We were excited at Google in 2013, because we ran an AI for three months and it came back saying it had found something extraordinary in YouTube. We wanted to show the board. What did you find? You're going to love this: it found cats. And they said, what do you mean, you found cats? That was 10 years ago. So today, what I want to do is walk you through two fundamental things that are happening, give you a sense of what they are and what all these thousands of humans are chasing. That may give you a sense of what's coming, and maybe you'll help us invent it.

This picture is from 2012, and in the middle is something called a convolutional neural network. You've probably seen this before; it's all being done today, and soon it will potentially be FDA approved. You take a picture and you flow it through these cells. All the cells are doing is adding a little, multiplying a little, and pushing the result up to another plane. Those little additions and multiplications are the weights you're learning. Eventually it gets to the top, nice and small, and one little node lights up and says cat or dog. That's how it worked. The training is: you push an image through, the weights are all wrong, it says dog; no, it's a cat, so you adjust the weights and iterate. That's the learning.

It turns out the technique that beat humans at 50 Atari games is the same technique in self-driving cars, because what we're doing is just perception. That's a part of the brain, part of the neocortex: you're predicting what's called a probability distribution. In other words, across a thousand things this could be, where do you vote? That's the probability. That's all you're doing. That was 2012.
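To make that loop concrete, here is a minimal sketch in Python with PyTorch, not the 2012 model, just a toy with random tensors standing in for real images, of exactly the cycle he describes: push an image through, compare the answer to the label, adjust the weights, iterate.

```python
import torch
import torch.nn as nn

# A tiny convolutional network: each layer multiplies by learned weights,
# adds a little, and pushes the result up to the next plane.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 2),                  # two outputs at the top: cat vs. dog
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()        # how wrong is the predicted distribution?

images = torch.randn(4, 3, 256, 256)   # random stand-ins for 256x256 photos
labels = torch.tensor([0, 1, 0, 1])    # 0 = cat, 1 = dog

# One training iteration: the network says "dog", we say "no, it's a cat",
# and every weight is nudged to make that mistake a little less likely.
logits = model(images)
loss = loss_fn(logits, labels)
loss.backward()                        # how did each weight contribute to the error?
optimizer.step()                       # adjust the weights; repeat many times

probs = logits.softmax(dim=1)          # the "where do you vote?" distribution
```

The last line is the probability distribution he describes: across the possible answers, where does the network vote?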
Something happened about five years later. You see the thing on the left; this kind of plot is very common in genomics, called a correlation ball. What it depicts is called an attention network. With attention, we could look at sequences of things: not one probability, but a sequence of probability distributions, much like playing the game of chess or Go. And the first thing you want to do is figure out what to pay attention to, and what you want to pay attention to is itself a function of the data you're looking at. So it's almost like you have a book that changes every time you change the prompt: change the prompt, and the book of what to look at changes. You look at your data and it says, here's a new book, just for you. The first word is "the"; you look in the index, that's different; you look at the page, that's different; the value, that's different. That's four lookups, and it goes up by n to the fourth. That's the attention network on the left.
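Here is a minimal sketch, in plain numpy, of that query/key/value lookup (the "index", the "page", and the "value"), with arbitrary toy sizes. The key property is visible in the code: all three lookups are computed from the input itself, so a different prompt produces a different attention pattern, a different "book".

```python
import numpy as np

def attention(x, Wq, Wk, Wv):
    # Scaled dot-product attention over a sequence x of shape (seq_len, d).
    Q = x @ Wq                                     # what each token asks for (the "index")
    K = x @ Wk                                     # what each token is filed under (the "page")
    V = x @ Wv                                     # what each token contributes (the "value")
    scores = Q @ K.T / np.sqrt(K.shape[1])         # compare every position with every other
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: where to pay attention
    return weights @ V                             # blend the values by attention

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(5, d))                        # a toy 5-token sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = attention(x, Wq, Wk, Wv)                     # change x and the pattern changes too
```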
The thing on the right is called diffusion, and it comes from physics. What we did there is take an image and keep adding what's called white noise, Gaussian noise, over a thousand steps, until eventually you have pure white noise, like an old television. Then we said, well, what happens if you play that backwards? This is like the Poltergeist movie, right? You play it backwards ("they're here") and out comes the original image. That's all you're doing.

Between the two of these, we had techniques to learn not one probability distribution but a sequence of them. So not just a cat, but what comes after the cat? Or in a movie, what frame comes next? And what we found is that sequences are general in life. What we think these models are doing is learning what you might call a human language function. This is from the caves in France, the first visual language. That's a language. People say, well, that's not a sequence. Yes, it is: if you tile it and read it from the upper left to the lower right, you get a sequence of tiles. That's a language. Pretty neat. Dance, that's a language. Music, mathematics: all of these are languages, and we actually write them down. So what the machine is essentially doing is figuring out, given an utterance (text, image, dance, video), the most probabilistically correct set of things that come next in that sequence. For medicine, that might mean: if I give you a question, or an exam with a picture, what's the correct answer? Because it's looking at all written human history. All this is doing is estimating the human language function. And when you estimate, sometimes you make mistakes; it's not quite perfect. That's what we call hallucination. It's just where your estimate is off.
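A minimal numpy sketch of the diffusion process described above: add a little Gaussian noise a thousand times, then learn to play it backwards. The noise schedule is a common toy choice, and predict_noise is a hypothetical placeholder for the trained denoising network, which is where all the learning actually lives.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000                               # number of noising steps
betas = np.linspace(1e-4, 0.02, T)     # how much noise to add at each step
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)        # cumulative signal remaining, used in reverse

def add_noise(image):
    # Forward process: by step T the image is pure static, like an old TV.
    x = image.copy()
    for t in range(T):
        x = np.sqrt(alphas[t]) * x + np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x

def reverse_step(x, t, predict_noise):
    # One step of "playing it backwards": remove the noise that the trained
    # network (here a hypothetical placeholder) estimates was added at step t.
    eps = predict_noise(x, t)
    mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean                    # final step: out comes the image
    return mean + np.sqrt(betas[t]) * rng.normal(size=x.shape)
```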
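And the "human language function" idea, estimating the distribution over what comes next given what came before, can be shown at toy scale by nothing more than counting. A minimal sketch (the tiny corpus is made up); a large model does the same job with billions of learned weights instead of a counter, across text, images, and other modalities, and a hallucination is just this estimate being off.

```python
from collections import Counter, defaultdict

# Estimate a (very crude) language function: count which word follows which.
corpus = "the cat sat on the mat the cat ate".split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def next_word_distribution(word):
    # The sequence version of "across all the things, where do you vote?"
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))   # roughly {'cat': 0.67, 'mat': 0.33}
```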
And what we find so fascinating now, what we would call emergent behaviors, is the way you program these. The way you program these is no longer really Python; Python and LangChain are your new assembler, the equivalent of the lowest level of the stack. Now you program these things in human language. On the right, a robot is actually being programmed by a person literally doing cartwheels; that happens in research. On the left, this comes from DALL·E 2, where you generate images, and the model has learned from text how to be programmed; that's called the prompt. And once you have this, it's a whole new way to compute. Is it not working? Oh, there we go.

What we're finding is that language is a general-purpose thing for us as humans, and here are six examples that are in research today; you'll start to see them in the market very soon. The upper left is what we all see: that's Bard, that's ChatGPT, text and chat. To the right is code: writing code. That's just a different language, and actually a much simpler one, so that's coming next. Images you've seen before; that was just last summer. The lower left is dialogue. That's a special form of language, give and take, back and forth, with things like utterances and body language, and it's coming. A lot of what you do in conversation is multimodal, and video is now part of that. That gets you into audio and music; it's in research today, and probably by the fall you'll see things like that. And then video; these are all just sequences.

What that gives you is what's called a foundation model: you train these AIs on millions, if not billions, of examples and create a fundamental model that understands a particular modality of expression. And what makes these things different, see what it says, is that FMs, foundation models, are characterized by emergent abilities. That means the model seems to do things you never trained it on. Essentially, if it has learned the human language function, the question becomes what language you give it to get the answer you're looking for. It seems emergent (wow, we didn't train it on this, but it knows the answer), but no: you've created something that can understand human language, and your question is just another input. Isn't that neat?

What's interesting is that this began about five years after CRISPR, with that thing called the transformer; you've probably heard about it. This is just a chart of the different models, and you'll have it to look at later. Look at the density of the dots in the upper right; that's 2023. The rate and pace of innovation is unprecedented. Moore's law used to be a doubling every 18 months; then it was every year; by the beginning of last year we were doubling capacity every three months, and it's now getting down to something measured in weeks. If you look back 10 years to 2013, that's a factor of 1,000. We could recognize cats in images at 256 by 256; the latest iPhones shoot 8,000 by 6,000, a factor of almost 700, almost 1,000. So the rate and pace of change here is going to be astronomical. But basically, all of it is trying to get better at estimating the human language function, and we believe that's going to be pervasive in the next decade.

Is this working? It is? How about that. So what you'll see is a flurry of activity. Despite the economy, there's an awful lot of cash chasing this these days; the charts all go up and to the right. What you see on the right are some plots of what I track, which is arXiv (A-R-X-I-V), bioRxiv, and PubMed. People now put research on arXiv; you can download it and replicate it, because the code is on GitHub or GitLab, and then republish before the journal has even returned your peer review. We're finding this can run three or four generations deep now: research is published, someone replicates it, improves it, republishes, and someone else takes that; three or four generations before the NeurIPS conference even happens. So if you feel like you can't possibly keep up, you're not alone.

So think about use cases. What I love about the examples you showed is that they're fascinating: they were dreamed up while we were looking at cats and thinking that was cool in 2012, 2013, and now you're actually able to use this to help patients and improve quality. It's fantastic, and you're showing videos and ChatGPT and whatnot. What we're finding, though, is that a lot of AI may not be about the diagnostics; we've loved those for years as AI practitioners. We're finding it makes your jobs a lot more rewarding, letting you spend more time with your patients and less on admin. A lot of it is: how do you prepare cases? We were just talking about this before. How do you get everything ready? How do you search the relevant information? How do you summarize all the clinical trials that are going on, all the research, all the interactions you've had with that patient, and bring that to bear? That's care team support. Prior authorizations: it can help you draft those. The AI may not be perfect, but at night after dinner, when you'd rather play with your kids than write a PA, it can generate a rough draft that you then edit and send in about a tenth of the time. Same thing for clinical notes. That's care team support.

For patient engagement: I have to get a GI exam and I'm a little nervous. Well, here's a PDF. I don't have time to read it, and besides, I speak Spanish. No problem. As you saw before, you can now ingest a document in Japanese, talk to it in Spanish, and get an answer back in Spanish. Again, a lot of this is the paperwork around care. I like to joke that the AI sends the fax and the AI reads the fax, right? And humans don't need to be on either side of it.

In research and development, we're finding that by using diffusion models and combining both of these techniques, we can now do such things as imagine the structure of a molecule that might attack a particular antigen, for example in cancer treatment. A lot of pharma companies are looking at AI as what I call the idiot savant. The AI says, how about trying this molecule? That's an interesting idea: let me go test it in situ. You still do your actual experiments, but it reduces the space you have to explore by multiple orders of magnitude. That's R&D. And on the far right, we're finding governments very interested, because wouldn't it be nice to answer the phone every time someone calls, in multiple languages, by voice or by video, for pennies on what used to cost dollars with real humans, and take a lot of cost out of government?

So let me walk you through three ways that your vendors, or you yourselves, may play with these. One: there's a brand-new stack being built by the hyperscalers, what I like to call neural clouds. These are where you program in human language, not computer language. They run on new types of chips; there's a resurgence in chip design, building silicon that runs these particular workloads at scale, vertically integrated much like a Mac. That's one way to buy it: through a cloud. The next thing you might do is have machines, or your grad students, use something called an API. We have one for PaLM; it stands for Pathways Language Model, and it's just an internal name. The idea is that your code can integrate this using something like LangChain or similar techniques. So if your programmer knows a couple of lines of code, and very soon just English, you can integrate this with your existing tools. For example, these are the plugins that let you do summarization on top of an EHR. Thank you. You're better than the mouse.
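As a hedged illustration of that "couple of lines of code" integration (the client class, method, and endpoint below are hypothetical stand-ins, not Google's or any vendor's actual SDK), the shape of such a tool looks roughly like this:

```python
# Hypothetical sketch: wiring a hosted language model into an existing workflow.
# TextModelClient, generate, and the endpoint are made-up placeholders;
# substitute your provider's real client library and authentication.

class TextModelClient:
    def __init__(self, endpoint: str, api_key: str):
        self.endpoint, self.api_key = endpoint, api_key

    def generate(self, prompt: str) -> str:
        # A real client would POST the prompt to the hosted model and
        # return its completion; this stub just returns a canned reply.
        return "[model completion would appear here]"

def draft_prior_auth(client: TextModelClient, chart_notes: str) -> str:
    # Ask for a rough draft that a clinician then edits: the model drafts,
    # the human reviews, as described in the talk.
    prompt = (
        "Draft a prior-authorization letter from these chart notes. "
        "Flag anything uncertain for human review.\n\n" + chart_notes
    )
    return client.generate(prompt)

client = TextModelClient("https://example.invalid/v1/generate", api_key="...")
print(draft_prior_auth(client, "58-year-old, screening colonoscopy, ..."))
```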
And here are the two things to think about. We talked about access through a cloud (that is, through your browser) and through an API. But assume all of that; assume we'll have a machine that can approximate the language function. It's fantastic: almost a machine that replicates a lot of the neocortex. It misses a lot of the emotion and the complexity of the human, but essentially it's a really powerful tool. And like any tool, what it comes down to is the data itself that you feed these things. How are you governing the data? How are you protecting the data? Who's allowed to have access to it? Where is it? What's its provenance, where did it come from? A lot of this is: assume the AI is going to happen, and it's happening very quickly; the real work is where you get the data from, how you access it, and how you control it in a way we feel comfortable with. In other words, so the data is being used in a correct way. We have some approaches at Google we'd love to share with you. For example, in Europe there's something called GDPR. A patient may come in and say, I'm out, I'm taking my data out. How do you do that, and do it in a privacy-safe way? How do you handle data so that if you want to share it to try a new tool, you can share synthetic data that doesn't expose anyone? Because even when you think you've de-identified it, a combination like a certain race and a certain age may still identify someone. And you want to do all of this responsibly. And I think I'm also out of time.

So in summary, what I want to leave you with is this. If you look back 10 years, we were just playing with very small models, and today recognizing things in images is becoming almost standard of practice. Look 10 years ahead: we're now playing with machines that can approximate the human language function across modalities. That will be very commonplace, and we hope very, very low-cost as well, within the next 10 years. And a lot of the stuff you spend time on today, the administrative work and everything else: you'll look back and say, how did I ever do my job without something to help me through that? Thank you.
Video Summary
In this video, Scott Penberthy, who works in the office of the CTO at Google Cloud, discusses the future of artificial intelligence (AI) and its impact on various fields. He notes that AI has evolved rapidly over the past decade and predicts it will continue to progress exponentially. Penberthy highlights two fundamental developments: convolutional neural networks, which perform perception, and attention and diffusion models, which learn sequences of probability distributions rather than recognizing single objects. He explains that AI models are now being trained to approximate the human language function, which will have numerous applications across industries. Penberthy also emphasizes the importance of data governance and privacy in AI systems. Overall, he believes AI will greatly enhance productivity and improve processes in healthcare, research and development, patient engagement, and government services.
Asset Subtitle
Scott Penberthy, PhD
Keywords
artificial intelligence
future
evolution
convolutional neural networks
attention networks