Directions in Artificial Intelligence for Endoscopy
Video Transcription
Speaking of hot topics, that brings us to our state-of-the-art lecture on directions in artificial intelligence in endoscopy, presented by Dr. Tyler Berzin from Beth Israel Deaconess Medical Center. Thank you very much. Thanks, everybody. Listen, despite all the hype around AI in medicine and gastroenterology right now, there is exactly one FDA-approved AI application for endoscopy: polyp detection. That's it. We'll talk about it, but there's a pretty big gap between the excitement about the future and what's actually happening right now. So what I'd really like to do during the next 20 minutes is prepare you for the journey that we're going to go on together as a field over the course of the next 10 years or so. Those are my disclosures. The agenda today is to talk about some of the AI point solutions that are immediately available, and then to focus on the transition from AI point solutions to integrated AI platforms and, ultimately, generalist foundation models.

Our world of AI is complex because there is a huge number of terms that are very hard to disentangle, and most of us didn't learn any of these in medical school. The way to distinguish the terms, in general, is to think first about what the AI tool does (there are a variety of important medical applications ranging from computer vision to drug discovery to robotics) and then about how the AI does it. This is where we get into nested terms: machine learning is a subset of AI, and deep learning is a subset of machine learning. We're going to talk a little bit about what these terms actually mean and how they fit together.

The way we're going to learn about that is with the dog-or-food computer vision challenge. This is a classic computer vision challenge. As a human, if you look at these images, the chihuahua and the blueberry muffin, the goldendoodle and the fried chicken, you don't struggle much to distinguish them. In fact, I would wager that every one of you can get each one of these exactly right 100% of the time. You can also recognize that this is actually a very challenging problem if you're trying to train a computer to recognize these images. So let's talk about what we can learn about machine learning and deep learning using the dog-or-food challenge. Historically, this challenge was approached with traditional programming: a programmer would sit in a dark room and write 100 lines of code, and that code would basically be if-then commands, saying that if a certain number of pixels had this particular shape or color, then it's more likely a dog, and if it was X, Y, or Z, it would be more likely a food. You'd write 200 lines of code, you'd try it, you'd find that you're 60% accurate, you'd be so angry you'd write 100 more lines of code, and that is essentially how traditional programming approached computer vision for years. This was even tried for colon polyps, with coders simply trying to describe what polyps look like. The big leap with machine learning is the concept that the human can work with a pre-made algorithm, an algorithm that is already built to recognize certain aspects of the world, and that algorithm can then be exposed to labeled data. That labeled data can be incredibly voluminous: it could be 100,000 pictures, 200,000, a million pictures, each along with a desired output or label.
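To make the "algorithm plus labeled data" idea concrete, here is a minimal, hypothetical sketch in Python. The synthetic features and labels are invented purely for illustration; no real dog-or-food or polyp-detection system works from three hand-picked numbers per image.

```python
# Minimal sketch of "algorithm exposed to labeled data -> model" from the talk.
# Features and labels below are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Pretend each image has been reduced to a small feature vector
# (e.g., average color, edge density). Labels: 1 = dog, 0 = food.
rng = np.random.default_rng(0)
X = rng.random((1000, 3))                 # 1000 "images", 3 crude features each
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # synthetic labels for the demo

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)  # trained algorithm = "model"
print("dog-or-food accuracy:", model.score(X_test, y_test))
```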
And when an algorithm is exposed to data, the result is what's called a model. You'll hear the term model a lot; it basically means a trained algorithm. Here we end up with a model that can predict whether an image is a dog or not a dog. The big leap, though, actually happens with deep learning. What I didn't tell you about traditional machine learning is that the human still has to provide some guidance about the critical features of the image. This is called feature extraction. So in traditional machine learning, we would sit down with a programmer and say, well, the snout and the eyes and the ears, these are important features; let's make sure the algorithm focuses on those. Deep learning does that feature extraction step on its own. It identifies the features that matter to its own algorithm, and it may recognize features that we don't think are important at all, and patterns that we can't recognize. The big excitement with deep learning is that it can potentially recognize patterns that no human could ever possibly recognize. For instance, if we gave it pictures of 100 million chihuahuas along with all of their medical and geographic history, a deep learning system might be able to recognize hypothyroid chihuahuas from New Jersey. We may not necessarily want it to do that, but it's going to be able to recognize patterns that are beyond human comprehension.

So let's roll into some common AI myths. In fact, the very first myth, that AI is free of bias, deserves its own talk, and it will get one from Dr. Phoula May in a few minutes. AI actually has incredible amounts of embedded bias that we need to lean into, and Dr. May is going to give a 20-minute talk on exactly that issue. All right, myth number one: AI is more robust than human intelligence. This is actually both true and false. AI tools are incredibly powerful at recognizing the patterns I mentioned that we can't recognize as humans. A couple of good examples: we now have AI tools that can recognize cirrhosis or anemia based on an EKG. These are not patterns that humans can recognize. Perhaps more troubling, we have AI tools that can predict race and ethnicity based on chest X-rays or hand X-rays, and you can imagine how complex that may become if you start using models that make treatment recommendations based on that sort of embedded data. The flip side, though, is that AI predictions may be totally broken by subtle perturbations. A classic example is an AI system that was trained at a university hospital; when they tried to apply it at a community hospital down the street with a slightly different imaging system, GE versus Philips or whatever, the system couldn't detect a single lung nodule. Compare that with a human radiologist: if a radiologist trains at, say, Stanford and then walks down the street to a local hospital, they'll be able to sit down in the dark room and, regardless of what the system is, do a pretty good job recognizing lung nodules. There's a certain aspect of human intelligence that is incredibly robust. We can move between different environments very easily; it is a huge problem for computers that they may not be able to do that. Additionally, AI completely lacks common sense. This is actually the cause of many, many of the errors that AI ultimately makes, because it can be focusing on the wrong thing.
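As a rough illustration of the difference the speaker describes, here is a toy sketch, assuming PyTorch is available. The hand-crafted feature function and the tiny network are invented for this example and do not represent any commercial endoscopy product; the point is only that in the deep learning approach the convolutional layers, not the programmer, decide which features matter.

```python
# Toy contrast: hand-crafted features (traditional ML) vs. learned features
# (deep learning). Illustrative only; not a real polyp- or dog-detection model.
import torch
import torch.nn as nn

# Traditional ML: a human decides what matters (snout, ears, color histogram...)
def handcrafted_features(image: torch.Tensor) -> torch.Tensor:
    # image: (3, H, W). We pick the features ourselves: mean color per channel.
    return image.mean(dim=(1, 2))

# Deep learning: convolutional layers learn their own features from raw pixels.
class TinyDogOrFoodNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(        # feature extraction is learned here
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, 2)    # 2 classes: dog, food

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

fake_batch = torch.rand(4, 3, 64, 64)         # four random 64x64 "images"
print(TinyDogOrFoodNet()(fake_batch).shape)   # -> torch.Size([4, 2])
```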
A classic example of this is a system that was trained to recognize melanoma. It did so with very high accuracy, but the researchers subsequently realized that in many cases it was keying on the fact that melanomas are usually marked with ink next to the lesion. The system wasn't trying to cheat; it just has no common sense, and so it ended up using the ink marking as one of its major cues for whether melanoma was present.

Myth number two is that AI is continuously learning and adapting. I've heard this from many physicians when they first get a computer-aided polyp detection system in their endoscopy unit: is it going to be changing every day, adapting? Is data from my practice getting sent back to the company? The fact is that there are protections against that. In the consumer space, there are many examples of AI that does continuously learn. When you dictate into your iPhone, when you dictate into any computer, when you drive a Tesla, all of these systems have a continuous learning feature; data is constantly sent to the company, and there's a flywheel of continued improvement. In medicine, there are actually strong protections against this. You do not want a system you're using at work changing from one day to the next beneath your feet without you recognizing it. So the FDA only allows locked algorithms. Companies can update these algorithms over time, version one, version two, version three, but they can do so only under what's called a predetermined change control plan that has been approved by the FDA. And those changes cannot occur based on data that you're sending from your endoscopy unit; in fact, these systems can't receive data from your endoscopy unit. Those changes can only come from central libraries of images and videos and the companies' own software upgrades.

Myth number three, and this is the one perhaps we're all most worried about: the fully robotic colonoscopy is around the corner and we're going to be out of a job. What I'm going to ask you to remember is the concept of Moravec's paradox, which is one of the central ideas in AI: AI can easily outperform the high-level computational abilities of an adult, but it struggles to achieve the sensorimotor skills of a toddler. The best visual example of this that I can think of is that Garry Kasparov was handily trounced by Deep Blue, IBM's supercomputer, in the 1990s, but IBM had to have an engineer, I think his name was Joe, move the pieces. Why? Because Deep Blue would have knocked over the pieces if it had tried to handle the chess board. So there's a huge gap between the computational abilities of a device and sensorimotor skills. When you start seeing robots pumping gas at the gas station or doing all of the shelving in a complex supermarket, we can start getting a little nervous about colonoscopy. But the gap to doing that fully autonomously is very, very wide.

All right, so let's talk about how this fits into our AI-for-GI roadmap. The truth is that we're at the very beginning of this roadmap. Right now, and frankly for the next five years, we're mostly talking about individual point solutions. We've talked about computer-aided detection; we're going to get lesion measurement and computer-aided diagnosis, but all of these will be single-purpose individual point solutions. Each one will get individual FDA approval, and you'll decide piece by piece whether each one is usable for you. But that is not the future.
The future is much deeper integration into our endoscopy systems and, ultimately, broadly into our medical charts, and I'll show you what that will look like. But first let's talk about the point solution that's in front of us. Computer-aided polyp detection is our first AI point solution. We have at least three FDA-approved devices, with several more coming. What is the point of using these? I think the important thing to consider is that the story of computer-aided polyp detection really starts with human variability. We know absolutely that adenoma detection rate varies among providers, among gastroenterologists. One thing we don't talk about as much is that our own adenoma detection rate varies from hour to hour and from day to day. As we get fatigued over the course of the day, as we've had a rough on-call night, as our child is sick at home, as we have a particularly chatty tech or nurse in the room, or loud music that Juan Carlos is playing, a lot of different things can alter our attention. So we have incredible intra-physician variability in ADR. Even Doug Rex cannot be Doug Rex at every moment of every day.

There are three major factors that contribute to performance variability in colonoscopy: limitations of the human visual field, some interesting aspects of our visual processing, and a variety of other human factors ranging from fatigue to how attentive and careful you are for any given patient. Let's talk about some of those specific factors. It turns out, despite what you may think, that the human visual field is not purpose-built for GI endoscopy. Your central visual field, which has very high resolution, is just barely five or ten degrees, and your brain does some incredible magic to fill in the rest of the scene, but the periphery is a little hazy. The counterintuitive issue is that as we get higher and higher definition scopes and larger and larger beautiful screens, we have more and more pixels to survey, and it may actually get harder to make sure we're capturing all the visual information. An AI system is able to look at every single pixel on the screen with equal attention at every second of the procedure.

The other very interesting aspect of our visual capacity is that we're very good at a particular type of search task called parallel search, where something stands out dramatically, and we're really lousy at what's called serial search, where something blends in; it takes incredible computational effort for us to do serial search. A good example: if you look at the picture on the left, you can immediately take in the whole scene. You understand what's happening, there's an apple and a tree, and you absorb it all. But God help you if your partner sends you to the bookstore for a particular author and a particular title; you turn your head sideways, and it can take minutes or hours to go through each individual title. We cannot take in that whole scene immediately. A pedunculated polyp presents a parallel search task, and we generally don't need help with parallel search tasks. But sessile polyps and other subtle lesions are probably closer to a serial search task, and some highlighting or help can be incredibly valuable.
It turns out that across all of the polyp detection technologies brought to market over the last decade or more, computer-aided polyp detection creates a bigger delta in detection than any of them. The data right now are fairly clear; Dr. Chakot is actually in the next room, I think, talking about some of this work. Despite that, we're seeing, or at least hearing about, lots of variability in recent papers. It's worth noting that there are now about 20-plus positive studies for polyp detection and about five or so negative studies. I think we're going to see continued variability in the benefit of computer-aided detection in different clinical settings, because the whole story is about variability. There are some settings where there may be less variability and less need for help, and there are some physicians whose polyp detection is so consistently high that additional benefit may not be measurable. But variability is exactly what these tools are intended to diminish, and our goal as a field should be to provide very high levels of performance, and our groups and our hospitals should be providing that consistently as well.

We have to talk about the math, the economics, as well. We're generally not able to adopt technology unless the math works for our hospitals and our units. It's worth mentioning that Raj Kaswani at Northwestern did a very elucidating study in which they implemented computer-aided detection in several of their rooms, rotating over the course of nine months. What they found is that having computer-aided detection available in the room, even when its use was optional for the physician, increased ADR, increased other polypectomy metrics, and increased collections by $68 per case, and the estimated monthly increase if they adopted it broadly was about $51,000. So the math works, and it does improve polyp detection even in expert groups.

Despite all of this, frankly, adoption of AI polyp detection has been pretty sluggish. If you had asked a number of folks in the field three or four years ago, and certainly if you had asked industry, they would have expected very broad adoption by now, and it hasn't happened. There are a handful of reasons for this; I think we can summarize three major ones. First, there is a lack of quality and reporting mandates. I will tell you that I work at Beth Israel Deaconess Medical Center in Boston and Harvard Medical School, and I don't know my adenoma detection rate for the last three months or the last six months, because I don't have an automated way to compute it. I'm sure many of you in the room do by now, but I bet many of you still don't. Second, we have lots of variability along the technology adoption curve. And third, of course, economic considerations, as I mentioned, are critical. I'd like to zoom out for a second, though, and say that another reason adoption has not been broad yet is that polyp detection, again, is only a point solution, and we should think a little about what we can learn from other technology moments in history. A classic example: Edison invented the electric light bulb in 1879. If you polled most human beings on the planet about how quickly electric lights were adopted, people would generally assume it was almost immediately afterwards. The truth is that 20 years later, household adoption of electricity was just 3 percent, and in factories, about 5 percent. There was a gigantic gap between the invention and when adoption occurred.
This gap, across many technologies, is sometimes referred to as the in-between times: the time after the demonstration of promise and before the transformational impact. In order for revolutionary technologies, including AI, to make that critical leap, they have to move from offering point solutions to enabling entirely new system solutions. The best example is the factory: initially, electricity was just plugged into existing steam-era factory designs and offered incremental benefit. The big leap came when electricity enabled us to entirely rethink factory design and create the assembly line. That's when adoption really took off, because it allowed for the creation of new system solutions.

We have to think about the same thing for endoscopy. A point solution like polyp detection may not be enough, and frankly it's not asking enough of AI. Our existing system for colonoscopy is complex and inefficient. We try to record a number of quality measures, many of them manually. There's a huge amount of time spent typing and clicking on computers. We then have to get back to the patient five days later, after the pathology comes back, to assign follow-up intervals. A lot of this could be solved if we had a system solution for AI-powered colonoscopy in which polyp detection, diagnosis, landmark recognition, billing, and inventory management are all built in, and we could complete the entire episode at the point of care, with no need for pathology in many cases, and be done.

So how do we start moving toward AI system solutions for GI endoscopy? What we're going to see is AI tools moving beyond point solutions. One example is an integrated AI system that not only identifies polyps but also identifies the type of polyp, recognizes when the cecum has been reached, identifies the tool in use, and assigns Boston bowel prep scores. This is all what will be required for auto-documentation, and I don't think I could find a single physician who would not be thrilled to have some of our documentation burden removed. As we move into other complex areas such as third-space endoscopy, we're going to have tools that can highlight, in this case, the submucosa and blood vessels to provide some guidance. I was talking with Dr. Shaw and others earlier about the role of AI coaches for learning complex procedures. On the right, we have a video of an AI tool identifying key anatomy during an endoscopic ultrasound.

A key limitation of our current AI tools is that they're essentially trapped in a single modality; they're focused on video or images alone, for example. Health data is not like that, and we are not trapped as physicians. We know all about our patient: the medical chart, the history, the family history. The future state is going to involve AI tools that interact with these multiple data sources, and these tools are called generalist medical AI models. There's potential for AI tools to act essentially as a copilot or a colleague. I want to pause on the terminology for a moment. Generalist medical AI uses foundation models. You may have heard about foundation models because ChatGPT is a foundation model; right now it's language-focused, but it can actually interact with multimodal data. Foundation models basically serve as foundational tools atop which innumerable applications can be built.
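As a purely hypothetical sketch of what such a copilot interface could look like, here is a Python stub. The class, function, and fields are invented for illustration; no existing generalist medical AI product is being described, and a real system would call an actual multimodal model with EHR integration rather than the placeholder below.

```python
# Hypothetical sketch only: how a generalist "copilot" call might be structured,
# combining an endoscopy video frame with contextual data pulled from the EHR.
# The class, function, and fields below are invented for illustration.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class PatientContext:
    history: list[str] = field(default_factory=list)      # e.g., EHR problem list
    medications: list[str] = field(default_factory=list)

def ask_copilot(frame_bytes: bytes, context: PatientContext, question: str) -> str:
    """Placeholder for a multimodal foundation-model call.

    A real system would send the image frame plus structured context to a
    hosted or on-premise model and return its answer; here we only stub it.
    """
    return (
        f"Stub answer to '{question}' "
        f"(frame: {len(frame_bytes)} bytes; history: {', '.join(context.history) or 'none'})"
    )

ctx = PatientContext(history=["prior abdominal aortic aneurysm repair"])
print(ask_copilot(b"\x00" * 1024, ctx, "What is that structure on the left?"))
```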
So a foundation model in medical AI could pull contextual data from your EHR, including the patient's history, and then during the procedure, when a finding appears on the screen, the physician could ask, hey, what do you think that is on the left? And the system could respond that the structure represents an arterial wall and, given the patient's history, is probably an aortoduodenal fistula. Especially for rare findings like this, things a physician may seldom encounter, this could be of incredible value. So over the next five years we're going to see a proliferation of point solutions, which will then move into integrated platforms and generalist foundation models. But the question today is, should we adopt AI polyp detection? Do you have to do it in 2023 or 2024? I'll just say that I don't have an incredibly strong opinion about it, frankly; I think we're in a middle zone right now. But I think the decision hinges on three things: quality, economics, and innovation. Quality: how variable is our current performance in ADR, PDR, and sessile lesion detection, and can we protect our patients and our physicians by reducing that variability? Economics: can we prove out a return on investment from increased polypectomies, or at least show that it's net neutral? And finally, innovation: can adopting AI now prepare us for the future as a leading-edge, innovative GI practice? Thank you for your attention.
Video Summary
Dr. Tyler Berzin from Beth Israel Deaconess Medical Center gave a lecture on the future of artificial intelligence (AI) in endoscopy. Despite the hype around AI in medicine, there is currently only one FDA-approved application for AI in endoscopy: polyp detection. Dr. Berzin discussed the gap between the excitement and the actual implementation of AI in the field. He explained the different terms related to AI, such as machine learning and deep learning, and how they are used in medical applications, using a computer vision challenge to illustrate the difference between traditional programming and machine learning approaches. Dr. Berzin also addressed common myths about AI, such as AI being bias-free and continuously learning. He emphasized the importance of understanding the limitations and potential of AI in order to make informed decisions about its adoption in endoscopy. Overall, Dr. Berzin discussed the current state of AI in endoscopy and the potential for its future integration and advancement in the field.
Asset Subtitle
Tyler M. Berzin, MD, FASGE
Keywords
artificial intelligence
endoscopy
polyp detection
machine learning
medical applications
limitations of AI