ASGE Annual Postgraduate Course: Clinical Challeng ...
What is Artificial Intelligence, Machine Learning: Terminology, and general concepts
Video Transcription
So, it's my pleasure to start with the first session, which is called the Basic Concepts of Artificial Intelligence in Endoscopy. The first talk is on what is artificial intelligence, machine learning, terminology, and the general concepts that all of us keep either hearing about in lectures or reading about in our publications. So, again, Tyler, great to have you here. Good morning. And please start, Tyler.
My task in the next 15 minutes is just to begin introducing some of the terminology that's going to be explored later in the rest of the day. And I'll first move to my disclosures. So, the general agenda that I'll try to accomplish in the next 15 minutes is to introduce the concepts of AI versus machine learning versus deep learning. We'll talk about two very specific computer applications, which will be explored later in the day: computer vision and natural language processing. And then I want to zoom out a little bit from all of the excitement and explore some of the ways that AI can fail us and some of the things that we need to pay attention to as this field develops. So, just starting with the definition, artificial intelligence refers specifically to computer systems and algorithms that can perform tasks which normally would require some aspect of human intelligence. That might be visual perception or speech recognition or decision making. And it's really integrated now into our worlds all day, between our smartphones and some of our cars. I tell people that in many ways, aspirationally, AI should feel like magic. And in some aspects of our daily lives, it really does. I don't think we're quite there in terms of whether it feels like magic all the time for our medical practices, but I do think there are opportunities for it to make pretty profound improvements in our clinical work.
So, the reason that we're all talking about it and the reason that AI has been such a focus in the last five years is that something really dramatic has happened in around the last five years. AI was invented long ago. The concepts were present in the 50s and 60s. But what has happened just in the last five years is that for very specific tasks, whether it's playing chess or speech recognition, or in some cases, driving a car, for very, very specific program tasks, AI has just now reached the ability to meet and exceed human performance. And I really want to make the point that, again, this is just for narrow or task-specific AI. Almost all the AI we're talking about in the world is task-specific, very narrow AI. Usually, AI articles in the press will be accompanied by pictures of our robot overlords. And what they're referring to is this dystopian future concept of a generalized AI that really can mimic human intelligence fully and think for itself. And we are extraordinarily far away from that. So, all of the stuff that we're talking about, as exciting as it is, is very limited to specific tasks as designed by human beings. So, we're still the overlords of the AI. One of the things that gets very confusing about AI is that the field itself is built with incredibly overlapping terminology. So, we talked broadly about the term artificial intelligence, but there are many subsets of artificial intelligence. Machine learning is a particular subset, and I'll show you an example of what that looks like in a moment. And then deep learning is a subset of machine learning. So, these concepts are all overlapping. Additional overlap is created by some applications of AI. Computer vision is one of them. And computer vision, which allows computers to see and interpret the visual environment, can be accomplished through AI, can be accomplished through specific aspects of AI, machine learning or deep learning, and it can also be accomplished without AI. 
And so, this is the reason we see all of these overlaps. When I think about AI terminology, another way to classify it is to think about what the AI is accomplishing. It's accomplishing computer vision or polyp detection, or perhaps it's accomplishing speech recognition. And then a separate concept is how the AI accomplishes that. There are often underlying algorithms that allow that thing to occur, and that algorithm may be based on machine learning or deep learning. One of the challenges with the way this world of AI is represented is that visually, it's often cleaner to show the diagram on the left, with these sort of branching concepts of machine learning and robotics and natural language processing and deep learning and so on. But in reality, it doesn't look anything like this. The reality is that there is incredible overlap between the what and the how. And so the field really is more accurately represented by a figure like this on the right as opposed to a figure like this on the left. So I'll just start with a little introduction to the concept of machine learning and deep learning, and some of my colleagues today will dig in a little bit further. So, machine learning is best represented by this image on the top, where we're trying to teach an algorithm to recognize a car. In machine learning, the way this typically works is that a software engineer will build algorithms that can identify certain features of a car. For instance, it will look for a wheel or a windshield or a particular length or size of the car in an image. And it will put each one of those features together and design an algorithm that, over a number of iterations, can output either that the thing in the image is a car or that it's not a car. And what happens over time is that the engineer goes back and tweaks the performance of that algorithm and tweaks some of the inputs. And eventually, you can get very high accuracy to say this is a car, this is not a car.
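The feature-based approach described here can be illustrated with a minimal sketch. It assumes the engineer has already reduced each image to three hand-chosen yes/no features, and it uses a simple perceptron update as a stand-in for the "tweaking over a number of iterations" step. The feature names, the example objects, and the choice of a perceptron are all illustrative assumptions, not something shown in the talk.

```python
# Toy "car / not car" classifier built on hand-engineered features.
# Features per example: (has_wheels, has_windshield, is_long), each 0 or 1.
# All examples and labels are invented for illustration.
TRAIN = [
    ((1, 1, 1), 1),  # sedan
    ((1, 0, 0), 0),  # bicycle
    ((0, 1, 0), 0),  # shop window
    ((0, 0, 1), 0),  # sofa
    ((1, 1, 0), 1),  # compact car
    ((1, 0, 1), 0),  # trailer
    ((0, 0, 0), 0),  # tree
]


def predict(weights, bias, features):
    """Weighted vote over the hand-chosen features: 1 = car, 0 = not car."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(examples, max_rounds=20):
    """Perceptron rule: after each wrong answer, nudge the weights and retry."""
    weights, bias = [0, 0, 0], 0
    for _ in range(max_rounds):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + error * f for w, f in zip(weights, features)]
                bias += error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


weights, bias = train(TRAIN)
for features, label in TRAIN:
    print(features, "car" if predict(weights, bias, features) else "not car")
```

The point of the sketch is that a human decides which features matter; the algorithm only learns how much weight to give each one.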
Deep learning takes things one step further. And in fact, it involves quite a lot less input from the software engineer. In supervised deep learning, the engineer will typically just provide labeled images of a car. This is 1000 images of a car or 10,000 images of a car. And this is 100,000 images with no car. And then a system of what are called artificial neural networks will extract key visual features. They may not be the key visual features that you or I would think of as human beings. In fact, you can't even really query the neural network to identify which features it's identifying. Is it identifying wheels, or headlights? Or is it some other features of cars that human beings don't usually think about? But in any case, a neural network goes through a number of processes and eventually learns how to output car versus not car by essentially sort of training itself to say, well, I did this and I got the answer wrong. Let's go back. I did it again, I got the answer right. And now I'm getting the answer right most of the time. So there are many rounds of trial and error that occur internally within these neural network layers. But it is something of a black box. And one of the challenges in deep learning in particular is that it's very difficult to query what the software was recognizing. And there are really interesting parts about this. For instance, when we're talking about polyp recognition, if we get to a point where AI is dramatically better than human beings at polyp recognition, and we're probably moving in that direction, with the present technology we're not going to be able to query the AI to say, well, what are you seeing that I'm not seeing? So there may be future versions of deep learning that can actually teach us what we're missing. But a lot of deep learning right now really is very much a black box. And so the computer does its thing, and identifies its own important features.
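The "many rounds of trial and error" can be sketched with a very small neural network trained by gradient descent. This is only a toy under stated assumptions: two input "pixels" instead of an image, one hidden layer of two units, four invented labeled examples, and fixed starting weights chosen so the run is reproducible. Real polyp or car detectors use far larger networks and datasets.

```python
import math

# Four labeled examples: two "pixel" inputs, label 1 only when both are bright.
# Purely illustrative data, not real images.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """One hidden layer of two tanh units feeding a single sigmoid output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(2)]
    out = sigmoid(w2[0] * hidden[0] + w2[1] * hidden[1] + b2)
    return hidden, out


def mean_loss(params):
    """Average cross-entropy: how wrong the network is on the labeled examples."""
    total = 0.0
    for x, y in DATA:
        _, p = forward(params, x)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(DATA)


def train(steps=2000, lr=0.5):
    # Small, fixed, asymmetric starting weights (an arbitrary choice).
    w1, b1 = [[0.1, -0.2], [0.3, 0.2]], [0.0, 0.0]
    w2, b2 = [0.2, -0.1], 0.0
    losses = []
    n = len(DATA)
    for _ in range(steps):
        params = (w1, b1, w2, b2)
        losses.append(mean_loss(params))
        gw1, gb1 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
        gw2, gb2 = [0.0, 0.0], 0.0
        for x, y in DATA:
            hidden, p = forward(params, x)
            err = p - y  # "I got the answer wrong by this much"
            gb2 += err
            for j in range(2):
                gw2[j] += err * hidden[j]
                back = err * w2[j] * (1.0 - hidden[j] ** 2)  # error passed back a layer
                gw1[j][0] += back * x[0]
                gw1[j][1] += back * x[1]
                gb1[j] += back
        # "Let's go back": adjust every weight slightly against its error gradient.
        b2 -= lr * gb2 / n
        for j in range(2):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            for k in range(2):
                w1[j][k] -= lr * gw1[j][k] / n
    params = (w1, b1, w2, b2)
    losses.append(mean_loss(params))
    return params, losses


params, losses = train()
print("loss before training:", round(losses[0], 3))
print("loss after training: ", round(losses[-1], 3))
```

Note that nothing in the code says which features to look for. The learned weights are just numbers, which is one way to see why such a model is hard to query about what it is "seeing".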
One other myth that I want to address, and it's a very common one, is the general concept that AI also means the system is continuously learning. I've been asked this frequently by gastroenterologists who are thinking of getting polyp detection systems in their units. They often have a concept that the more colonoscopies they do in their unit, the better the system will progress and work. And in the consumer space, there are actually many examples of AI which continuously learns. Tesla's auto driving neural network is progressively getting better with every mile that a car and the whole fleet drives. And so it is actually continuously learning and getting better. But in medicine, there are a lot of precautions that are in place for patient safety and because of the regulatory environment. And so the algorithms that we're all using in medicine right now, whether it's for polyp detection or for chest x-rays, are what are called locked algorithms. When an algorithm arrives in your unit, it's not going to be different tomorrow, the next day or next week, except that the FDA will now allow what's called a predetermined change control plan. What that specifically means is that a manufacturer can eventually update the algorithm, not based on what's happening in your unit, but based on what's happening at the manufacturer as they develop better algorithms. And they can release updates to the algorithm in a controlled fashion once they have demonstrated to the regulatory agency, the FDA or another, that they have a safe and thoughtful process in place to do that. I don't think we know quite yet as physicians what that's going to look like on our end. In other words, if we have a polyp detection system that's working well for us for three months, and then the manufacturer decides to do an update through the predetermined change control plan, are we going to get a notification of this with every version, version 1.2, version 1.3 and beyond, or not?
I suspect we will be notified of this, but this is all so new that the systems are just beginning to develop. I want to take a step back and just acknowledge that the concept of polyp recognition in general, polyp detection in general, is something that's long been in our field. And the solutions to this predated attempts at using AI. Some of the earliest attempts used a concept called hand-programmed feature recognition, which, to sort of dumb it down, was an engineer and a gastroenterologist sitting in a room with the engineer saying, well, what does a polyp look like? And the gastroenterologist saying, well, it's a little bit round or protrudes a little bit, or it's a little bit redder than the surrounding tissue. And then the engineer would just program that into a system to say, identify round things, identify red things. And this, not surprisingly, was not incredibly robust. It was in fact only usable on still images in the early days. And the algorithms and processors and computers were just not fast enough to keep up with live endoscopy or for clinical care. So these were the early days, but we've moved very, very quickly in the field in the last few years. And we've moved beyond hand-programmed feature recognition. And these images here, which are often termed the dog or food challenge, give you a good sense of why hand-programmed feature recognition is not adequate for the complexity of subtle images. With these images of a blueberry muffin or a chihuahua, or a goldendoodle and fried chicken, we can acknowledge that as a human being it's not hard for you to tell the difference between those things. But if you were just trying to program a computer to recognize the visual differences, to say look for something that has three dark spots and is otherwise golden, you can imagine why it would be hard for a computer to follow you and get any sense of what the difference between these images was.
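To make the "identify round things, identify red things" idea concrete, here is a minimal sketch of a hand-programmed rule. It assumes each candidate region has already been summarized by an average color and a width and height; the thresholds, the region names, and the `Region` type are hypothetical and chosen only to show how such rules break down.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    mean_rgb: tuple  # (red, green, blue), each 0-255
    width_px: int
    height_px: int


# Hypothetical thresholds an engineer might hard-code after talking to a clinician.
RED_MARGIN = 40    # red must exceed the green/blue average by this much
MAX_ASPECT = 1.3   # long side / short side must stay below this to count as "round"


def is_red(region):
    r, g, b = region.mean_rgb
    return r - (g + b) / 2 > RED_MARGIN


def is_round(region):
    long_side = max(region.width_px, region.height_px)
    short_side = min(region.width_px, region.height_px)
    return long_side / short_side <= MAX_ASPECT


def rule_based_polyp(region):
    """Flag a region only if it is both 'red enough' and 'round enough'."""
    return is_red(region) and is_round(region)


regions = [
    Region("protruding red polyp", (200, 90, 80), 40, 42),   # the easy case
    Region("flat pale polyp", (170, 140, 130), 90, 30),      # subtle lesion
    Region("small blood spot", (180, 40, 40), 20, 20),       # not a polyp
]

for region in regions:
    print(region.name, "->", "flagged" if rule_based_polyp(region) else "ignored")
```

With these invented numbers the rule catches the obvious polyp, misses the flat pale one, and flags the blood spot, which is the kind of brittleness that fixed hand-written rules tend to show.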
So AI technology and computer vision can now easily distinguish these types of images, but in the old days of hand-programmed feature recognition this really wasn't possible. In our own endoscopy suites a lot of the polyps that we really care about now are also ones that really would not do well for us to try to hand program a computer to recognize them because the subtlety is just incredible. As gastroenterologists you can with a little bit of effort probably identify most of the polyps in these images, but we can all understand why this is a challenge for algorithms. Just to give a sense of how far the field has developed, this is an example of a software that's freely available it's called YOLO, you only look once, that is able in real time on a laptop to take in any visual stream, this is a James Bond movie, and recognize the objects in it with no delay. One of the reasons that I show this is that I want to make the point again that this software is freely available and it runs off of a laptop. As we consider some of the cost considerations of AI as it develops over the years, we have to go back to the point of remembering that this software has advanced so far that a lot of the core components are free. And so most of the cost of AI that we may encounter in our clinical practice has mostly to do with the effort of collecting data, labeling data, and ultimately getting that into the system. It's not because the software or hardware is so futuristic that it should be wildly expensive. So you're going to hear over the course of today a few concepts that I just want to briefly introduce. And these are all key concepts of computer vision. You'll hear the term classification, which is essentially saying is there a polyp in the image or not. Localization provides a box around where the polyp is. Object detection generally refers to if there are multiple different objects to detect and where they are. 
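The difference between these computer vision outputs can be shown on a toy example. The sketch below assumes a tiny 5 x 6 "image" in which pixels marked 1 belong to a polyp, and it derives a classification answer, a localization box, and a pixel-level outline (the kind of output segmentation produces) from that same grid. The grid and function names are illustrative only; a real system predicts these outputs from the image rather than reading them off a known mask.

```python
# 1 marks a pixel that belongs to the polyp, 0 marks background (invented data).
MASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def classify(mask):
    """Classification: is there a polyp anywhere in the image, yes or no?"""
    return any(any(row) for row in mask)


def localize(mask):
    """Localization: a box (top, left, bottom, right) around the polyp pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


def segment(mask):
    """Segmentation: the exact (row, column) pixels that belong to the polyp."""
    return [(r, c) for r, row in enumerate(mask) for c, value in enumerate(row) if value]


print("polyp present:", classify(MASK))          # True
print("bounding box:", localize(MASK))           # (1, 1, 3, 3)
print("polyp pixels:", len(segment(MASK)))       # 6
```

Object detection, in this framing, would return a list of such boxes, one per object, each with its own label.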
And then segmentation refers to identifying the actual pixels where those objects are. And you can ask for a polyp, would that ever be useful? But I think we could acknowledge that for Barrett's dysplasia or IBD dysplasia, understanding the outline of the abnormality might actually be useful for us. And there will be further discussion of these terms later in the day. I also want to make the point that one of the key challenges in our world of endoscopy is not a shortage of ideas or energy or people, but it's the challenge of these data-hungry algorithms. Training systems is incredibly labor-intensive and requires huge amounts of data. A 15-minute endoscopic procedure, which is about 30 high-definition frames per second, has about 27,000 images. So labeling these images, which is still generally required for most AI algorithms, if you want to detect gastric polyps or Barrett's dysplasia, you name it, it ultimately comes down to a heck of a lot of images and physicians or somebody who knows what they're looking at hand-labeling or hand-categorizing data. So this is really one of the big sticking points in our field right now. And this is what some of these types of labelers can look like on the back end. There are a lot of different inputs, but what this really means on the other side of the screen is somebody sitting on a late Wednesday afternoon or a Saturday evening in their room, maybe with the movie playing in the background and just drawing outlines of the pathology that they see frame by frame by frame. There have been some attempts to make this a little bit quicker. This is an example of a labeling algorithm that allows the gastroenterologist just to label only 2% of frames and then it sort of interpolates all the other labels in between. So there are a lot of steps to potentially speed up the labeling process. But nonetheless, this is still a very labor-intensive thing. 
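The frame count and the idea of labeling only a fraction of frames can be worked through in a short sketch. The arithmetic uses the figures from the talk (15 minutes at 30 frames per second). The interpolation part is an assumption: it uses plain linear interpolation of a bounding box between two hand-labeled keyframes, which is one simple way such tools could fill in the gaps and is not necessarily how any particular labeling product works.

```python
FPS = 30
MINUTES = 15
total_frames = MINUTES * 60 * FPS
print("frames in one procedure:", total_frames)                 # 27000
print("keyframes if 1 in 50 is labeled:", total_frames // 50)   # 540


def interpolate_boxes(keyframes):
    """Fill in boxes between hand-labeled keyframes by linear interpolation.

    keyframes maps a frame index to a box (x, y, width, height) in pixels.
    Returns a dict with a box for every frame between the first and last keyframe.
    """
    labeled = sorted(keyframes)
    boxes = {}
    for start, end in zip(labeled, labeled[1:]):
        box_a, box_b = keyframes[start], keyframes[end]
        for frame in range(start, end + 1):
            t = (frame - start) / (end - start)  # 0.0 at start, 1.0 at end
            boxes[frame] = tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
    return boxes


# Two hand-drawn boxes, 50 frames apart (hypothetical coordinates).
hand_labeled = {0: (100, 100, 40, 40), 50: (150, 120, 40, 40)}
all_boxes = interpolate_boxes(hand_labeled)
print("frames with a label:", len(all_boxes))   # 51
print("box at frame 25:", all_boxes[25])        # (125.0, 110.0, 40.0, 40.0)
```

Linear interpolation assumes the lesion moves smoothly between keyframes, so in practice a reviewer would still need to check and correct the in-between frames.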
This particular algorithm, which is made by a UK startup called Chord, what we're seeing right here with just a few minutes of effort will actually lead to over a thousand still images labeled because one out of every 50 or so images are labeled. And now the entire video has been ultimately labeled accurately with a little less effort. So this sort of concept of making labeling easier is not a particularly sexy aspect of AI, but it's actually I think a really, really important aspect of accelerating development in our field. I want to switch gears for a moment just to introduce a couple of other concepts that we'll explore today. The first is natural language processing. These are algorithms which can transform the natural language in a chart, unstructured data, a dictated clinic note or colonoscopy and path reports from different areas so that it can be used for computation or analysis. A good example would be automating ADR. What we'd really like to see with natural language processing is that it should enable faster and more powerful analysis of our own practice data. So I would love to see a point, and I think we will see the point, where we can walk into our office and can say, Alexa or Google, show me the ADR trend for my first time screening colonoscopies during the last 12 months. That is not an incredibly complex data query. Your assistant, your medical student, can do it by hand. It will take a long time, but it's not a complex task. That's really something that computers should be able to own for us and be able to do immediately. Similarly, we should be able to ask our administrative assistant to generate a list of patients in our practice who are overdue for their follow-up colonoscopy. That should be a single-click action. Right now, we all know it's far from a single-click. Or what is my ERCP cannulation rate for native papillas? Again, ultimately, these should not be complex queries for us, and this should be data that should be at our fingertips. 
There have been some attempts to make this happen over the course of the last few years. This is one paper that was published in GIE by Dr. Raju, looking at extracting data from a database to try to identify whether or not a colonoscopy was a screening colonoscopy, and to identify whether something was an adenoma or not. Ultimately, it got a fairly high accuracy, just having a computer run through the data without doing that by hand. A bigger step has occurred, also in GIE, just in 2020, which was the use of what's called optical character recognition with natural language processing. The big advantage here is that optical character recognition allows text recognition, even from scanned or PDF reports. Instead of getting structured data in a back-end Excel sheet or a SQL database, this can scan text and incorporate that. This was a really nice study because it was on a messy set of data. It was from a health system that used Epic, Provation, PowerPath, and I think one other system. Some of those reports were PDF, some weren't structured data, and it was the messy reality that we all typically deal with. The algorithm was able to figure out polyp detection rate, ADR, inadequate bowel prep, and failed cecal intubation rates with an accuracy that was very similar to a manual physician review of the data. You can imagine how powerful this would be for all of our practices if this occurred automatically. I think one of the most powerful statements in this paper was that the natural language processing algorithm could take under 30 minutes to extract all of the data on all colonoscopy procedures ever done at their institution since the introduction of EHRs, compared to manual data collection by both authors, which typically took about six to eight minutes per patient. This is 160 man-hours for annotating data from fewer than 600 patients.
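The kind of extraction described here can be illustrated with a deliberately simple sketch. It assumes the report text is already available as plain strings (so no optical character recognition step), and it uses keyword matching with a basic negation check to decide whether a procedure was a screening exam and whether an adenoma was found. The report snippets, field names, and matching rules are invented for illustration and are not the method used in the papers mentioned.

```python
import re

# Hypothetical free-text snippets standing in for procedure and pathology reports.
REPORTS = [
    {"indication": "Screening colonoscopy, average risk.",
     "pathology": "Tubular adenoma, 4 mm, ascending colon."},
    {"indication": "Screening colonoscopy.",
     "pathology": "Hyperplastic polyp. No adenoma identified."},
    {"indication": "Surveillance, prior polyps.",
     "pathology": "Tubulovillous adenoma."},
    {"indication": "Screening colonoscopy.",
     "pathology": "No polyps removed."},
    {"indication": "Screening for colorectal cancer.",
     "pathology": "Sessile serrated lesion and tubular adenoma."},
]

NEGATION = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)
ADENOMA = re.compile(r"\badenoma", re.IGNORECASE)
SCREENING = re.compile(r"\bscreening\b", re.IGNORECASE)


def is_screening(report):
    return bool(SCREENING.search(report["indication"]))


def has_adenoma(report):
    """True if any sentence mentions an adenoma without a negation word."""
    for sentence in report["pathology"].split("."):
        if ADENOMA.search(sentence) and not NEGATION.search(sentence):
            return True
    return False


def adenoma_detection_rate(reports):
    """Share of screening colonoscopies in which at least one adenoma was found."""
    screening = [r for r in reports if is_screening(r)]
    if not screening:
        return None
    return sum(has_adenoma(r) for r in screening) / len(screening)


print("ADR on the toy reports:", adenoma_detection_rate(REPORTS))  # 0.5
```

Real clinical text is far messier than these five snippets (abbreviations, misspellings, multiple specimens per report), which is why production systems need validation against manual chart review.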
The power of natural language processing, I think, is actually going to be really relevant to our futures in gastroenterology. I'll finish off with a couple of quick concepts of what can go wrong. One really important concept is that our intelligence is robust, but AI is quite fragile. As an example, if I'm a radiologist and I normally read CAT scans here at Beth Israel in Boston, and I walk across the street to Children's Hospital, in general, I'll be able to read CAT scans with equal efficacy. An AI algorithm that was trained on CAT scans at Beth Israel, with the particular scanner and the particular software here, might be brought over to Children's Hospital across the street, and it might not work at all. Literally, it may not work at all, because the flexibility of AI is not nearly there compared to humans. Slight perturbations in what the data looks like, slight perturbations in the camera, the processor, or how the data is handled, can render AI totally useless. It's something we have to pay attention to in gastroenterology as well, because it may be that certain algorithms work well on a Fuji processor, but not on an Olympus processor. We have to pay attention to these types of things. AI cannot tell when it's wrong. This is an example of a cat image. You and I can all tell it's a cat image, but this cat image has been manipulated slightly by researchers at MIT to fool an AI system into thinking that it's guacamole. The AI system was 99.8% sure that this is guacamole. Certainty, as demonstrated by an AI system, is not the same as human certainty. There are lots of issues here that we're going to have to continue to be aware of. One other final subtlety I want to share with you is that when we train AI systems, there have been a bunch of competitions out there for training chest x-ray systems or polyp detection systems.
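How a slight manipulation can produce a confident wrong answer can be sketched with a toy model. The example below assumes a very simple linear classifier over 100 "pixels" with made-up weights, and it nudges every pixel by a small amount in the direction that most increases the "guacamole" score, which is the basic idea behind gradient-sign style attacks. It is not the method or the model used in the cat and guacamole demonstration mentioned in the talk.

```python
import math

N_PIXELS = 100
# Hypothetical trained weights: positive weights push the score toward "guacamole".
WEIGHTS = [1.0 if i % 2 == 0 else -1.0 for i in range(N_PIXELS)]
BIAS = -3.0


def guacamole_probability(pixels):
    """Model's stated confidence that the image is guacamole (logistic output)."""
    score = sum(w * p for w, p in zip(WEIGHTS, pixels)) + BIAS
    return 1.0 / (1.0 + math.exp(-score))


def perturb(pixels, epsilon):
    """Shift each pixel by at most epsilon, in the direction of its weight's sign."""
    return [p + epsilon * (1.0 if w > 0 else -1.0) for w, p in zip(WEIGHTS, pixels)]


cat_image = [0.5] * N_PIXELS               # pixel values on a 0-1 scale
nudged_image = perturb(cat_image, 0.1)     # each pixel changes by only 0.1

print("original image, P(guacamole):", round(guacamole_probability(cat_image), 4))
print("nudged image,   P(guacamole):", round(guacamole_probability(nudged_image), 4))
```

Because the small per-pixel changes all push the score in the same direction, they add up across the image, and the model's reported confidence swings from low to very high even though no single pixel changed much.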
One of the challenges is that the winners of those competitions, the systems that best captured the chest x-ray nodules in 100% of the chest x-rays in that data set, often are not the same systems that are going to work the best in real life. There's this concept of overfitting, where a model is so tightly trained and works so perfectly on that particular data set that in real life, it's actually not that generalizable. This concept of overfitting, really, really good performance on one data set, but not great performance in real life, is something that we'll also have to pay attention to. Models that generalize better, making some mistakes, often perform better in the real world. I'll just finish here and say that my best advice for folks who are interested in this area is that we have to develop collaborations with experts in the AI community. That means software engineers and developers and AI experts. I think gastroenterologists are all going to become partial experts in AI, but that's not going to replace collaboration. There's a gigantic growth opportunity. Right now, the type of course that we have today represents this tiny little bit of overlap between the GI community and the AI research community. Our goal, I think, as a society, ASGE, and as gastroenterologists and as clinicians, needs to be to grow that overlap over time. I think this course represents a really great opportunity for that. I will end there and hand back to Pratik for the next lecture.
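Overfitting can be demonstrated on a tiny invented data set. The sketch assumes a single measurement per case (say, a lesion size) with two mislabeled training cases, and compares a model that memorizes every training example (nearest neighbor) with a simpler single-threshold rule. The numbers and both models are illustrative assumptions, chosen so the effect is easy to follow by hand.

```python
# (measurement, label) pairs; the cases at 1 and 8 carry deliberately "noisy" labels.
TRAIN = [(1, 1), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
# New cases, labeled by the underlying pattern (large measurement means label 1).
TEST = [(1.2, 0), (2.6, 0), (3.8, 0), (6.4, 1), (7.8, 1), (8.4, 1)]


def memorizer(x):
    """Overfit model: copy the label of the closest training example."""
    return min(TRAIN, key=lambda pair: abs(pair[0] - x))[1]


def fit_threshold(train):
    """Simple model: pick the single cut-off that makes the fewest training errors."""
    best_cutoff, best_errors = None, None
    for cutoff, _ in sorted(train):
        errors = sum((x >= cutoff) != bool(y) for x, y in train)
        if best_errors is None or errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


CUTOFF = fit_threshold(TRAIN)


def threshold_model(x):
    return 1 if x >= CUTOFF else 0


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


print("memorizer: train", accuracy(memorizer, TRAIN), "test", accuracy(memorizer, TEST))
print("threshold: train", accuracy(threshold_model, TRAIN), "test", accuracy(threshold_model, TEST))
```

On this toy data the memorizer scores perfectly on the training set but only half right on new cases, while the threshold rule accepts a couple of training mistakes and gets every new case right.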
Video Summary
In this video, Tyler Berzin introduces the basic concepts of artificial intelligence (AI) in endoscopy. He starts by defining AI as computer systems and algorithms that can perform tasks requiring human intelligence, such as visual perception or speech recognition. He emphasizes that AI in the medical field is mostly task-specific, and the concept of generalized AI, mimicking human intelligence fully, is still far away. Berzin explains the overlapping terminology in AI, including subsets like machine learning and deep learning. He discusses computer vision as an application of AI that allows computers to interpret the visual environment. He also introduces natural language processing, which transforms unstructured medical data into structured data for analysis. Berzin highlights the challenges of training AI algorithms, including the need for vast amounts of labeled data. He explains the concept of overfitting, where models perform well on a specific dataset but may not generalize to real-life situations. He concludes by emphasizing the need for collaborations between gastroenterologists and AI experts to advance the field.
Asset Subtitle
Tyler M. Berzin, MD, FASGE
Keywords
artificial intelligence
endoscopy
computer vision
natural language processing
machine learning
collaborations