Gastroenterology and Artificial Intelligence: 2nd ...
Computer Vision: A Primer for Gastroenterologists
Video Transcription
I think we're about ready to begin our second session. I see that Dr. Rex and Dr. Repici are on the line. So we're beginning session two, the current state of computer vision in GI endoscopy. This will include some video sessions as well. It's my pleasure to introduce our session moderators, Professor Alessandro Repici, Director of the Digestive Endoscopy Unit at Humanitas Research Hospital in Milan, Italy, and Professor Doug Rex, a distinguished professor of medicine and director of endoscopy at Indiana University. I'll turn this over now to Professor Rex, if he's on the line. I am, Mike, and thanks to you and Prateek for putting this great course together and for inviting me to participate. And I want to say hi to Alessandro, who I see there. He's probably at home in Milano getting ready for his evening, so good to see you, Al. Our first speaker is Tyler Berzin. He's co-director of GI endoscopy and director of the Advanced Endoscopy Fellowship at Beth Israel Deaconess Medical Center and assistant professor of medicine at Harvard Medical School. Dr. Berzin has authored over 100 articles and chapters on topics including EUS and ERCP, quality and safety in GI, endoscopic anesthesia, and applications of AI in endoscopy. So welcome, Tyler. Good morning, everyone. It's a pleasure to join you today. I'd like to thank again Dr. Sharma and Dr. Wallace for organizing this meeting, as well as all of our collaborators and staff across the ASGE who have put so much work into making this happen. We'd also like to acknowledge our industry sponsors, who support many of the innovations that you'll see presented today. And I'd also like to take just a moment to acknowledge the GI fellows and trainees who are participating in this conference. This conference really is a glimpse of the future that we are hoping to prepare you for. And many of the ideas and concepts are areas which I think are huge opportunities for continued work and research in the future. 
So we're really excited that so many of you are interested in this area with us today. These are my disclosures. So we've already discussed quite a lot of computer vision and AI terminology today, and my task is to home in really on computer vision. So computer vision is technology that allows computers to see and interpret visual content, photos and videos. And as you can see from this Venn diagram, a lot of computer vision can be accomplished through various AI techniques, whether machine learning or deep learning. But in fact, computer vision can occur outside of artificial intelligence as well, and you can develop software that can recognize things without the need for AI at all. So these are not fully overlapping concepts. We only need to go back a short way in the history of gastroenterology. 2003 is not so long ago, but it was really the early days of using computer vision for endoscopy. And work was done even then to design computer vision programs that could classify or recognize polyps. But this was done using hand-programmed feature recognition. A programmer would basically design a program and describe in that program what a polyp might look like. Because a computer couldn't learn on its own or from data, you had to essentially hard-program your concept of what a polyp should look like. And there were a lot of limitations to this. A big limitation is that the algorithm and the processor on which it ran were not nearly fast enough for live endoscopy or clinical care. But nonetheless, this is the sort of early work upon which the work that we're doing now is based, so it's really important to acknowledge this early history. Now of course, huge leaps have been made over the course of the last few years in computer vision owing to big advances in artificial intelligence and deep learning. 
And our recent published literature has really been rich in innovative work, from colonoscopy polyp detection to capsule endoscopy to work in upper GI, including Barrett's esophagus and upper GI cancers. The reason that this moment is really so exciting is that we're just right at this inflection point where very specific, narrow-task AI has been able to meet and exceed human performance, but again for very specific tasks. So polyp recognition is a good example of a specific task; so is reading chest X-rays; so is classifying skin cancer. We're not anywhere near the concept of a generalized AI that is often discussed in the popular press. What we're talking about here is narrow, task-specific AI. It's a really, really important moment in history for this type of work. So as a physician, I have a fairly simple view of AI, and the way that I think about classifying AI is based on really two questions. The first question is: what does the AI accomplish? That might be computer vision, that might be speech recognition. And the second question is: how does the AI accomplish it? It might be through machine learning or a specific deep learning algorithm. And that's really how I think about classifying things. Now another set of important terminology to understand is the concepts of classification, localization, and object detection. In other words, what can computer vision do for you? So on the left-hand image is the concept of classification, where computer vision can simply recognize that this is a picture of a cat. When you combine classification and localization, the computer can now tell you where that cat is, where in the image the information is that allowed the computer to say this is a cat. Once you introduce multiple objects, then the terminology switches over to this concept of object detection, where the computer is asked to classify and detect all the objects in the image and draw a bounding box around each. And then this last principle, all the way over to the right, is segmentation. 
And here the computer is tasked with something that is harder, which is to classify every single pixel in the image. And this effectively results in an outline of exactly where the cat, the duck, and the dog are. If we think about how these apply to our own field: classification, on the left, is just showing that there's a polyp in the image, and there is some software that doesn't draw a bounding box at all but instead shows a notifier on, say, a corner of the screen when there's a polyp somewhere in view. With classification and localization, we see where the polyp is. And then object detection is more complex, when there are multiple things to find and identify. Moving over to segmentation, you could ask why it would be helpful at all to outline a polyp so precisely, and maybe the answer is that it's not. But certainly for some applications, an example would be detection of Barrett's dysplasia or perhaps detection of a field of IBD dysplasia, having segmentation, and effectively an outline of the affected area, may be very useful to us. So I want to acknowledge that what we're asking computers to do here is actually really complex, even though it's something that humans do without a lot of effort. This is an example of a few images that can really challenge computer vision software. As human beings we can see at a glance that, sure, the dogs and these food products look fairly similar, but it takes very little effort for us to get the classification right. Yet these are images that can really challenge computer vision software. The other concept I want to introduce is that image classification by a computer can be fallible, and defining certainty is complex. Sometimes computers will provide an output at 100% certainty. In this case, this is an intentionally manipulated digital image of a cat. As human beings we can't detect the digital manipulations, but somebody has worked to make this cat confusing for a computer. 
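To make the task taxonomy just described concrete, here is a minimal sketch of what each task's output might look like as plain data structures. All labels, scores, coordinates, and image sizes here are invented for illustration; real systems differ in their exact formats.

```python
import numpy as np

# Classification: one label for the whole image.
classification = {"label": "polyp", "score": 0.94}

# Classification + localization: the label plus a single bounding box,
# given here as (x1, y1, x2, y2) pixel coordinates.
localization = {"label": "polyp", "score": 0.94, "box": (120, 80, 260, 210)}

# Object detection: one (label, score, box) entry per object in the frame.
detections = [
    {"label": "polyp", "score": 0.91, "box": (120, 80, 260, 210)},
    {"label": "forceps", "score": 0.88, "box": (400, 300, 520, 460)},
]

# Segmentation: every pixel gets a class index (0 = background, 1 = polyp),
# effectively producing an outline of the affected region.
mask = np.zeros((480, 640), dtype=np.uint8)  # hypothetical 640x480 frame
mask[80:210, 120:260] = 1                    # region classified as polyp
polyp_pixels = int(mask.sum())               # pixels labeled as polyp
```

The progression from a single dictionary to a per-pixel mask mirrors the increasing difficulty of the tasks described in the talk.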
And their computer vision program has reported with 100% certainty that this is actually a picture of guacamole. And the reason I bring this up is just, again, to put this little asterisk in your mind about the concept of certainty. You'll see later in this talk, and in other talks today, polyp detection, or rather polyp diagnosis, readouts that list certainty levels for the type of polyp. And certainty for a computer and certainty for a human being are really different things. And I think there are some interesting questions to ask about whether, as physicians, we want algorithms to report their certainty on our clinical interface. The reason that the last few years have seen such dramatic advances in computer vision is that the underlying software has developed rapidly and in many cases has become freely available. So this is an example of computer vision software called You Only Look Once (YOLO), which is freely available, in this case applied to a James Bond movie, but you can use similar algorithms essentially off the shelf to build polyp detection software. So the software and technology in many cases become widely available. If you use a self-driving feature on a car, computer vision is in your everyday life keeping you safe; this is the back end of what that self-driving computer vision looks like. And of course now we're introducing computer vision to endoscopy. You've all seen versions of this before. This is software that provides simultaneous detection and diagnosis. And again, I chose this image because it provides a percentage certainty along with that diagnosis. And I think this brings up really interesting questions about what that certainty represents and how it might influence us if we see 70% certainty versus 80% versus 100%. Computer vision is also being used for quality assessment. This is the back end of a colonoscopy software that is detecting image clarity and prep quality and distention. 
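One way to understand how a network can report near-100% "certainty" even when it is wrong: the reported probability is typically just a softmax over the network's raw scores, so a modest gap in raw scores already produces an extreme probability. A minimal sketch, with invented numbers:

```python
import math

def softmax(logits):
    """Convert raw network scores into probabilities that sum to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A raw-score gap of about 10 between the top class and the runner-up
# already yields a reported "certainty" above 99.99%. The model has no
# notion of doubt beyond this arithmetic, which is part of why an
# adversarial image can be misclassified with seemingly absolute confidence.
probs = softmax([12.0, 1.0, 0.5])  # e.g. guacamole, cat, dog (hypothetical)
```

This is why a computer's "certainty" and a clinician's certainty are not the same kind of quantity.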
And then this is the front end of the user interface that is presented to the physician, where we see ongoing detection and reporting of clarity, preparation, and distention. This is obviously a very rudimentary user interface, and one area of rapid innovation, I think, is going to be what that UI, that front-facing interaction with the clinician, looks like when we use these types of programs. The role of computer vision for quality assessment I think is going to be a really interesting area. This is another example of an algorithm that is trained to monitor blind spots during endoscopy and also to label the appropriate images in the report. The concept of computer vision making our lives easier or more efficient is really appealing and something I think we need to pay a lot of attention to as physicians. So I'd like to spend the last third of our time here discussing what I think are really the three keys to the future for computer vision in GI endoscopy. The first is identifying priority use cases, the second is the data science, and the third is how to grow collaborations among the key stakeholders in this area. So first, in terms of identifying priority use cases, we've already seen that the field has essentially self-defined priority use cases, which have included a focus on polyp detection and polyp diagnosis. The ASGE AI Task Force recently published a position statement which identified a number of other priority use cases, including identifying early gastric cancer and precursor lesions and detecting dysplasia in IBD, and additionally included specifics regarding the key data elements and the primary outputs that the physician would need to see. And the approach that we used was really patterned after some preceding work by the American College of Radiology, which has done really important work to lay the groundwork for thinking about use case prioritization in medicine. 
And the approach here really is to consider what types of questions have the highest clinical value for patients, or maybe for the physician team or the healthcare team, and then which of those things are solvable by artificial intelligence. Balancing those two key questions is really, I think, the best approach to creating use cases that are of value. The second key to the future for computer vision in GI endoscopy is really the data science. A 15-minute endoscopic procedure recorded at 30 high-definition frames per second yields about 27,000 still images if you take all the video together. Figuring out how to curate, label, store, and share that level of data is really a significant challenge, I think, for our field. The labeling in particular is one of the least exciting, most laborious aspects of computer vision, and in a lot of ways the most important. This is a particular labeling software where a physician can sit in front of a computer and label the outline of an organ, in this case the liver, so that a computer can begin to learn how to recognize a liver on a CT scan. And one of the things that is uniquely challenging about medical computer vision is that labeling often requires a certain amount of medical training, maybe physician level, maybe subspecialty level. So asking a physician to do this really laborious image labeling is, I think, one of the barriers in our field. There are quite a lot of companies that are thriving in this area. This is an example of a technology company based in China that employs hundreds of individuals who spend the entire day just labeling images. These are in fact not medical images but just general images, and it shows you the type of manpower and workforce that can be required to curate and label the huge wealth of visual data that is required to build some of these computer vision platforms. 
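The frame-count arithmetic mentioned above is easy to verify directly:

```python
fps = 30            # high-definition video frame rate
seconds = 15 * 60   # a 15-minute procedure
frames = fps * seconds
print(frames)       # 27000 still images per procedure
```

At that rate, even a modest endoscopy unit generates millions of frames per week, which is why curation and labeling strategy matters so much.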
This is sort of a neat example of a new labeling software that has been created by a startup based in the UK that I saw fairly recently. And this software is intended to, in this case, help physicians label an endoscopic video. The innovative aspect of this example is that the physician only has to label one out of every 25 or one out of every 50 frames in the video, and then the software can actually do the rest of the work and essentially interpolate the remaining video frames. And after a couple of seconds of processing, after the physician has labeled maybe five or eight or ten video frames, the software has captured everything else in between, and you effectively have hundreds or thousands of video frames labeled with relatively little physician work. And this type of labeling approach, I think, is pretty promising to allow us to make further leaps in our field. This is my simpler proposal for solving the image labeling challenge in gastroenterology: I think if every time we logged into Epic or Provation we had to click on where we saw the colon polyps, we could make short work of this clinical challenge. So finally, the third key to the future for computer vision in GI endoscopy is really doing, I think, exactly what we're doing now as part of this meeting, with international representation, with physicians, with industry partners. This type of collaboration is really what's required to move the field forward. I know that the ASGE is really, really prioritizing this area over the course of the next few years. And equally hopeful is the data we have suggesting that there is a huge number of fellows and trainees in GI who are really interested in taking some of these challenges on. 
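The keyframe-labeling approach described above, where the physician labels only one in every 25 or 50 frames and the software fills in the rest, can be approximated in its simplest form by linearly interpolating bounding boxes between labeled keyframes. Real tools likely use object tracking or learned propagation; this toy sketch, with all names and numbers invented, just shows the interpolation idea:

```python
def interpolate_boxes(box_a, box_b, n_between):
    """Fill in bounding boxes for the unlabeled frames between two
    physician-labeled keyframes by linear interpolation.

    Boxes are (x1, y1, x2, y2) tuples; returns n_between boxes."""
    steps = n_between + 1
    boxes = []
    for i in range(1, steps):
        t = i / steps  # fraction of the way from box_a to box_b
        boxes.append(tuple(a + t * (b - a) for a, b in zip(box_a, box_b)))
    return boxes

# Physician labels frame 0 and frame 4; frames 1-3 are filled in
# automatically, so one manual label covers several video frames.
filled = interpolate_boxes((100, 100, 200, 200), (140, 120, 240, 220), 3)
```

Even this naive version shows why the approach scales: one manual annotation can propagate across dozens of nearby frames.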
So I think the momentum built over the course of the last few years is exactly what's going to be required to make this type of vision part of our future and to really develop some innovative technologies that are going to benefit both patients and the health care team in getting this important work done.
Video Summary
In this video, Dr. Tyler Berzin discusses the current state of computer vision in gastrointestinal (GI) endoscopy. He highlights the advancements made in computer vision through artificial intelligence (AI) and deep learning. Dr. Berzin explains that computer vision allows computers to see and interpret visual content such as photos and videos. He discusses the different concepts related to computer vision, including classification, localization, object detection, and segmentation.

Dr. Berzin emphasizes the importance of identifying priority use cases for computer vision in GI endoscopy, such as polyp detection, detecting dysplasia and other abnormalities, and quality assessment. He also discusses the challenges in data science, particularly in curating, labeling, storing, and sharing the large amount of visual data generated during endoscopic procedures.

Dr. Berzin highlights the significance of collaboration among key stakeholders, including physicians, industry partners, and trainees, to advance the field of computer vision in GI endoscopy. He concludes by expressing optimism about the future and the potential benefits of using computer vision in healthcare.
Asset Subtitle
Tyler Berzin, MD, FASGE
Keywords
computer vision
gastrointestinal endoscopy
artificial intelligence
deep learning
priority use cases