Learning from Other Disciplines: How Far Behind is GI?
Video Transcription
Let's move on to Dr. Charles Kahn. Dr. Kahn is professor and vice chair of radiology at the University of Pennsylvania, specializes in abdominal imaging, and holds degrees in mathematics and computer sciences. His professional interests include health services research, decision support, artificial intelligence, information standards, and knowledge representation. He has authored more than 140 scientific articles and given more than 130 invited lectures. He is the editor of the journal Radiology: Artificial Intelligence. Welcome, Charles.

Thank you so much. Good morning, everybody. I'd like to thank Prateek and Mike and the society for welcoming me here today. You'll notice the title of my talk, learning from other specialties, and the part about how far behind is GI. Part of that is some small sense of humility; the other part is a strong sense of self-preservation, being about the only radiologist in the room. I wasn't going to try that. I have no commercial disclosures. As mentioned, I serve as a journal editor and receive some compensation.

It's really been fascinating for me to sit and listen to the discussions here this morning. We really are all addressing many of the same problems and the same challenges, and I'll share with you some of the things that I've seen and have dealt with. As you know, a considerable number of products have been developed and received approval from the FDA, and the great majority of those, in fact, have been in radiology. And yet we really are just beginning to roll out many of them. It's been challenging. I will tell you, even at a place that is heavily research oriented, with three different labs within our department doing AI research, we're just rolling out some AI products into our clinical operations. We have a number of things that are homegrown; I can maybe tell you a little bit about some of those because of the impact that they have. But it points up some of the challenges that we all face.

In the early days of this, it was really the case that companies would come to radiology departments for a partnership, but in fact they were looking for a source of data, because they have the computer scientists, they have the AI models, and they really just need lots and lots of data on which to train these things. This first slide is actually an image from The New York Times, and I suspect many of you may know it. The caption there says, AI is learning from humans, many humans. In fact, this was optical colonoscopy; these are individuals who are training the system. One of the things that's really important for all of us is to recognize who established the ground truth and who trained these AI systems. What were their qualifications? When the training was done, did they look at inter- and intra-rater variability in designing these systems? Remember that FDA clearance of a product, and the fact that you are then able to buy that product and put it into your workplace, is really just the first step in terms of what you want to be thinking about with some of these things.

I actually got invited to participate in a meeting that was held by the National Eye Institute, so, hanging out with ophthalmologists, and one of the issues that they have kind of goes back to the way radiology looked a couple of decades ago. And this is how we were.
We had CT scanners, you might have a couple of different ones, and you printed the images onto film, and you could put the sheets of film in the same size jacket, and that's what we called interoperability. But it really wasn't, at that point. One of the things that has been key for radiology, and for the advancement of many of the digital technologies in radiology, has been the application of information standards. You all, I know, are familiar with DICOM. It's part of what we do, but it's the standard for capturing medical imaging information, and it is in hundreds of thousands of devices worldwide. There are literally billions and billions of images stored in DICOM format, and it is really an important part of what has enabled radiology to build interoperability and to have a foundation on which to innovate. Ophthalmology is really challenged in this, because the information that they get from a variety of sources, whether retinal fundoscopy images or optical coherence tomography images, often comes as just JPEGs. They really don't have standards that wrap around a lot of the information that they get. So I wanted to impress upon you the importance of information standards in what we do.

This image, to me, sums up a lot of the challenges that we have. Here was a system that was built to detect pneumonia, and they trained it at two different institutions: one primarily inpatient, where about 30% of the patients had pneumonia, and the other primarily an outpatient facility, where about 1% of the patients had pneumonia. And the system functioned exceedingly well. They used a technique called a saliency map, or heat map, to basically ask the question: what is it that allows you to tell whether the patient has pneumonia? And the computer said, it's the letter L. It maybe doesn't quite show here, but in the image on the left the letter L is in what you would think of as normal orientation on a black circle, and it's reversed in the other one. Basically it was using the radiographic marker to make the determination. We laugh at things like this, except we've seen it over and over. There was a system that was built to detect active tuberculosis; it turned out this thing was seeing the words "TB clinic" on the bottom of some of the films. Systems for detecting pneumothorax turned out not to be seeing the pneumothorax; they were seeing the chest tube that had been placed in patients with pneumothorax. We laugh about these, we despair about them, but the fact is, until you test these things rigorously, until you ask the right questions of these systems, you don't see these problems. And they can work to your detriment.

One of the things that we've learned, and we've seen the hard way, is the problem of external validation. It is almost a guarantee that a system that is built in one setting will function less well in yours. There was a really nice paper that came out of a group at Hopkins, John Eng and his team, looking at external validation of deep learning algorithms for diagnosis. They looked at quite a wide variety of studies, but basically found that, as you see here, the vast majority demonstrated decreased performance, and some showed substantially decreased performance. They looked at 86 algorithms; the vertical line there is the nominal, reported performance of each algorithm. A couple of things did a little bit better on the external data set.
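To make that external-validation comparison concrete, here is a minimal sketch: score a trained model on local data it never saw and compare the result against the nominal, reported figure. This is only an illustration; the model object, the scikit-learn-style inference call, and the numbers are hypothetical placeholders, not any specific product's API.

```python
# Minimal sketch: compare an algorithm's nominal, reported performance against its
# performance on a locally collected test set it was never trained on.
# All names here are hypothetical placeholders, not any specific product's API.
import numpy as np
from sklearn.metrics import roc_auc_score

REPORTED_AUC = 0.95  # the figure from the paper or vendor brochure (illustrative)

def external_validation_auc(model, local_images: np.ndarray, local_labels: np.ndarray) -> float:
    """Score a trained binary classifier on local data it has never seen."""
    scores = model.predict_proba(local_images)[:, 1]   # assumed scikit-learn-style interface
    return roc_auc_score(local_labels, scores)

# Example usage (with your own model and labeled local cases):
# local_auc = external_validation_auc(model, local_images, local_labels)
# print(f"Reported AUC {REPORTED_AUC:.2f} vs. local AUC {local_auc:.2f}")
# A meaningful drop is the cue to look at population, scanner, and protocol differences.
```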
By external validation, I mean testing on data that is different from what you trained on. But a lot of the systems really tanked. And it's tremendously important, in what you do, that you test the system on your own population. I can only tell you, as a simple anecdote: I was at a meeting kind of like this one, and one of the company reps said, you know, come take a look at our system that detects chest lesions, nodules on CT. They showed me the images at lung window, and the nodule just looked a bit dense. I said, can you do that at soft tissue window? It was a calcified granuloma. I'm from Chicago originally; if you've lived in the Mississippi or Ohio River Valley, you know about granulomatous disease and histoplasmosis, which is not so much seen on the East Coast. A lot of these things really depend on the population on which you test, and you have to keep testing. If I train a system that works on my CT scanner, and then I get a new scanner, or I update the reconstruction kernel, or I change some of the software, it's quite likely going to have problems.

The other thing is that we have to be really careful about how we measure the performance of an AI system, the metric by which we judge who's the winner. A very common task, and as was mentioned about radiomics, is segmentation, which is actually really important; the quality of the segmentation will affect the performance of radiomics tools. So here, let's say we're looking at a CT of the liver, and you have a patient who has three metastases in there, and that's the reference image on the left. You have one system, prediction number one there, that does a really great job of finding the big lesion but misses the two satellites, and it gets a score, this is called the Dice similarity coefficient, which is the most commonly used metric for segmentation challenges and activities, of something like 0.92. The second one, as you see, also does a pretty good job of finding the big lesion, and it picks up those two satellites, but it gets a much lower Dice score. So what's the winner? Prediction one. But the thing is, clinically, you really want to know about those small metastatic deposits. So making sure that the mechanism by which you're measuring the quality of the tool you're using is appropriate is really critical.

We talk about concerns of bias, and these things, like the letter L you saw, are so-called shortcut learning, where the system learns associations. AI tools are phenomenal at figuring out associations; they will get to the answer, and they'll do very well at it. This was some work done by Judy Gichoya at Emory. They had patients self-identify as Black, White, or Asian, and they had a chest radiograph; this is their arXiv preprint, and the full paper is out now in The Lancet Digital Health. Basically, the model identifies the patient's self-identified race from a chest radiograph. We like to think in radiology that we don't see color, right? We just see shades of gray. The fact is, these models are seeing race and ethnicity in these images. And what's disturbing about this is that we know there are algorithms with this problem: there's the paper by Ziad Obermeyer that identified racial bias in a widely used system that would assign healthcare resources, basically deciding how much healthcare spending a patient should get based on what it knew about them. It was shown to be racially biased, and part of that was historical prejudice.
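The Dice calculation itself is simple. Here is a minimal NumPy sketch with toy masks invented to echo that liver scenario (one large lesion, two small satellites); the masks and the resulting scores are illustrative only, not the numbers from the slide.

```python
# Minimal sketch of the Dice similarity coefficient for binary segmentation masks.
# The toy masks are invented to echo the liver example: one big lesion, two satellites.
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

ref = np.zeros((64, 64), dtype=bool)
ref[20:40, 20:40] = True          # large metastasis
ref[5:8, 5:8] = True              # small satellite
ref[55:58, 55:58] = True          # small satellite

pred1 = np.zeros_like(ref)
pred1[20:40, 20:40] = True        # segments the big lesion perfectly, misses both satellites

pred2 = np.zeros_like(ref)
pred2[24:40, 22:40] = True        # under-segments the big lesion a bit...
pred2[5:8, 5:8] = True            # ...but catches the first satellite
pred2[55:58, 55:58] = True        # ...and the second one

print(f"Prediction 1 Dice: {dice(pred1, ref):.3f}")  # ~0.98: wins on the metric
print(f"Prediction 2 Dice: {dice(pred2, ref):.3f}")  # ~0.85: loses, despite finding every lesion
```

The toy example reproduces the trap described above: the metric rewards volume overlap, so the prediction that misses the clinically important satellites can still score higher.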
The danger, then, is that a system, unless you look for it, can be capturing underlying information from those images that you're not even aware of. It could be communicating that, and other systems downstream could be using that information as well.

This is a checklist called CLAIM, the Checklist for Artificial Intelligence in Medical Imaging, that John Mongan, who's at UCSF, Linda Moy, who's now the incoming editor of the journal Radiology, which is our main journal, and I put together. One of the things I'm actually proud of in this guideline is that it never uses the word radiology. We talk about medical imaging, and in fact it's been adopted by some journals in ophthalmology, veterinary sciences, and some other areas that I know of, and we see a number of papers on its use. I encourage you to take a look at it; maybe there are some things that need to be changed or modified to address particular issues that relate to endoscopy. Obviously that wasn't primary in our thinking about it. But it provides a checklist that addresses the particular problems of diagnosis using medical images and some of the things you need to know in order to assess the quality of a study that's been reported.

The other one I'm just going to point out to you is a useful guideline which you probably have not seen. It was published in European Radiology, and you've got to love the name: the ECLAIR guideline, for evaluating commercial AI solutions for radiology; Patrick Omoumi is the primary author on this one. These are the key questions to think about when you're looking at an AI product. What problem is the application intended to solve, and who is it designed for? What are the potential benefits and risks, and to whom? Has the algorithm been evaluated, validated, tested? How can it be integrated into your clinical workflow? What do you need in terms of IT infrastructure to make it work? Does the application meet the data protection requirements of your institution and the environment in which you work? I think number seven is the trickiest: has a return-on-investment analysis been performed? What is maintenance like? What are user training and follow-up like? And what are the potential malfunctions or erroneous results?

I will tell you, as a journal editor, that the very first paper we published in Radiology: Artificial Intelligence was a system that detected wrist fractures. The thing that, to this day, I love most about that paper is that they actually showed a gallery of false positives and false negatives and included it in the paper. On the one hand, I find that refreshing as a sign of humility and of scientific truth, but it's also very useful, in the same way as your example earlier: if I know that one of my trainees always misses a certain kind of lesion, or is likely to, I can change my reading to focus my attention on those things. And particularly with AI systems, having a solid understanding of how they work is great. I love it when people report an area under the curve of 0.95; that's swell. But show me where the system fails, because there's always going to be some room for improvement, and as a clinician I need to know where that system could potentially go wrong, so that, since right now I'm the one whose neck is on the line, I can have input into it. So with that, it remains to thank you all very much for your attention.
It's been a delight, actually, hearing you guys are faced with many of the same challenges that we are in radiology, and I think we're all in this together. Thanks again.
Video Summary
In this video, Dr. Charles Kahn discusses the challenges and considerations of applying artificial intelligence (AI) in medical imaging, with a focus on radiology. He highlights the importance of testing and validating AI systems on specific populations, as well as understanding the potential biases and limitations of these systems. Dr. Kahn also emphasizes the need for information standards in medical imaging, using the example of radiology's implementation of DICOM. He presents cases where AI systems have displayed shortcut learning, such as relying on radiographic markers to detect pneumonia or on chest tubes to detect pneumothorax. Dr. Kahn advises clinicians to thoroughly evaluate AI solutions, considering factors like intended users, potential benefits and risks, integration into clinical workflow, and maintenance. He also urges researchers to report false positives and false negatives to enhance the understanding and improvement of AI systems. The talk ends with Dr. Kahn expressing solidarity with other specialties facing similar challenges and thanking the audience for their attention.
Asset Subtitle
Charles Kahn, Jr., MD, MS
Keywords
artificial intelligence
medical imaging
radiology
testing and validating
biases and limitations