Panel Discussion and Q&A - Session 4
Video Transcription
So, very nice lectures, Dr. Khan and Dr. Hassan. Both of you mentioned the importance of training on a dataset with ground truth. Charles, can you make an additional comment: will there be many differences between radiology and GI, or do we have to follow the same path?

I think you'll find it's pretty similar; we're all dealing with medical imaging problems. One of the things we've seen repeatedly, and it's an old adage from computer science, is garbage in, garbage out. It really makes a difference to have well-annotated and carefully curated datasets to train AI systems; it can add an order of magnitude of improvement. We know you can get by with what are called noisy labels. People have done that in radiology, for example, using the radiology report, which may not provide discrete data about every potential feature in the imaging study. But information that has been labeled carefully produces a really high-quality dataset. That example I showed you of the chest radiograph being labeled for pneumonia comes from a publicly available dataset from the NIH, the ChestX-ray14 dataset, published with some 110,000 chest radiographs. We took 30,000 of them and had a team of about 16 thoracic radiologists participate in the annotation process. For intracranial hemorrhage, we had about 80 expert neuroradiologists involved in labeling those studies. When you do that, it really helps you generate a very high-quality dataset to use for competitions and also, potentially, to evaluate products that have been put on the market.

Charles, I'm happy that you have 80 very experienced neuroradiologists, because it's very difficult in endoscopy to find 80 very experienced endoscopists. But apart from this, Alessandro, coming back to your question about endoscopy, I feel the point Doug made before was very well taken: how can we be sure that this machine will detect the subtle lesion? The answer is easy: you need endoscopists in the training phase who were able to detect these subtle lesions. So what is the ADR of the endoscopists who trained any given dataset? This is what we need to know. Secondly, Alessandro, in training for characterization there is an issue we usually don't talk about, and that is pathology. What about the sessile serrated lesion? How can we be confident that the system is right in detecting and characterizing sessile serrated lesions if we are not confident that pathologists can characterize these lesions in the first place? So in endoscopy I think we have a problem, especially with pathology; but for detection, we need to know who collected the cases and what their ADR was, because in my view this is the real issue.

Thanks, Cesare. Sorry, before we move on with the discussion, let me introduce the two other panelists so they can help us. We have with us Dr. Michael Riegler, Chief Research Scientist at SimulaMet. His research interests are medical data analysis and understanding. He is involved in several initiatives, like the MediaEval Benchmarking Initiative for Multimedia Evaluation, and he is an expert member of the Norwegian Board of Technology on the topic of artificial intelligence in healthcare. So thanks, Michael, for joining. We also have the pleasure of having Dr. Shani Haugen.
She is Assistant Director for the Gastroenterology and Endoscopy Devices team at the FDA Center for Devices and Radiological Health. She holds bachelor's and doctoral degrees in microbiology from the University of Illinois and the University of Wisconsin. She joined the FDA in 2009 and focuses on gastrointestinal endoscopes. As Assistant Director, she leads a team of FDA reviewers overseeing the safety and effectiveness of a number of different gastroenterology medical devices. So thank you as well, Dr. Haugen, for joining us. Sravanthi?

Oh, yes, I was just going to introduce the speakers for this, so thank you. Welcome, Michael and Shani; it's very nice to have you on board. So Michael, a question for you, because you've been involved in collecting data and putting out open datasets in endoscopy, Kvasir and HyperKvasir. Can you tell us about your experience with that? What were the logistics and what were the setbacks? Because you routinely put out challenges using these datasets.

Yeah, so it's challenging to do that. If you take that step, you open yourself up to exposing mistakes, and we made mistakes at the beginning. For the Kvasir dataset, for example, we have several versions because we had wrong annotations and so on, which got fixed over time. The challenges were really helpful there: if you put the dataset into a big challenge and people start working on the data, mistakes will come up and suggestions will be made on how to make the data better. That is how we grew our dataset over the years and also identified new problems and interesting things to look at. For example, I think it's really important to understand what the AI system actually cannot detect, because that is the important stuff we have to add to the data in the future; it's clearly missing from the data. So you have to make your dataset really diverse, and it has to grow over time. The dataset is not something static that we put out once; it grows. Of course, you have to make different versions so that people can say, "OK, I work on this version," to compare performance and so on. But it has to be something that improves over time, where a lot of people in the community actually work together to make it better. The challenges are a really good part of this process, because there you really put yourself out and people test it. So it's a challenging experience, but it's important to do.

Thanks, Michael. And Shani, I have a question for you from an FDA standpoint; it's always nice to have the FDA present here. From an algorithm standpoint, I know there are several clinical parameters, but does the FDA also include parameters for explainability, so that we as clinicians first understand how these algorithms work, and second, so that it helps us explain to patients why these algorithms work and why we want to use them?

Yeah, that's a great question. When it comes to artificial intelligence, we heard about how there is somewhat of a black box, so what matters most is the data that comes out. What we heard about, standalone performance testing as well as clinical testing, is what the FDA is looking at. And you raise a really great point: where we would like to go is really standardizing terminology and standardizing the type of information that is presented to physicians in the labeling.
And so that's really where our focus has been: providing information not only on the clinical studies that were done and the endpoints that were met in terms of safety and effectiveness, but also on the standalone performance testing, the data that benchmark how well the algorithm performs. We think that will allow physicians to convey how effective a device is and also allow for comparison among different devices.

I have an additional question from an FDA standpoint. Do you believe that in the future, rather than having a prospective randomized clinical trial, we can use studies based on video analysis?

I'm not sure where we'll go, but for now, for any new device, we would expect there to be clinical data. If there's a modification to a device we've already cleared, then running the algorithm through the standalone performance dataset that accompanied the clinical data is certainly something we expect to see. We don't think a brand-new clinical study would need to be conducted for modifications to an algorithm or to a device, but for any new device, we would probably expect a clinical study.

Can I say something about this? Because even in Europe we have a very nice white paper by the European Commission: everything about AI must be accountable, must be transparent, and so on. But are endoscopists accountable? Does any patient know the ADR of their endoscopist, or their ability in resect and discard? The point is that we are so variable in our endoscopic performance that refusing to use AI only because it has been validated on standalone performance is unethical. We cannot ask AI to be accountable if we are not accountable in the first place. It is the lack of accountability of the human mind that creates the need for AI, not vice versa. So first we need to give to AI what we already give to our own minds. That's my point.

Thanks, Cesare. Charles, Dr. Khan, I actually have a question for you. We all know radiology is leading the way in this space. What challenges have you had in terms of reimbursement, in getting CPT codes for AI-assisted radiology?

That's a great question. For the most part, when people are looking to purchase and use an AI system, it's really going to be something the physician buys to improve productivity: say, it helps me find lung nodules more rapidly or do measurements more efficiently. There are relatively few instances where AI itself gets paid. One of them is computer-aided detection, CAD, in mammography, where there actually is an additional payment code from Medicare for the specific use of computer-aided detection. There is another mechanism, and I think a second tool has received it, called the NTAP, the new technology add-on payment, provided again by CMS. It's meant to provide somewhat of an incentive for people to incorporate these technologies in the interim, before the RUC reassesses the relative value codes for the various imaging procedures. Going forward, I think most of us see this not as a source of additional revenue but as a source of additional productivity, if it allows me to perform exams more rapidly, more comprehensively, more safely. We're looking at tools that do CT reconstruction and allow you to reduce the radiation dose.
We have one that we use for multiple sclerosis patients, for neuro exams, that lets us avoid using IV contrast in 87% of patients. It gets the patients through the scanner faster, it reduces the cost of the procedure, it makes it easier for the radiologist to read, and it's cheaper for the insurance company. All of those things together are the sweet spot of where we want to go with much of this.

Okay. I have a question for Michael. Some of the AI systems available for endoscopy have been trained on a specific dataset; some come from China, some from Europe, some from the United States. Do you think this is a major barrier? Should we have something more global? For example, the Japanese see more flat lesions compared to Western endoscopists. How can we bring all of this together?

My dream would be that we could all throw our datasets into one big container and share them with each other; then we could build really useful systems. Everyone is making their own system, and I don't think any of them works perfectly. There's still a lot of work to do. If you just look at the false positive rates of systems tested in the clinic, they are horrible to use; I wouldn't want to be a colonoscopist using any of these systems. I think we really have to overcome this barrier of seeing data as our gold, and instead put it together, share it, and make a huge database like ImageNet, which I guess most of you have heard about, with a lot of diverse data from different countries that everyone can use. In the optimal case, we would also have secret test sets that we could use to verify whatever publication or system someone wants to put into the clinic. We have to come up with some way of doing that. I don't know if it's possible, but it would be really great if we could.

One more thing I wanted to add, because there was a question about explainable AI. I think explainable AI, or at least the methods used in this new emerging field, should become standard, because then we can avoid mistakes like the paper we saw where the "L" marker was what the model had actually detected. If you used some simple methods from explainable AI, like Grad-CAM, you could see that immediately and you would throw that algorithm away. I really hope the FDA would use that when assessing the algorithms they receive, because then they could weed out these bad algorithms really quickly.

Michael, if I might add to that: there's actually some work coming along right now showing that although we think these saliency maps, these heat maps, are an objective way to understand what's going on inside the mind of the AI system, it turns out there is a lot of sensitivity to initial conditions and a lot of variability in how one uses them. Even if one uses them, you have to be really careful about how you test these systems. The FDA, to their credit, and we just heard an address by Dr. Haugen's colleague who deals with the radiology side of the house, is very careful and rigorous and still working to get appropriate tools onto the market, but it's a challenging process and a difficult area to regulate.

Yeah, that's a good point. I don't think these methods can really explain something to a physician, but they are also important for understanding the failures of your system.
As I said before, I think it's even more interesting to understand where your system did not fail and what it actually achieved in terms of sensitivity and precision.

So I think we have a question here asking, taking into consideration all that has been said in the session so far, do you think an AI system used for mass screening, such as polyp detection in our use case, can really be accurately evaluated in pre-marketing studies? I think it's a question for all panelists.

Can I start? Because with Alessandro we trained two systems, and you should know that the results we had in the standalone performance studies were virtually identical to the results we had in real-life endoscopy. I still remember that the negative predictive value of one system for characterization was exactly the same when it was tested in the standalone performance setting and in real-life endoscopy. You know why? Because the standalone performance setting is the worst possible scenario for artificial intelligence: we give the AI a polyp in a blurred frame, perhaps at the six o'clock position, taken randomly from an endoscopy video. On the other hand, when you test AI in real-life practice, you put the polyp in the right position, with the right focus, the right magnification, and the right light. So I would say that standalone performance studies represent the worst case, not the best case, for an AI system in endoscopy.

Shani, do you want to take the next turn on that?

Sure. There is always going to be some artificiality in any clinical study. I think for us what's most important is that the labeling is very transparent about patient demographics as well as endoscopist demographics. We would prefer that any endoscopists involved in the study be representative of community gastroenterologists, so that we can have a sense of what the actual improvement might be. We certainly want reporting on the demographics of the patients used for training as well as the demographics of the patients used in the clinical study. So we do think that, as far as can be controlled, the results of such a study would accurately reflect those patients under the conditions of the study.

That's a great point, thank you. So what you're saying is that it's not just the patients and the diversity of the patients; the people performing the procedure, the endoscopists, need to be diverse as well. Absolutely. Yeah.

Charles, anything you want to add?

No, thanks. I would agree with the point. It's the same thing we've seen in diagnostic mammography. All of these systems are built as an aid to the interpreting physician, and it's very much the same, I think, as what you experience in endoscopy. If the thing lights something up as a false positive, it's at best a distraction that potentially slows you down. Maybe it gives you confidence that you're not missing things, but it matters when evaluating. And I think Shani's point is a very good one: you need not only a diversity of subjects, but also of the physicians who are actually performing the procedure and using the tool.

I think we are at time. These have been all great points, a great debate. Sravanthi, thanks for joining us. We will take a five-minute break and come back to all of you with the final part of the meeting. Thanks to all of you.
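To make the standalone performance figures the panel keeps returning to (sensitivity, negative predictive value, false positive rate) concrete, here is a minimal sketch of how such frame-level metrics could be computed from per-frame model predictions and expert annotations. The names FrameLabels and standalone_metrics and the 25 fps default are illustrative assumptions, not any vendor's or regulator's protocol.

```python
# Minimal sketch: frame-level standalone metrics for a detection system,
# assuming binary per-frame predictions and expert annotations exist.
from dataclasses import dataclass
from typing import List

@dataclass
class FrameLabels:
    predicted: List[bool]    # model output per frame (lesion flagged?)
    annotated: List[bool]    # expert ground truth per frame
    fps: float = 25.0        # assumed video frame rate

def standalone_metrics(run: FrameLabels) -> dict:
    pairs = list(zip(run.predicted, run.annotated))
    tp = sum(p and a for p, a in pairs)          # flagged and truly present
    fp = sum(p and not a for p, a in pairs)      # flagged but absent
    fn = sum(not p and a for p, a in pairs)      # missed lesion frame
    tn = sum(not p and not a for p, a in pairs)  # correctly quiet frame
    minutes = len(pairs) / run.fps / 60.0
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "false_positives_per_minute": fp / minutes if minutes else float("nan"),
    }
```

Computing the same dictionary once on a held-out standalone test set and again on frames from a live study is one way to check the kind of agreement between the two settings described above.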
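The Grad-CAM check mentioned in the discussion can also be sketched. The code below assumes a PyTorch image classifier; the torchvision ResNet-50 is a hypothetical stand-in for whatever model is under review, and grad_cam is an illustrative helper name. If the resulting heat map consistently highlights an image corner or a burned-in marker rather than the lesion, that is the shortcut learning the panel warns about; as noted, such maps are sensitive to how they are produced, so this is a screening aid, not proof.

```python
# Minimal Grad-CAM sketch, assuming a convolutional classifier in PyTorch.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=None)                 # stand-in backbone
model.fc = torch.nn.Linear(model.fc.in_features, 2)   # e.g. lesion vs. no lesion
model.eval()

_acts, _grads = {}, {}
target_layer = model.layer4[-1]                       # last convolutional block

target_layer.register_forward_hook(
    lambda m, inp, out: _acts.update(value=out.detach()))
target_layer.register_full_backward_hook(
    lambda m, gin, gout: _grads.update(value=gout[0].detach()))

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return a coarse, normalized heat map for one (3, H, W) image tensor."""
    logits = model(image.unsqueeze(0))                # (1, num_classes)
    model.zero_grad()
    logits[0, class_idx].backward()                   # gradients for chosen class
    weights = _grads["value"].mean(dim=(2, 3), keepdim=True)   # pool gradients
    cam = F.relu((weights * _acts["value"]).sum(dim=1))        # weighted channel sum
    return cam.squeeze(0) / (cam.max() + 1e-8)        # coarse map scaled to [0, 1]
```

Upsampling the returned map to the input size and overlaying it on the frame shows at a glance what region drove the prediction.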
Video Summary
In this video discussion, Dr. Khan and Dr. Hassan emphasize the importance of training AI systems with well-annotated and carefully curated datasets, highlighting that high-quality datasets can significantly improve the performance of AI systems in medical imaging. Dr. Khan describes a publicly available NIH dataset, ChestX-ray14, a subset of which was annotated by thoracic radiologists, as well as an intracranial hemorrhage dataset labeled by neuroradiologists. The panel discusses the challenges of training AI systems in endoscopy, particularly the need for diverse and representative datasets. Dr. Michael Riegler, Chief Research Scientist at SimulaMet, shares his experience creating open datasets for endoscopy and stresses the importance of sharing data and collaborating globally to build useful AI systems. Dr. Shani Haugen, Assistant Director for the Gastroenterology and Endoscopy Devices team at the FDA, discusses the need to standardize terminology and the information presented to physicians in the labeling of AI algorithms, and the importance of explaining to clinicians and patients how these algorithms perform. Overall, the discussion highlights the challenges and considerations in training and evaluating AI systems in medical imaging and endoscopy.
Keywords
training AI systems
high-quality datasets
medical imaging
endoscopy
open datasets
standardizing terminology