Approaching Automation of Data Collection and Data Analysis
Video Transcription
Fantastic. Joe, I always appreciate hearing from you. I know this is an area of intense interest for you and something the ASGE really wants to prioritize going forward, so I appreciate you giving us such a nice overview. Your talk highlights an issue many of us struggle with: how do we actually approach automation of data collection and data analysis? How do we do the work we say we want to do? I have a pretty simple outline for this talk: what are the current limits of data collection, and what options are there for collecting data? So what are our limitations? Why do we have so many issues with data collection? TR touched on this, and Joe touched on this. One reason is that we require a large volume of procedures to confidently measure procedural quality. A really nice article came out almost a decade ago in GIE showing that, as much as we talk about ADR, and TR talked about how important it is, you can't really calculate ADR unless you have enough screening colonoscopies to get a reasonably narrow confidence interval around the calculated metric. It's around 250 screening colonoscopies a year that you need before the confidence interval is narrow enough to tell you what someone's ADR actually is. And a lot of colonoscopists, especially rural surgeons and rural gastroenterologists, don't do that many colonoscopies annually to give us a confident ADR. So that's already a big limitation of an outcome metric like ADR: we just don't do enough procedures to calculate the metric. It can also be cumbersome to calculate, and TR touched on this as well. And sometimes these metrics don't provide enough actionable feedback. 
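To make the volume point concrete, here is a small, hypothetical sketch (not from the talk; the 30% "true" ADR, the case counts, and the choice of a Wilson score interval are my own assumptions) showing how the 95% confidence interval around a measured ADR narrows as the number of screening colonoscopies grows:

```python
# Hypothetical illustration: why ~250 screening colonoscopies are needed
# before an ADR estimate has a usefully narrow 95% confidence interval.
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# An endoscopist whose "true" ADR is 30%, measured over different volumes:
for n in (50, 250, 1000):
    lo, hi = wilson_ci(round(0.30 * n), n)
    print(f"n={n:4d}: measured ADR 30%, 95% CI {lo:.1%} to {hi:.1%}")
```

At 50 cases the interval spans roughly 25 percentage points, wide enough to blur the difference between an average and an excellent detector; by 250 cases it has roughly halved.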
If I tell you that your detection rate of Barrett's esophagus is below your peers' when you're doing EGDs for reflux, have I told you what you should do to get better? So there are a lot of limitations of quality metrics that we need to address as a field. And it's a real issue in therapeutic endoscopy, where low procedure volumes make it infeasible to really measure quality. I like this study, which is dating us back almost a couple of decades now. You're not going to see wider confidence interval bars than you see here. This is diagnostic yield after EUS biopsy of a pancreatic mass, and for some of these endoscopists the confidence intervals literally go from 0% to 100%. That isn't useful information to give back to these endoscopists about how to improve their EUS quality. So the lower volume of some of these procedures really limits our metrics. And I touched on this earlier, but I think we need to be cautious about measuring outcomes. This is a study we published five years ago: the x-axis is how far you live from the hospital where you got your procedure, and the y-axis is the adverse event rate. It suggests that when you live closer to the hospital, you're more likely to have an adverse event as measured by return to the hospital, and when you live farther away, you're less likely to. Obviously you don't have fewer adverse events because you live farther from the hospital. It's that people who live farther away are lost to our system, and we never see their adverse events or their returns to the ED. So we already know there's a lot of data loss because of our fragmented healthcare system. 
Outcome measures really cannot confidently evaluate low-volume providers. Just do this thought exercise and try to measure native papilla cannulation rates; Joe touched on what a critical measure this is. If I do 50 cases, it's really difficult to get a narrow enough confidence interval to figure out whether someone is actually meeting the quality benchmark for high-quality ERCP. So I'm going to talk a lot about measuring outcome metrics and how we can do that in an automated fashion, but there are a lot of imperfections here, because even though we're much higher volume than surgeons in general, it still doesn't get us a confident measurement of quality. Furthermore, reporting outcomes is controversial. This was a ProPublica report a few years ago that tried to tell you, if you want to find a gallbladder surgeon, who you should see based on their adverse event outcomes in Medicare data. The idea was that transparency around healthcare quality would let you pick your surgeon, but it's the same issue again: for these surgeons, the confidence intervals cross from lower-than-expected to higher-than-expected complication rates. That doesn't really help us, right? So even though I'm going to give you some ways to measure outcome metrics, it doesn't help enough to focus only on those. Furthermore, we don't measure quality frequently, and sometimes we don't do it well. In this survey from Doug Rex and colleagues, four-fifths of physicians were measuring ADR, and many of those who didn't said it was because they were too busy. Eighty percent is not bad, and we saw that in the opening poll question as well. But when you asked people how they were measuring ADR, they didn't always know what it meant to measure ADR. 
A lot of people thought it was measured over all colonoscopies, when it should be screening colonoscopies, at least as of now. And a few people, when asked to tell you about ADR, could only say "adenoma detection rate." They couldn't say anything else about it: what it means, how you measure it. They just knew what ADR stood for. So even though this is our most fundamental quality metric in GI endoscopy, not everyone even knows how to measure it. Measuring quality in practice is really difficult, then, not just for the reasons I've described, but because even abstracting the data is hard, once we decide it's valuable to measure the outcomes and safety of these procedures despite the issues with low volume and confidence intervals. There are a few different ways we can do it: manual chart review, which a lot of practices do; a data registry; a data warehouse; natural language processing; and, something some of us on the panel today have been very interested in, just watching procedures, shifting a little away from outcome metrics and toward skill. So, manual chart review. This is the idea that if I'm going to measure quality, I'm just going to go into the EHR, or my paper charts, dating those of us who were there before EHRs were so widespread, and pull the outcome metrics. There are some advantages: it requires very little IT infrastructure. If Joe wants to measure ADR in his unit, he can go to the EHR and just go through all the colonoscopies manually. But it's time consuming, and it requires reviewing a large number of cases to ensure a confident estimate of ADR. 
And all the work you put into that measurement helps you none the next time you want to measure ADR: you have to do the entire process over again. So when should you, or can you, use manual review? I think it's to confirm a concerning automated review, and we do this a lot. If something's an outlier, like, wow, we're not doing a very good job on this metric, I'll go into the EHR and ask, is our logic correct? I'll do a manual review. It's also very helpful as a spot check for a concerning clinician review. And if something can't be calculated any other way, that's when you have to use manual review. Sachin and I published this a few years ago, showing that manual review and manual abstraction of data can be done. This was in people who had just finished ERCP and EUS training, and they measured all these metrics that Joe talked about: what is our performance on diagnostic rate of adequate samples with EUS-FNA, acute pancreatitis rates after EUS-FNA or ERCP, all these metrics. They showed it's feasible to measure, and, very reassuringly, that they were meeting the ASGE's quality metrics for ERCP and EUS in their first year of independent practice. So a manual review for something relatively easy, if you're just measuring one or two physicians, is a very reasonable thing to do. And this was for ERCP, again showing that the trainees can meet these priority indicators; for example, they achieved deep cannulation of the native papilla in 93% of cases after their GI fellowship was completed. That's great: we learned we were actually training them well, and they're doing a great job. But you can't easily get that through an automated pull these days. What about data registries? 
So, GIQuIC, as you know, is a very important partnership of the ASGE and ACG, and Eden, on our leadership panel today, is a very important person to know if you want more information about it. This is a national registry that compares facility and physician performance to peers. It provides immediate feedback with respect to benchmarking. It's compatible with essentially all the endoscopy report writers you'd be considering, and it's available for CMS quality reporting, so, the MIPS program. These are some measures traditionally tracked for colonoscopy and EGD. The ERCP and EUS measures are continually being refined, something to be aware of if you're interested specifically in the advanced endoscopy metrics within GIQuIC. This is a nice option if you have a small practice and want to be able to compare yourself to other people around the country so you can figure out where you stand. It shows your ADR, broken down for male and female patients, and shows how it compares to the entire registry, so it's very nice for benchmarking as well. There are EHR solutions for data registries too. This is a slide borrowed from a small EHR provider called Epic, which will be rolling out these kinds of dashboards for people using their EHR and their endoscopic report writing solution. And this is obviously very nice looking, right? It gives you all the metrics you could be interested in for GI endoscopy quality, specifically colonoscopy: adenoma detection rate, on-time starts, polyp detection rate, cecal withdrawal time, cecal intubation rate, bowel prep adequacy, and procedural volumes. 
First-case on-time starts are obviously very important, beyond the quality metrics of effectiveness, for things like efficiency that we care about as GI leaders. So these EHR solutions may reduce some of the barriers that TR and Joe talked about for the metrics we need to calculate. Again, this will be at an institutional level, but it can also be benchmarked regionally. What about data warehouses? A data warehouse is the idea that you can integrate data from multiple sources, your endoscopic report writing system, your pathology system, your EHR, into one location so you can get metrics for reporting purposes. What does that look like in action? This is the data warehouse at Northwestern that we use to give feedback on colonoscopy quality. I've blanked out the endoscopist names in the left column, but I can see, for example, in the first row, that over a four-year period that endoscopist did 3,311 colonoscopies, 1,230 of them screening, and their ADR over that timeframe was 52.4%. Pretty good. Their average withdrawal time was very long, 19 minutes. Insertion time was 5.4 minutes. They found serrated polyps in 25% of their procedures. So this is obviously an outlying extreme performer, but you can also find people who maybe need to do a little better. We give this feedback periodically to help improve the quality of colonoscopy, and this is a data warehouse report I can run anytime, over any timeframe, that gives us colonoscopy quality data. We can also move beyond colonoscopy quality. I keep saying colonoscopy because it's sort of our gold standard for how we think about quality in GI, but there's so much more. This is a dashboard we built using a data warehouse for upper GI bleed care. 
It shows what we're doing on some of the upper GI bleed metrics Joe touched on. Are we administering octreotide and antibiotics for our cirrhotic patients with upper GI bleeds? We found out we weren't doing a good job of early administration, and that's now a new QI project for us. We are doing okay on some other things, like not over-transfusing and giving PPIs when we need to, and we can drill down by hospital as well. You can see I can select specific hospitals; if I focus just on my hospital, Northwestern Memorial, you can see we're doing pretty well on giving PPIs and not over-transfusing, and not so well on giving octreotide and antibiotics for these cirrhotic bleeds. These are really nice options if you have a good IT team at your hospital: you can build really dynamic dashboards to help you improve the quality of care. But again, you need that team I talked about in my first talk that's bought in and lets you do this. We use this dashboard periodically to improve the care of our bleeds. What about moving into the world of AI? Natural language processing is a form of AI that I think you need to know about if you don't already. It allows computers to read; that's how I like to think of it. We see this in our daily lives with things like spam filters: a lot of things go to your spam inbox, thankfully, and that's because computers can read your emails and realize that a message isn't really meant for you. 
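To make "computers that read" concrete, here is a minimal, hypothetical rule-based sketch of the idea: scan free-text pathology reports for adenoma mentions and compute an ADR over a set of screening colonoscopies. The sample reports and the keyword pattern are illustrative assumptions; production NLP systems are far more robust (negation handling, section parsing, machine-learned models).

```python
# Toy "NLP" for pathology reports: flag adenomas by keyword, then compute ADR.
# Caveat: real systems must handle negation ("no adenoma seen"), synonyms,
# and report structure; this is only the core idea.
import re

ADENOMA_PATTERN = re.compile(
    r"\b(tubular|tubulovillous|villous)\s+adenoma\b|\badenomatous\b",
    re.IGNORECASE,
)

def has_adenoma(pathology_report: str) -> bool:
    """True if the report text mentions an adenoma."""
    return bool(ADENOMA_PATTERN.search(pathology_report))

reports = [  # hypothetical pathology reports, one per screening colonoscopy
    "Colon, ascending, biopsy: Tubular adenoma.",
    "Colon, sigmoid, biopsy: Hyperplastic polyp.",
    "Colon, cecum, polypectomy: Tubulovillous adenoma, low-grade dysplasia.",
    "Colon, random biopsies: Normal colonic mucosa.",
]
adr = sum(has_adenoma(r) for r in reports) / len(reports)
print(f"ADR over {len(reports)} screening colonoscopies: {adr:.0%}")  # 50%
```

The same pattern, pointed at a year of linked procedure and pathology reports instead of four strings, is essentially what automated ADR pipelines do.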
We see this in electronic assistants like Alexa, in Amazon reviews, et cetera, and we see it in our documents too. A lot of people, including our own group, have shown that you can use NLP to basically read procedure reports and abstract quality metrics from them. Like I showed you before, we use NLP to pull in pathology reports and calculate ADR, serrated detection rates, et cetera. You can also do things like count how many polyps you find in the right side of the colon versus the left, all sorts of work you can do if you can actually read the procedure reports. It's really like manual review, except the computer is doing it for you. So it's a very nice capability to lean on if your institution has expertise in it. And there are NLP-enabled endowriter solutions. This is an example of an endowriter that uses NLP to determine whether an adenoma was found and to calculate ADR. This is the report writing solution ProVation, which takes all the information, applies NLP, and gives back adenoma detection rate and other metric reports as well: an integration of an EHR solution and NLP. Now, what about watching? I've talked a lot about pulling in quality metrics, but I want to emphasize that there is some advantage to looking at the skill within a procedure and not just the outcomes. We all do professional practice evaluations. There's something called an ongoing professional practice evaluation, the OPPE, a screening tool to evaluate all practitioners who have been granted privileges. At our institution, the OPPE consists of the adenoma detection rate, how many incident reports have been filed with the medical staff office, et cetera. Here's our OPPE example. 
So this is an example for me: my adenoma detection rate was 47.5%. I don't do as much colonoscopy as my peers because I'm a therapeutic endoscopist, but my ADR was 47.5%. My serrated detection rate was 15%. It gives me all my metrics over that timeframe and compares me to my peers. This is our OPPE at Northwestern, and each physician gets one. But if someone's metrics aren't so good, that's when we get into the FPPE, the focused professional practice evaluation, which is when I actually have to watch someone perform endoscopy and give more detailed feedback on how they're doing. The idea is that you can't really glean why someone's an outlier without watching their endoscopy. When I do an FPPE, it's a long form that looks at all the performance measures. And for this you can bring in a lot of the skills tools that are out there, like the ACE tools for colonoscopy and EGD that we use for trainees, which let us assess whether someone's doing a good job in their colon procedures. So it's just a more detail-oriented approach to how someone's doing in their endoscopic procedures. A landmark paper came out in 2013 in the New England Journal of Medicine that basically said: if I watch a single video of a bariatric surgery procedure, submitted by the surgeon themselves, I can predict which surgeon is more likely to have their surgeries complicated by ED visits, readmissions, and reoperations. So if I submit a video as a bariatric surgeon and someone else rates it, they can figure out whether my patients in general, not just in that specific video, are more likely to have complications. That really highlights how valuable the skill demonstrated in a single procedure can be in determining outcomes for a patient. We've shown this as well. 
We've shown that if you watch videos of colonoscopy, and this was us watching five videos per endoscopist, you can predict who's likely to have a low ADR versus a high ADR. The x-axis here is ADR; the y-axis is me and some colleagues around the country watching videos of their colonoscopies and rating their skill, and we can pretty much determine who's going to have a high versus low ADR. But this really isn't feasible at scale, and that's why machine learning is really going to transform endoscopy quality. I want to close with that exciting future. AI is the idea that computer algorithms will, without supervision, do specific tasks that require a human brain. You'll hear lots of phrases around this, and we just had the AI summit for the ASGE a couple of months ago in San Francisco, which talked a lot about it; joining the next one would be great for those who are interested. The idea is that computers do what we need them to do to help us measure and improve quality. You'll hear a lot about computer-aided polyp detection and classification systems, but I'm going to focus a little more on how AI can improve quality assessment as well. AI helps us find polyps; I think everyone knows that now. That's great, but AI can also help us assess skill during procedures. At Northwestern, we developed an endoscopic video record where we're taking all procedures, ingesting and recording them with a cloud-based video recording solution, and integrating that into the EHR. That gives you a much more detailed picture: instead of distilling someone's thousand procedures down to an ADR, each of those procedures is video, and those frames of video capture the important performance of the endoscopist. 
So those thousand colonoscopy videos are hours and hours of colonoscopy that might tell me why someone's an under- or over-performer. And you can see the idea: if a computer can watch how well you're cleaning the colon, distending the colon, and examining the colon, then maybe you can obviate the need to calculate ADR and actually give skills feedback. This is an AI system we built at Northwestern that is doing exactly this. It takes in a colonoscopy; actually, this one is a double, the first half of the video is an EGD and the second half is a colonoscopy. The AI algorithms figure out when the cecum is reached, so what the insertion time would be; here the insertion time is three minutes and nine seconds. It figures out where polyps were detected and can tell me the polyp detection rate. It figures out how much I'm cleaning the colon, what my withdrawal time is, and what my withdrawal time absent polyp removal is, the time spent looking for polyps minus the time spent removing them. That addresses one of the limitations of withdrawal time that TR talked about: currently you can only calculate it on procedures without polyps, because otherwise you'd have to subtract the polypectomy time. This AI tool actually does that. So this is what I call AI-augmented procedure review. I use it to measure the quality of colonoscopy for some endoscopists in our system and give them feedback on what they could do better, from both a polyp removal and a polyp detection standpoint. I think AI systems are going to be critically important to measure quality metrics and allow us to give feedback and make improvements. 
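The "withdrawal time absent polyp removal" metric described above is simple arithmetic once a video-analysis model has emitted event timestamps. Here is a hypothetical sketch; the function name, the 14-minute scope-out time, and the polypectomy intervals are illustrative assumptions (only the 3 min 9 s insertion time comes from the talk):

```python
# Net withdrawal time: total withdrawal minus time spent removing polyps,
# computed from timestamps (seconds) that a video model might emit.

def net_withdrawal_seconds(cecum_reached: float, scope_out: float,
                           polypectomy_intervals: list[tuple[float, float]]) -> float:
    """Withdrawal time excluding time spent on polypectomy."""
    withdrawal = scope_out - cecum_reached
    resection = sum(end - start for start, end in polypectomy_intervals)
    return withdrawal - resection

# Cecum reached at 3 min 9 s (189 s), scope out at 14 min (840 s),
# two polypectomies totalling 150 s:
print(net_withdrawal_seconds(189, 840, [(300, 390), (600, 660)]))  # 501
```

That yields 501 seconds (8 min 21 s) of actual inspection time, which is the number the speaker says a plain withdrawal-time metric cannot give you when polyps are removed.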
And this has been shown in a nice study in Gastroenterology a couple of years ago by another group as well, showing that AI can measure things like how well we're distending the colon, bowel preparation, et cetera. So where do I think we need to go in the future? We need either better registries or novel measures, because we're not doing a good enough job of measuring quality at this point. What do I mean by better registries? If we're going to focus on outcomes, we need to think about risk adjustment, patient demographics, comorbidities, procedure indications, and also focus on the outcomes of real interest: not just ADR but interval colon cancer rates, not just ADR but adverse event rates, and pull them together intelligently. Because if we get all of this together in an automated or semi-automated way, we'll get risk-adjusted quality with national benchmarks that meaningfully impacts patient care. So, in summary: there are significant barriers to collecting endoscopy quality metrics. There are multiple automated methods available, of varying levels of sophistication. Ultimately, collection of quality metrics by AI may reduce the barriers to data collection, and I really think a future course on AI and quality could kickstart us on this pathway. I appreciate everyone's attention. Thank you.
Video Summary
The video discusses the challenges of automating data collection and analysis in endoscopy and the need for quality metrics to improve patient care and outcomes. Limitations of data collection are highlighted, including the large procedure volumes required to confidently measure procedural quality, the burden of calculating metrics, and the difficulty of providing actionable feedback. The infeasibility of measuring quality in low-volume procedures is discussed, along with options for data abstraction: manual chart review, data registries, data warehouses, and natural language processing. The video concludes with the potential of artificial intelligence (AI) to improve quality assessment and feedback, including computer-aided polyp detection and classification and AI-based assessment of skill during procedures, suggesting that AI systems could automate data collection and reduce the barriers to measuring quality metrics in endoscopy. The speaker encourages further education on AI and its potential impact on patient care.
Asset Subtitle
Raj Keswani, MD MS
Keywords
automating data collection
endoscopy
quality metrics
patient care
artificial intelligence
improving outcomes