The Newest Tech for Your Practice | August 2021
Approaching Automation of Data Collection and Data Analysis
Video Transcription
I think this is a good time to pause and take an audience poll question, if we could bring that question up. So here's the question. You have a 50-year-old patient who's female and getting her first screening colonoscopy (although we can update this now to say maybe a 45-year-old female getting her first screening colonoscopy), and she asks to see the physician with the highest ADR. If you could all vote, except for the organizers and panelists: what would you tell her? "I know who has the best ADR, but I can't share it with you"; "I know, and I'm allowed to share that information with you"; or "I don't know, we don't track that," with an apology. Interesting: "I know and can share the information," which is very, very impressive. I'm actually surprised, and we can touch base on that in the Q&A as well, but I would have thought "I know, but can't share it" would have been the answer. So that's really interesting. I'm going to start my talk now. This is a new talk we're giving for this course, and it stems from a lot of questions we get from past attendees: "A lot of this is great, but how do we do this? We're already stretched thin. I'm just trying to provide clinical care, and now you're asking us to do all these measurements." So I'm going to give you an idea of what we think about when we think about automating data collection for data analysis. We have a lot of experts on the panel who can give you their thoughts later as well, but this is just one approach to think about. So what are we going to talk about here? What are the current limits of data collection, and what options exist for collecting data? Really, a simple approach for us to think about as we're trying to measure quality of care. So what are the limitations? Why do we not just do this all the time? Obviously, besides time, what are our limitations in measuring quality?
Well, the fact is, there are limitations to all of our quality metrics. Here's an example of a limitation of the adenoma detection rate. We just saw a great talk about how important ADR is in practice, right? But it is very difficult to measure ADR in practice with what we call high confidence. It takes about 250 colonoscopies per year to confidently assess the quality of the colonoscopy being performed. That's because at lower volumes, it's very difficult to say whether an ADR of 27% really is an ADR of 27%, or whether it's something between 20 and 35%, because of the limitations of a smaller sample size. So it takes about 250, say 200 to 300, colonoscopies to get a good ADR measurement, but nearly half of colonoscopists in the United States, especially rural surgeons, where a lot of colonoscopies are being performed, do fewer than 250 colonoscopies per year. And so that's a real challenge in terms of measuring quality: we just don't have a big enough sample size. ADR is also cumbersome to calculate, and it may not be actionable enough data to say, well, your ADR is low, what do I do about it? And we don't just see this confidence interval issue with adenoma detection rates. Here's a really old study showing the variability of EUS fine needle aspiration, how often we're able to get a diagnosis after a biopsy of a pancreas mass. What these bars show you is the confidence interval. For example, if I look at this clinician here, their diagnostic rate is about 85 or 90%, but the confidence interval around that runs from about 50% to 95%. That means I'm not sure how good they are at getting a diagnosis of pancreas cancer; it's somewhere between 50 and 95%. Clearly not a very helpful thing. When you have low procedure numbers, you really can't tell who's an outlier performer or not. So low volumes make data collection a very challenging thing.
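To make the speaker's sample-size point concrete, here is an illustrative Python sketch (not part of the original talk) that computes a 95% Wilson score confidence interval for an observed ADR. The case counts are made up: at 30 procedures, an observed ADR of about 27% is compatible with anything from roughly 14% to 44%, while at 250 procedures the interval narrows considerably.

```python
import math

def adr_confidence_interval(adenomas_found, screening_colonoscopies, z=1.96):
    """95% Wilson score interval for an observed adenoma detection rate."""
    n = screening_colonoscopies
    p = adenomas_found / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

# The same observed ADR (~27%) is far less certain at 30 cases than at 250:
print(adr_confidence_interval(8, 30))    # wide interval, roughly 14% to 44%
print(adr_confidence_interval(68, 250))  # narrower, roughly 22% to 33%
```

This is why a single low-volume month tells you very little about a colonoscopist's true ADR.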
You need to collect data over many years to get a good assessment of the skill being displayed. And then we also have a caution about measuring outcomes. This is an example of a study we showed in the past where, if I looked only at people who lived near my hospital, I would see adverse event rates, people coming back, of potentially 30 to 35% after a high-risk procedure like ERCP. If I looked at people who live farther away from my hospital, my adverse event rates would look lower, like 13%. It's not because the people who live farther away are actually doing better; it's because they're going to their local hospital if they have an adverse event after a procedure. So measuring outcomes is a challenge, because we really want to capture outcomes that occur seven to 30 days after the procedure, no matter where the patient goes. Sometimes we can't even measure the outcomes that happen at our own institution. Thinking about this when we try to do automated measurement is really important. Measures also do not confidently evaluate low-volume providers. And I'm going to reiterate what I showed you for adenoma detection rate and EUS FNA. If I looked at the typical person performing low-volume ERCP, 25 to 50 cases a year, I can measure their ERCP success rates, but I'm going to have a very wide confidence interval, again, about how well they're doing. If they do 50 ERCPs on native papillas over two years, which is not an unreasonable number in the United States for native papilla cannulations, I won't really know whether they are confidently meeting the quality metric proposed by the ASGE of greater than 90%. So it's really tough to measure outcomes in these procedures.
And so I'm going to show you a lot in the subsequent slides about how we can do these automated measurements and what we can do with them, but keep in the back of your head that we're still going to be a little bit challenged. Reporting outcomes is very controversial for that reason. This is a ProPublica article that came out a few years ago, telling you where to find a surgeon to remove your gallbladder, and they have the same issue I'm talking about here. Hopefully you can see that when they report things like outcomes after laparoscopic gallbladder removal, even for a relatively high-volume surgeon, 68 surgeries, all they can tell you is that the adjusted complication rate falls somewhere between below average and above average. That's not a very useful metric. So again, think about these metrics and how we calculate them. If you say, "My provider has an adenoma detection rate of 13% in January, we need to withdraw their privileges," but they only did 20 screening colonoscopies that month, that's not an accurate measurement of their ADR. We need a high enough denominator. It's also important to understand your definitions when we talk about approaching automated measurement. So the question is who is measuring quality and how they're measuring it. Just like you all showed in that poll at the beginning, 85% of physicians say they're measuring ADR. A significant minority did not, and they said it was because their practice is too busy to measure ADR, so we've got to make sure we can remove those barriers. But even among the people who said they were measuring ADR, many didn't actually know the definition of ADR. Only about half of them knew it was the percentage of screening colonoscopies that reveal at least one adenoma.
But 40% actually thought it covered all colonoscopies, not just screening colonoscopies. And so we can talk a lot about whether we're measuring it or not, but unless we know the definition of what we're measuring, it's not very useful. So again, just to reiterate: having enough procedures to measure outcomes is an important limitation to know about, as is accurately using the right definition for the measure you're going to report. For example, for ERCP cannulation success rate, it's not all ERCPs, it's the ones with native papillas. Having the correct definition before you embark on extracting the data is very important. So again, measuring quality in practice is much more difficult than we would imagine. We have a few different approaches: manual chart review, using a data registry, using a data warehouse, natural language processing, or just watching. And I'm going to talk a little bit about each of these to give you an idea of what we can do in practice. So, manual chart review. This is a good example of what manual chart review used to be like; now a better picture would actually just be someone staring at a screen with a blank look in their eyes. It's basically going through each chart, recording some data about what you found in that procedure, and then doing enough of them that you can compute an outcome metric. This requires very little IT infrastructure, usually just staring at your electronic health record and pulling some data, but it's very time consuming. And again, it requires a large number of procedures to ensure a confident estimate of something like an ADR. It's also not reproducible. If I put in four hours to calculate all my clinicians' ADRs in March, then when July comes around and I need to do it again, I'm going to have to put in another four hours altogether. None of the work I did in March is going to help me in July.
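Because getting the definition right matters so much, here is a minimal, hypothetical Python sketch of the correct ADR calculation: screening colonoscopies only, in both the numerator and the denominator. The class and field names are illustrative, not from any real system.

```python
from dataclasses import dataclass

@dataclass
class Colonoscopy:
    indication: str      # e.g. "screening", "surveillance", "diagnostic"
    adenoma_found: bool  # at least one adenoma confirmed on pathology

def adenoma_detection_rate(procedures):
    """ADR = fraction of *screening* colonoscopies with >= 1 adenoma."""
    screening = [p for p in procedures if p.indication == "screening"]
    if not screening:
        return None  # no screening cases, so no meaningful rate
    return sum(p.adenoma_found for p in screening) / len(screening)

cases = [
    Colonoscopy("screening", True),
    Colonoscopy("screening", False),
    Colonoscopy("surveillance", True),  # excluded: wrong indication
]
print(adenoma_detection_rate(cases))  # 0.5, not 2/3
```

Including the surveillance case, the mistake 40% of respondents made, would wrongly report 2/3 here instead of 1/2.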
I have to just do everything again. So it's a very resource-intensive thing to do, and it's really not our preferred approach in general. So when should you use manual chart review to calculate outcome metrics? Well, I think it's helpful if you want to confirm a concerning automated result. For example, we do a lot of automated ADR measurement at Northwestern, but I saw some numbers that weren't fitting with what I thought they should be. So I went to the actual charts, checked whether we were using the right numerator and denominator, and made some adjustments to fix it. So it's obviously very helpful for spot checks. And if something can't be calculated, sometimes you need to use manual review, although as EHRs take over our lives, it's less and less common that something can't be calculated. Here is an example of doing manual review. We had done some studies showing that graduating advanced endoscopy trainees were not meeting the quality metrics we thought they should before going into independent practice. So we actually had all of them measure their outcome metrics in the first year of training for both EUS and ERCP. And we were grateful to find out that they actually were meeting these metrics: being able to biopsy masses during EUS, having low rates of adverse events, having high ERCP deep cannulation success rates. All the things we were a little worried they weren't meeting, we were able to verify with manual review. They were manually entering their data and allowing us to see that they were meeting these metrics. But that's not feasible long term; they weren't going to do that every year. They just did it as part of this ongoing study. So helpful, but not our end goal. So what about registries? The one I want to highlight here is GIQuIC, which I hope all of you have heard of already. This is a partnership of the ASGE and ACG.
It allows us to compare facility and physician performance to peers, provides immediate feedback with respect to benchmarking, and is compatible with essentially all endoscopy report writers. This is also important for CMS quality reporting and the Merit-Based Incentive Payment System, or MIPS; it can be utilized for that as well. So what is GIQuIC, really? Again, it's just a registry, predominantly of gastroenterologists in the United States, who want to compare their performance on the core procedures of colonoscopy and EGD with peers. Here are some example measures that GIQuIC tracks: ADR, adequacy of bowel preparation, things like photodocumentation and withdrawal time. These will be tracked for your clinicians using GIQuIC, and then you can ask, how do I compare with people around the country? Similarly, there are measures tracked for EGD, things like appropriate specimen acquisition in Barrett's esophagus, history and physical documentation, and indication documentation; things you might want to track and then compare with your peers. And this is what it could look like. You get your site's ADRs here in the pinkish bar, all the sites seen nationally in the bluish bar, and you can set the benchmarks you want: your male ADR goal, your female ADR goal, things like that. The issue here is that, for a lot of us, adenoma detection rate is not automated, because the pathology is not pushed to the central repository, and we need to figure out a way to do that on the local end. These things are becoming more and more sophisticated, and I think in a future course, having an entire talk on GIQuIC and how it can be intelligently utilized makes sense. But understand that while it's a really useful registry, unless it's directly connected to your pathology database, there's still some backend work that needs to be done.
There are EHR solutions coming out as well for tracking quality metrics. I was required by the company to put the copyright there at the bottom left of the screen. This is an example of how an EHR, which has all of your pathology reports and your procedure reports within it, can do a pretty good job of tracking your quality metrics. This is a dashboard of quality metrics being developed by Epic. You can see here they're tracking ADR for your entire institution, and they can benchmark it against peer institutions around the country or regionally. How often do you have an on-time start? How often are you detecting polyps, so not just adenomas but polyps of any sort, giving you a PDR? What's your average withdrawal time? What's your cecal intubation rate? What's your bowel prep adequacy? What's your procedure volume? All of these can be tracked at the individual clinician level as well as for your entire institution, and then benchmarked both within and outside your institution. There's a big advantage to having all of your data consolidated together so that automated collection of outcome metrics can be performed. Obviously, there's a cost to implementing some of these, and that may be prohibitive. But understand that EHR vendors really are interested in taking all your data and packaging it back to you to measure the things you care about: the outcome metrics, but also things like on-time starts and the other efficiency metrics we find very important as well. So what about using a data warehouse? A data warehouse is the idea that we can integrate data from multiple sources, our endoscopy reporting system, our pathology system, our EHR, bring it into one big conglomeration of data, and then do the analyses we need to do.
So at our institution, we have a lot of disparate EHR solutions, and they need to be brought together; bringing them together is the data warehouse idea, and then we can run the analysis we need. This is an example of how we use a data warehouse report for calculating automated adenoma detection rates by provider. A few days ago, I went to the Northwestern EDW and asked it to calculate adenoma detection rates for all of my providers across the entire Northwestern system over a four-year period, from 2017 to 2021. And then I have very confident adenoma detection rates: this provider over here did 3,322 total colonoscopies, 1,880 of them screening cases, over that time period. Their average insertion time was 6.9 minutes; their average withdrawal time in procedures without an adenoma removed was 12 minutes. Their ADR was 46%, and our lab's adenoma detection rate over that entire four-year period was 40%. We also have serrated polyp detection rates, and things like how often we find an adenoma or serrated polyp, which is in about 45% of our screening cases. That ability to automate the process by utilizing the data warehouse lets me do something that is just impossible otherwise. Finding someone to manually calculate an ADR over four years for all of our clinicians is beyond a full-time job. So putting in that upfront work on something like a data warehouse makes it possible to calculate things that are more sophisticated, over longer periods of time. It keeps us from doing silly things like checking ADR for physician X in the month of January by spot checking 30 procedures, because that just doesn't give us very useful data. This is another example of what we can do with an EDW. This is a project we did last year called the Upper GI Bleed Project.
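The data warehouse idea, joining procedure reports and pathology results that live in separate source systems, can be sketched with an in-memory SQLite database. All table names and records here are invented for illustration; a real warehouse would be far larger and fed by ETL pipelines.

```python
import sqlite3

# Toy warehouse: endoscopy reports and pathology results arrive from
# separate source systems and are joined on a shared procedure ID.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE procedures (proc_id INTEGER, provider TEXT, indication TEXT);
CREATE TABLE pathology  (proc_id INTEGER, finding TEXT);
INSERT INTO procedures VALUES (1,'Dr. A','screening'),(2,'Dr. A','screening'),
                              (3,'Dr. A','surveillance'),(4,'Dr. B','screening');
INSERT INTO pathology  VALUES (1,'tubular adenoma'),(3,'tubular adenoma'),
                              (4,'hyperplastic polyp');
""")

# ADR per provider: screening cases with >=1 adenoma / all screening cases
rows = db.execute("""
    SELECT p.provider,
           100.0 * SUM(EXISTS(SELECT 1 FROM pathology pa
                              WHERE pa.proc_id = p.proc_id
                                AND pa.finding LIKE '%adenoma%')) / COUNT(*)
    FROM procedures p
    WHERE p.indication = 'screening'
    GROUP BY p.provider
""").fetchall()
print(rows)  # Dr. A: 1 of 2 screening cases with an adenoma; Dr. B: 0 of 1
```

Once the join exists, recomputing the metric for a new time window or provider is a query change, not another four hours of chart review.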
We were trying to think about the things you'll hear about later, how we can improve the care of upper GI bleeds. These are dynamic dashboards you can create with that data. You can see here I can change which of the many hospitals in our system I want to evaluate; I'll just choose Northwestern Memorial Hospital. How often are PPIs administered in a timely manner? How often are inappropriate transfusions being performed? How often are we giving octreotide in a timely manner for variceal bleeds? I can look at the number of encounters and the observed-to-expected length of stay, and I can change the time period over which these metrics are calculated. So again, having the resources to do things in an automated fashion gives you these lovely dashboards and lets us think about quality in a more dynamic way, rather than saying, oh, we want to do an Upper GI Bleed Project, okay, who's going to look at the 500 charts to see how we did last year for upper GI bleeds? So think about these solutions when you're trying to develop your metrics in the future. Natural language processing: I just want to explain what that is. I'm sure most of you know this is a form of artificial intelligence that allows computers to read. We see a lot of NLP in our daily lives. Spam filters are pretty good at NLP; they can tell when something seems to be spam and put it in that folder. Things like Alexa and Siri use NLP. Amazon uses NLP to try to figure out which reviews are fake so it can filter them out. You can also use NLP to measure quality, because it allows us to do more complex calculations. We showed this many years ago, and others have similarly shown that this sort of approach works.
NLP can look at procedure reports, ingest all the information, figure out if it's a screening or surveillance colonoscopy, analyze your pathology report, and do things like give you adenoma detection rates. It can also do things like figure out whether an adenoma is in the left colon or the right colon, which may be important to you. Tools like this make it so that you're quote-unquote manually reviewing charts, but it's actually a computer doing the review, taking some of the pain and sharing it with our machine friends. So you'll hear more and more about how NLP can potentially help you measure quality in your practice as well. There are examples of NLP in existing endowriter solutions. In this solution, pathology reports are sent back to the endoscopy report writer, the endowriter uses NLP to determine whether an adenoma has been found, and ADR is then automatically calculated by combining that pathology information with the procedure report. This example uses Provation, which is a GI-specific endowriter. Basically, a specimen is documented in the procedure report, the specimen is sent to the pathology lab, and the specimen is analyzed by the pathologist. The pathology report is then sent back to the endoscopy report writer, where it is analyzed using NLP, and the indications for the procedure are analyzed by the endowriter. You can then say, I now have a confident assessment of the denominator, the number of screening colonoscopies, and the numerator, the number of screening colonoscopies with an adenoma. So this is an endowriter-specific solution you can think about that utilizes NLP to help you measure procedure quality. Now, just touching again on the fact that outcome metrics are a bit of a limited field, right?
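A toy version of what such endowriter NLP does with a pathology report might look like the following keyword-matching sketch. Real systems are far more sophisticated; the patterns and sample reports here are purely illustrative, and this simple negation check would misfire on more complex report language.

```python
import re

# Crude illustration: decide whether a pathology report mentions an adenoma.
ADENOMA = re.compile(r"\b(tubular|tubulovillous|villous)?\s*adenoma", re.I)
NEGATED = re.compile(r"\bno (evidence of )?adenoma", re.I)

def report_contains_adenoma(report_text: str) -> bool:
    """Return True if the report affirms an adenoma (very naive negation handling)."""
    if NEGATED.search(report_text):
        return False
    return bool(ADENOMA.search(report_text))

print(report_contains_adenoma("Cecum, biopsy: tubular adenoma."))         # True
print(report_contains_adenoma("Colon: hyperplastic polyp, no adenoma."))  # False
```

Feed each procedure's result into an ADR calculation and you have the automated numerator the speaker describes.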
We can't just measure things like pancreatitis rates after ERCP if we only do 50 ERCPs a year; we won't know whether we have an outlier or not. So this is where I think watching becomes more helpful, and the question is who's doing the watching. If you have someone you feel is an outlier in quality, don't just wait until they've done 200 procedures. We have things like professional practice evaluations that can help us measure outcomes. The OPPE, the Ongoing Professional Practice Evaluation, is a screening tool to evaluate practitioners who have been granted privileges and identify clinicians who might be delivering unacceptable quality of care. And as with all screening tests, if we have a positive finding, we've got to follow it up with a more specific diagnostic test, which is the FPPE, the Focused Professional Practice Evaluation. So in our example, the OPPE is an ADR. If the ADR is low on OPPE, I may then do an FPPE, a follow-up process to determine the validity of any positives found through OPPE, in which we directly evaluate the provider. This is an example from Northwestern. If I have a provider I think is performing suboptimally, I can do an FPPE, which actually means we watch the procedure and assess some basic things. What's the scope-steering technique? What's the fine motor control? How are the polyps being removed? Are they looking behind the folds of the colon? Are we cleaning the colon well? All of the things we know are essential to high-quality colonoscopy: are they being performed during this procedure? So if you're worried about quality of care, don't just wait for the outcome metrics; maybe just watching the procedure can give you that information. And surgeons have shown this for a while. They've shown that if you just watch one surgical video, you can confidently predict whether that patient is more likely to end up in the emergency department, be readmitted, or have a repeat operation.
So just a single video can really help you understand the quality of care being delivered. We've similarly shown that if we watch just five colonoscopy videos, we can actually predict a clinician's ADR with a pretty good success rate. So just watching can be very valuable if you're having trouble procuring the data you need through outcome metrics. But I think this is all going to shift, because it's still not fun for us as humans to do the watching. Machines are starting to watch us more and more, and they're going to help us improve endoscopy quality; they're going to do a lot of the things we've talked about. So, artificial intelligence, which I would again advocate to Eden, and I'm sure Asma would agree, would be a great topic for a future talk in our QI meeting. This is the idea that a computer algorithm can do specific tasks that traditionally require a human brain. Advances in deep learning have led to a revolution in both computer-aided polyp detection and polyp classification, but these tools can also be utilized to measure quality. And what do I mean by that? We know that computers can help us identify polyps during colonoscopy, but there's been some nice work showing they can also assess the skill of the colonoscopy being performed. Skipping ahead to what I mean by this: we know there are essential colonoscopy skills, cleaning the colon, distending the colon, and examining behind the folds of the colon, and all of that takes time. But we know we can have computers watch our videos and actually tell us how well someone is cleaning the colon, distending the colon, and examining behind folds.
And so what I've shown you here is that we can combine the endoscopy report writing system, the stored videos, and the electronic health record, and create a more composite picture of the quality of care being delivered, not focusing just on whether an adenoma is found. If you use the actual full colonoscopy video, you can potentially measure the skill of the colonoscopy as well, and that may be the best measure of colonoscopy we can get. And that's been shown; this is work we're doing, showing that you can take a colonoscopy video and find simple quality metrics, things like retroflexion. You can automatically identify that; the computer can figure it out. The computer can figure out when a polyp is being detected. It can figure out where the appendiceal orifice is. So we can create homegrown ways to do this, but there are also many commercial methods coming out to measure things like cecal intubation rates, polyp detection rates, whether retroflexion was performed, what the withdrawal time is, all of these sorts of things. A really nice Gastroenterology article came out last year showing how we can use artificial intelligence-based analytics during colonoscopy to measure how well you distend the colon, clean the colon, and identify polyps. So this entire talk is going to shift over the next five years, really leveraging machine learning to measure quality and improve the quality of care that's delivered. It's something to keep in the back of your head: better solutions are hopefully in sight. So where do we need to go in the future? I think we need either better registries and better ways to collect the data, or, as I showed you with some of the machine learning work, novel measures and better ways to do this, because it's very challenging currently.
If we're going to focus on the registry option, we need to make sure that our outcomes are risk-adjusted. If we're going to measure bowel preparation quality without accounting for the fact that 90% of our population uses English as a second language or has very low health literacy, that's really not fair to our providers. And we need to think about outcomes beyond immediate success: delayed success rates and delayed adverse event rates. We can then think about how all of those things can be automatically or semi-automatically transferred to a registry, and we can have risk-adjusted quality measures with national benchmarks. That's really where things like GIQuIC, the EHR solutions, and the endowriters are trying to get to: take all of our data and do it for us, so we never have to do manual chart review again. So hopefully in this brief talk I've shown you that there are significant barriers to the collection of endoscopy quality metrics. However, there are multiple automated methods available, at varying levels of sophistication. And ultimately, I think automated collection of quality metrics by artificial intelligence will reduce the barriers to data collection. Thank you all.
Video Summary
In this video, the speaker discusses the challenges and methods of collecting endoscopy quality metrics. They mention the use of manual chart review, data registries, data warehouses, natural language processing (NLP), and artificial intelligence (AI) as potential solutions. The speaker emphasizes the importance of accurate definitions and denominator calculations when measuring quality metrics. They also discuss the limitations of outcome metrics and the need for risk adjustment and benchmarking. The speaker suggests that watching procedure videos and utilizing AI can help improve quality measurement. They conclude by stating that the future of quality metrics lies in automated methods, such as AI, which can alleviate the burden of manual data collection.
Asset Subtitle
Rajesh Keswani, MD, MS
Keywords
endoscopy quality metrics
manual chart review
data registries
natural language processing
artificial intelligence
outcome metrics
automated methods