AI in predictive modeling: Complex EHR Data
Video Transcription
We'll move on now directly to Taposh as a follow-up. We've been discussing mainly computer vision and endoscopy this morning, but another big role that AI plays is in looking at big data sets and predictive modeling. Taposh, welcome again, and we look forward to your presentation on this topic.
Hi, everyone. I am Taposh Roy, senior manager at Kaiser Permanente and the author of Notes on Medical Image Processing with Deep Learning. Financial disclosures: I have a financial relationship with Kaiser Permanente in the form of salary. A few other things: as I said, I'm the author of the book Notes on Medical Image Processing with Deep Learning. I also serve on the KDD (Knowledge Discovery and Data Mining) conference as a senior committee member, and I'm the competition chair for multi-dataset time series anomaly detection, along with Professor Eamonn Keogh.
In today's talk, we are going to cover electronic health records, predictive modeling and the concepts used for AI in EHR, and ideas on a general solution and opportunities in this area.
Electronic health records: what do we see or imagine when we talk about them? My understanding is that clinicians, when they talk about EHR or EMR, think of a screen similar to Epic Hyperspace. IT professionals think of it as a collection of a variety of data sources and how they relate to each other, like a schema. Data scientists think of it as patterns in data and how things are interrelated.
What is the goal for AI in EHR? In my mind, the first goal is to provide insights, and these insights can be at the individual level as well as the population level. There is a second goal of AI, which we can discuss at the end of this talk.
AI needs data. The data collected today is heterogeneous.
There is structured data, there is unstructured text, there is image data, there is video data, and there is audio data as well. So there is a variety of data in healthcare, which is why some people say that healthcare data is complex.
Complexity can be loosely defined to mean a variety of things; the definition is not very concrete. However, Dr. Kannampallil and his team came up with a framework for how to look at complexity in healthcare. There is also a general thought about complexity from Nobel laureate Dr. Gell-Mann, who said that a variety of different measures would be required to capture all our intuitive ideas about what is meant by complexity and, thereby, what is meant by simplicity. That is fundamental to understanding complexity and how to break it into multiple simple components.
Complexity in a healthcare system can be viewed in two parts. The interrelatedness among the components of a system can be used as a measure of complexity, and functional decomposition, as a mechanism for studying meaningful subcomponents of a complex system, can be used as a framework for understanding complex healthcare systems. This is the framework, and I have provided the link to the paper by Dr. Thomas Kannampallil and team.
So, let's talk about predictive modeling and the concepts used in EHR data analysis. I'm going to start with a life cycle, just to understand how predictive modeling works in general. Everything starts with real data. You take the real data, the data is cleaned, features are engineered, and a data set is created. Both training and test data sets are created. You develop a model with them. You deploy the model for scoring. You verify the results. You put it in the real world. You keep monitoring it to see the performance. And then you take some actions on it.
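As a rough sketch of the life cycle just described, the steps can be lined up as small functions. Everything here is an assumption made for illustration: the records are made up, the field names `num_prior_visits` and `readmitted` are hypothetical, and the "model" is a single threshold rather than anything used in practice.

```python
import random

# Hypothetical, made-up records; the field names are illustrative only.
raw_records = [
    {"num_prior_visits": v, "readmitted": v >= 3}
    for v in range(8) for _ in range(2)
] + [{"num_prior_visits": None, "readmitted": False}]  # one dirty record


def clean(records):
    """Data cleaning: drop records that have a missing value."""
    return [r for r in records if None not in r.values()]


def feature(record):
    """Feature engineering: here, just one numeric feature."""
    return record["num_prior_visits"]


def split(records, test_fraction=0.25, seed=0):
    """Create the training and test data sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict(threshold, record):
    """Scoring: flag a record when its feature reaches the threshold."""
    return feature(record) >= threshold


def accuracy(threshold, records):
    """Verification and monitoring: share of records predicted correctly."""
    hits = sum(predict(threshold, r) == r["readmitted"] for r in records)
    return hits / len(records)


def train(train_set):
    """Model development: choose the threshold with the best training accuracy."""
    candidates = sorted({feature(r) for r in train_set})
    return max(candidates, key=lambda t: accuracy(t, train_set))


train_set, test_set = split(clean(raw_records))
model = train(train_set)
print("threshold:", model)
print("test accuracy:", accuracy(model, test_set))

# Monitoring and action: retrain if accuracy on newly arriving data drops.
# The 0.8 cut-off is an arbitrary illustrative choice.
needs_retraining = accuracy(model, test_set) < 0.8
```

In a deployed setting the last check would run on fresh data at regular intervals rather than on the held-out test set.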
So, this is a generalized flow of predictive modeling for both the classification and regression model types, and we'll talk about those in a second.
So, what are the paradigms in machine learning? At the highest level, we can say that there are three general paradigms: unsupervised learning, supervised learning, and reinforcement learning. Unsupervised learning is learning patterns in the data without labeling them beforehand; examples are anomaly detection and clustering. Supervised learning uses past, labeled information to train a model and then predicts the future with it; classification and regression are the examples. Reinforcement learning, on the other hand, is new to EHR; there we train models to make a sequence of decisions and adjust their actions based on rewards and penalties. The models generally used today in EHR are classification, regression, and clustering, and of these, classification and regression are the most common.
So, what is classification? Classification consists of assigning input data to one of a given set of categories, classes, or labels, using an algorithm trained with prior information. This is, of course, supervised learning. We will take a very simplistic example so that we understand the concept. Let us see how we humans do it. Say you are looking at a herd of cows and a few dogs, as shown in the picture. How do we humans understand the difference between a cow and a dog? We just look at them. We look at their features and characteristics, and then we understand. Let us dive down a little bit. We look at the height: a cow is generally taller than a dog, although there will be anomalies. There are horns.
Most cows have horns; dogs do not. There are facial differences. The tails are different: a cow's tail has a frayed end, like the end of a line where the strands have come unlaid, which is not true of most dog tails. We collect this information and put it into a data set. Then we label our target column, which is what the animal is, a cow or a dog. Once we are done with that, we train a model to tell a cow from a dog. As part of training the model, we try different learning algorithms on the labeled data. This process is called training in machine learning. Then we test the model with real-world data: we convert the new information into the same set of features, and then our model predicts whether it is a cow or a dog. So, this is an example of supervised learning.
Now, bringing it back to gastroenterology. Classification models that localize and identify polyps in real time using deep learning are a good example of identifying the different kinds of things we see in gastroenterology. There are a few other examples where endoscopic analysis of lesions is used to detect whether something is cancerous or not. That is a Boolean label: is it cancer, yes or no? Another classification problem is an outcome, such as whether the patient has pancreatic cancer or not. I have provided some information based on the initial research that I have done.
The next type of problem is the regression problem. Regression is predicting a quantity, as opposed to predicting a label as in classification. It is also an example of supervised learning.
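Before the regression example continues, the cow-versus-dog classification described above can be sketched in a few lines. This is only an illustration under stated assumptions: the measurements are made up, the three features (height, horns, tufted tail) are a simplification of the ones mentioned, and the nearest-centroid rule is one simple learning algorithm chosen here, not the one used in the talk.

```python
import math

# Hypothetical labeled data: (height in meters, has_horns, tufted_tail) -> label
training_data = [
    ((1.5, 1, 1), "cow"),
    ((1.4, 1, 1), "cow"),
    ((1.3, 0, 1), "cow"),   # a hornless cow
    ((0.6, 0, 0), "dog"),
    ((0.4, 0, 0), "dog"),
    ((0.8, 0, 0), "dog"),
]


def train(examples):
    """Training: compute the mean feature vector (centroid) of each label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Prediction: return the label whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


model = train(training_data)
print(predict(model, (1.45, 1, 1)))  # a tall, horned animal with a tufted tail
print(predict(model, (0.5, 0, 0)))   # a short animal with neither
```

New animals are converted into the same three features before prediction, which mirrors the testing step in the walk-through.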
One example that I can give in layman's terms is predicting the value of a house based on location, number of bedrooms, and other information such as zip code. Let us work through this regression problem. Suppose we are in the market to buy a house and we want to ballpark the price. There are a lot of possible options, and we want to see roughly what we would have to pay for a house that we like. So we list our preferences: the number of bedrooms, the number of bathrooms, the living area in square feet, how big the lot should be, how many floors, whether we need a waterfront, and what kind of view we want from the house. We list all these things and create a data set. We probably add information about zip code as well, and then we find past information about the target, the price, and label the data with it.
Once we have created the data, we pass this input data, along with the output, which is our price variable, to a learning algorithm, which creates an equation to predict the price. In the simplest regression model, it is a small equation. This process is called training the machine learning model. The difference here is that the price can take many values, as opposed to the two values in our prior classification problem. Now we test the machine learning model that we developed with new data. Of course, with the new data we do not provide the price of the house. Once this is done, we can use the model to predict what the price is going to be next year. This process, where we predict a continuous variable, is called regression. So, fundamentally speaking, regression is predicting a quantity, for example, a risk score.
So how would we bring this to gastroenterology?
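Before the clinical examples, the "small equation" of the simplest regression model can be made concrete. The sketch below fits price = intercept + slope × area by ordinary least squares. It assumes a single feature (living area) and uses made-up sale prices; a real model would use the full list of preferences described above.

```python
# Hypothetical past sales: (living area in square feet, sale price).
past_sales = [(1000, 250_000), (1500, 350_000), (2000, 450_000), (2500, 550_000)]


def fit_line(points):
    """Training: ordinary least squares for price = intercept + slope * area."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_price(model, area):
    """Testing: apply the learned equation to a house whose price is unknown."""
    intercept, slope = model
    return intercept + slope * area


model = fit_line(past_sales)
print(predict_price(model, 1800))  # 410000.0 for this made-up data
```

Unlike the cow-or-dog label, the output here is a number on a continuous scale, which is what makes this a regression problem.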
An example that I found in the literature survey is machine learning models to predict early recurrence of esophageal cancer after surgery. This approach is basically used to provide time. I also know of an experiment using Swedish electronic health records to find the predicted probability of cervical cancer. That is also a probability, which is a continuous value: we say the probability is 50%, 20%, 30%, whatever it is. And these are the sources that I have used.
In addition to that, I would like to leave the audience with a little more information about clustering and reinforcement learning. Clustering is segmenting. An example that I like to give is a kid's room that is a big mess; people with kids probably understand this. Suppose you now want to sort these things: you want to separate the Legos, the books, and the soft toys from each other. How would you do that? You create a data set with a list of the toys and the books. Then you pass this data to a different kind of machine learning algorithm, an unsupervised one. This unsupervised algorithm looks at the data and segregates it into groups of similar items. Here is the input data set that we've created. We have an ID; the element, which is bricks, plates, books, etc.; the type, which is Lego, Play-Doh, stuffed animal, pen, or book; and the location where it should go. Then we, of course, normalize the data so the machine can work with it better. We pass it through an unsupervised learning algorithm, which creates clusters from this information, and these clusters can then be used to create the classification sets. So that's an example of unsupervised clustering, which we'll talk a little more about in the next few slides.
Finally, I wanted to talk about reinforcement learning.
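Before moving on to reinforcement learning, the toy-sorting example above can be sketched as normalize-then-cluster. The talk does not name the algorithm, so k-means is an assumption here, as are the two numeric features (weight and softness) and all the values; the farthest-first starting rule is simply a way to keep the sketch deterministic.

```python
import math

# Hypothetical items: name -> (weight in grams, softness from 0 to 1)
items = {
    "lego brick": (2, 0.0),
    "lego plate": (3, 0.0),
    "picture book": (300, 0.1),
    "story book": (350, 0.1),
    "teddy bear": (150, 1.0),
    "plush rabbit": (120, 0.9),
}


def normalize(rows):
    """Min-max scale every column to the range 0..1."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1 for c in columns]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def farthest_first(points, k):
    """Deterministic start: each new center is the point farthest from the chosen ones."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def k_means(points, k, rounds=10):
    """Alternate between assigning points to the nearest center and moving centers."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(rounds):
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


labels = k_means(normalize(list(items.values())), k=3)
clusters = dict(zip(items, labels))
print(clusters)
```

No labels are supplied at any point; the grouping into bricks, books, and soft toys comes only from which items have similar features.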
Reinforcement learning is an area that's getting more and more popular these days because of its ability to predict sequences. A good example is how humans learn to walk and talk, and how we can replicate that. We stand up, take a few steps, and fall; we stand up, take a few steps, and fall again. We keep doing that until we learn not to fall. Every time we fall, that's a penalty. Every time we stand up and don't fall, that's a reward. We use this idea to train a robot in exactly the same way. First, we create a reward function, which could mean checking a certain amount of movement against thresholds using GPS or gyroscope measurements. Second, we have the robot move each actuator or motor in random ways. Third, we collect this information through the sensors and feed it to the reward system. The machine learns from this reward system and learns to walk. Once it has learned, we have in effect created an equation that the robot uses to collect information and walk. This is a very common example of reinforcement learning. Reinforcement learning has not been used much in healthcare so far, but there is a lot of scope for using reinforcement learning and simulation in healthcare as well. In this example, robots will play soccer at some point, once you have trained them sufficiently well.
Now, summing it up: ideas on a general solution. EHR data, as I said at the start, is heterogeneous; it has a lot of things in it. So the first thing we can do, in my mind, is to use unsupervised machine learning to create different types of cohorts, and create a cohort manager. Then we select the cohorts, get all the heterogeneous data, and get it as longitudinal data so that we have a temporal sequence of the events that have occurred in the past. And we create an embedding out of it.
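As a loose illustration of turning longitudinal events into a matrix, the sketch below orders each patient's events by date and then counts event codes per patient. The talk does not say how the embedding is built, so this simple count matrix is an assumption standing in for it, and the patient IDs, dates, and event codes are all made up.

```python
from collections import defaultdict

# Hypothetical events: (patient id, ISO date, event code). All values are made up.
events = [
    ("p1", "2021-01-05", "visit:gi_clinic"),
    ("p1", "2021-02-10", "procedure:colonoscopy"),
    ("p2", "2021-01-20", "visit:gi_clinic"),
    ("p1", "2021-03-01", "visit:gi_clinic"),
    ("p2", "2021-04-02", "lab:cbc"),
]


def temporal_sequences(rows):
    """Group events by patient, ordered by date (ISO dates sort as text)."""
    sequences = defaultdict(list)
    for patient, date, code in sorted(rows, key=lambda r: (r[0], r[1])):
        sequences[patient].append(code)
    return dict(sequences)


def count_matrix(sequences):
    """Build one row per patient and one column per event code."""
    vocabulary = sorted({code for seq in sequences.values() for code in seq})
    matrix = {
        patient: [seq.count(code) for code in vocabulary]
        for patient, seq in sequences.items()
    }
    return vocabulary, matrix


sequences = temporal_sequences(events)
vocabulary, matrix = count_matrix(sequences)
print(vocabulary)
print(matrix)
```

Counting discards the order of events, so a model that needs the temporal sequence would work from `sequences` rather than from the count rows.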
An embedding is a fancy term for a huge mathematical matrix that you create from the data. Then you identify the use cases that you want to predict. So this is one pattern of solution for classification and regression types of models.
However, I want to talk a little more about the opportunities in the current state. Today, I feel the way we are doing machine learning is not well organized. There are issues of fairness and bias. There is little transparency, explainability, and trustability in the models. These are the opportunities in the current state, and there are companies that can focus on improving these things.
Finally, I wanted to end with an EHR solution that we put together as part of our work-in-progress paper. It takes everything from a cohort manager and passes it to a lineage mapper, which turns it into a lineage, gets all the information, and finally visualizes it so that it can be used for a population view as well as an individual view. I've put together some references here. If you have any other questions, feel free to reach out to me. Thank you for listening.
Video Summary
In this video, Taposh Roy, senior manager at Kaiser Permanente and author of "Notes on Medical Image Processing with Deep Learning," discusses the role of AI in analyzing big data sets and in predictive modeling with electronic health record (EHR) data. He explains that EHRs are viewed differently by clinicians, IT professionals, and data scientists, with AI in EHR aiming to provide insights at both the individual and population levels. Roy emphasizes the complexity of healthcare data, referencing a framework proposed by Dr. Thomas Kannampallil and team and a general observation on complexity by Dr. Gell-Mann. He then discusses the life cycle of predictive modeling, including data cleaning, feature engineering, model development, testing, deployment, and monitoring. Roy explains three paradigms in machine learning: unsupervised learning, supervised learning, and reinforcement learning. He provides explanations and examples of classification, regression, clustering, and reinforcement learning, relating classification and regression to gastroenterology. Roy concludes by highlighting opportunities for improvement in machine learning, including fairness, bias, transparency, explainability, and trustability. He also presents an EHR solution that integrates cohort management, lineage mapping, and visualization. The video provides various sources and encourages viewers to reach out with further questions.
Asset Subtitle
Taposh Roy
Keywords
computer vision
predictive modeling
electronic health records
unsupervised learning
reinforcement learning