Machine learning models in healthcare: leveraging user feedback and improving member experiences


A shortened version of this story was originally published on Built In NYC and is republished with permission and edited. 

The promise of AI, machine learning, and predictive analytics is to transform businesses, but not all predictive models actually cut costs, increase revenue or improve customer and employee experiences. 

Successfully deploying — and monitoring — machine learning (ML) solutions requires businesses to experiment with processes and tools while building upon industry best practices.

At Pager, user input plays a vital role in how models are productionized. We spoke with Data Engineer Jaime Ignacio Castro Ricardo to understand how proof-of-concept feedback is integrated into models and why users take notice.

“Users are usually happy when models are deployed since they were involved in feature planning, discussion and reviews,” Ricardo said.

What tools have you found to be most effective for productionizing Pager’s ML models?

Like most of the industry, we use Python and Jupyter Notebooks for exploratory data analysis and model development. We recently switched from self-hosting Jupyter Notebooks to using Google Colab since much of our tech stack is already on the Google Cloud Platform. Colab offers an easy medium for collaboration between team members.

We deal primarily with chatbots, so our ML stack is geared toward natural language processing (NLP). We use scikit-learn, spaCy, and Rasa as the main ML and NLP libraries to build our models. There’s also an in-house framework we developed around them to streamline our experimentation and deployment process.
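For a concrete sense of what a lightweight NLP model in a stack like this can look like, here is a minimal intent-classification sketch with scikit-learn. The phrases and intent labels are invented for illustration and are not Pager’s actual bot data or model.

```python
# A minimal sketch of intent classification with scikit-learn.
# The training phrases and intent labels below are hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

training_phrases = [
    "I have a fever and a cough",
    "I need to refill my prescription",
    "How do I schedule an appointment?",
    "My chest hurts when I breathe",
]
intents = ["symptom_report", "prescription", "scheduling", "symptom_report"]

# TF-IDF features feeding a linear classifier is a common, lightweight baseline.
intent_model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
intent_model.fit(training_phrases, intents)

print(intent_model.predict(["can you refill my prescription"]))  # likely ['prescription']
```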

The engineering department integrated GitOps into our continuous integration and delivery pipelines. When we merge into master, new versions of our models are Dockerized by Google Cloud Build and deployed to a production Kubernetes cluster.


What are your best practices for deploying a machine learning model to production?

We perform thorough unit testing with pytest, and exhaustive integration and user testing in multiple lower-level environments. User, client, and clinical input is used to design product features that leverage our models.
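As an illustration of the kind of unit test this implies, here is a small pytest sketch built around a stand-in model like the one above. The fixture, phrases, and intents are hypothetical placeholders, not Pager’s actual test suite.

```python
# A minimal pytest sketch for an intent model; everything here is illustrative.
import pytest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


@pytest.fixture(scope="module")
def intent_model():
    # Stand-in for loading the real trained model artifact.
    phrases = [
        "I need to refill my prescription",
        "Can you refill my meds?",
        "I have a fever and a cough",
        "My throat hurts and I feel sick",
    ]
    intents = ["prescription", "prescription", "symptom_report", "symptom_report"]
    model = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    return model.fit(phrases, intents)


def test_known_phrase_maps_to_expected_intent(intent_model):
    # Guard against regressions on phrases the bot must always understand.
    assert intent_model.predict(["please refill my prescription"])[0] == "prescription"


def test_prediction_is_a_known_intent(intent_model):
    # Whatever the input, the model should only emit intents the bot can handle.
    assert intent_model.predict(["random member message"])[0] in {"prescription", "symptom_report"}
```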

Then we iterate on proof-of-concept feedback from users. We also quantify how new ML models affect efficiency and productivity to gauge their real-world effectiveness.

Because of these practices, our ML team rarely introduces bugs in production. Users are usually happy when models are deployed since they were involved in feature planning, discussion and reviews. Additional training and input help users work more efficiently with the features in production.

How do you work with users and other teams to incorporate feedback into your models?

Typically, product and engineering meet to define the acceptance criteria. Product teams gather the requirements from the users, such as the command center nurses. Once we build the minimum viable product (MVP) of the feature, we deploy it to our testing and training environments. We want users to test the feature before launch so we can get feedback and make sure the feature will actually be used.

A recent example is when we changed the engine that powers our clinical bots. It was a very big project, and there were a lot of unknowns, because we were leveraging new technology. We spent a lot of time with the nurses and training team, who gave us feedback on a variety of aspects: when certain things should happen, adding an alert message right when the bot stops interacting with a patient, and adding the capability to resume a bot and not have it start from the beginning, to name a few. They helped us make sure the bots used the right language, which wasn’t too colloquial or too clinical. 

This all happened before we launched to production. There was a lot of back and forth after we got all the requirements. So this served as the testing ground, and we were able to iterate on it and launch the refined clinical bots into production.

Can you give an example of when your team was able to quickly deploy a new ML model? 

The development of our COVID-19 bot was very responsive to our client’s needs at the start of the pandemic. It was the first weekend of the pandemic and our client had an overwhelming volume of people in the app asking about COVID-19, so they needed a way to filter those members and route them separately from people seeking nurse care unrelated to COVID-19. That meant expediting the initial intake of patients, which we did using our bot system. We modified it so we could ask the questions we wanted and, based on the responses, route members to the appropriate queue.

 
The graph shows the utilization of the COVID-19 chat bot as a percentage of total member chats.

It was a Saturday morning, and by Saturday evening, we were deploying to production. There was collaboration between many teams and we were in constant communication with the client. We built the solution, we deployed to testing environments, tested, and showed it to the client. They provided feedback, we made updates, and by the end of the day we had a whole new system from scratch that was helping their command center handle the high volume of chats that were coming in.
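To make the routing idea described above concrete, here is a minimal sketch of answer-based queue routing. The question ids, queue names, and screening rule are placeholders, not the actual COVID-19 bot logic.

```python
# A minimal, hypothetical sketch of routing members based on intake answers.
COVID_QUEUE = "covid_triage"
GENERAL_QUEUE = "nurse_chat"


def route_member(answers: dict) -> str:
    """Return the queue a member should be routed to, given their intake answers."""
    covid_related = (
        answers.get("has_fever") == "yes"
        or answers.get("has_cough") == "yes"
        or answers.get("covid_exposure") == "yes"
    )
    return COVID_QUEUE if covid_related else GENERAL_QUEUE


print(route_member({"has_fever": "yes", "has_cough": "no", "covid_exposure": "no"}))
# -> covid_triage
```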

What advice do you have for data scientists looking to better productionize ML models?

Always keep track of experiments with versioning, not just for trained models but also for the input data, hyperparameters, and results. Such metadata proves useful when we develop new models for the same problem and need to reproduce old ones for comparison and benchmarking. We use an in-house framework for developing new models, but MLflow is a great open-source solution as well.
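For readers who want a starting point, here is a minimal MLflow tracking sketch, assuming a scikit-learn style model. The experiment name, hyperparameters, and metric are placeholders for whatever your own problem requires.

```python
# A minimal MLflow experiment-tracking sketch with placeholder data and names.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("intent-classifier")  # hypothetical experiment name
with mlflow.start_run():
    params = {"C": 1.0, "max_iter": 1000}
    model = LogisticRegression(**params).fit(X_train, y_train)

    # Version the hyperparameters, the evaluation result, and the trained model
    # together so the run can be reproduced and compared against later ones.
    mlflow.log_params(params)
    mlflow.log_metric("f1", f1_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```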

Looking forward, how do you hope that Pager, or other healthcare companies, could further leverage ML models to improve care delivery or the member experience?

In general, there are two ways to make any of these models better. You can use brute force: make the model more sophisticated and put a lot of computing power into trying to crack a pattern. Or you can get more data. To me, more data lets you learn a pattern or a relationship far better than putting computing power behind an algorithm and trying to make it work.

In healthcare, data sharing is hard and the use of personal data is heavily regulated, so it can be very difficult to get the data needed to train ML models. Going forward, making it easier, safer, and more reliable to access de-identified or anonymized data would help us build stronger and more accurate ML models.

One project we’re planning in partnership with a client is working to improve the efficiency and accuracy of our clinical bots by leveraging clinical data. Currently, we receive demographic and clinical information about a patient, which we show in the command center to the nurse or care coordinator assisting them. 

We’d like to start feeding that information to our clinical bots so they can provide a better assessment or ask a patient more targeted questions. For example, perhaps an existing question in the bot script is whether a patient has diabetes or asthma. If the bot already knows whether they have a condition, it won’t need to ask, which makes triage a little faster and the experience a little more personalized. This way, we can tailor the clinical bots to each individual using existing data from the client’s health records, ultimately improving the experience for both patients and the users in the command center.
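As a rough sketch of how known clinical data could let a bot skip questions it already has answers to, here is a hypothetical example. The script format and record fields are invented for illustration and are not Pager’s bot implementation.

```python
# A minimal, hypothetical sketch of pruning bot questions using known clinical data.
def build_question_list(script: list[dict], clinical_record: dict) -> list[str]:
    """Return only the questions whose answers aren't already on file."""
    remaining = []
    for step in script:
        known_field = step.get("record_field")
        if known_field and known_field in clinical_record:
            continue  # the bot already knows this, so it doesn't ask again
        remaining.append(step["question"])
    return remaining


script = [
    {"question": "Do you have diabetes?", "record_field": "diabetes"},
    {"question": "Do you have asthma?", "record_field": "asthma"},
    {"question": "What symptoms are you experiencing today?"},
]
record = {"diabetes": False}  # already known from the client's health records

print(build_question_list(script, record))
# -> ['Do you have asthma?', 'What symptoms are you experiencing today?']
```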


A condensed version of this interview was originally published by Built In NYC on July 16, 2020.
