An overview of model monitoring, why it’s important, how it relates to ML observability, and what to look for in an ML monitoring solution.
From Monitoring To Observability
We’ve said it before, and we’ll repeat it: taking a model from research to production is hard. There are many failure modes a model can encounter at any given time, and monitoring your model’s performance in production is the first step to gaining confidence in your models as you move to online environments. Machine learning monitoring is a component of ML observability, a tool used to monitor, troubleshoot, and explain your models in production to drill down to the why behind the decisions your models make in production.
ML observability interacts with every part of the model building lifecycle. From experimentation to production, observability helps troubleshoot problems with data ingestion, training/validation, and serving workflows. While bringing a model online can be challenging, ML observability helps alleviate the burden of troubleshooting issues during the model building and serving process to ensure the highest quality model is served in production.
What is ML Model Monitoring?
Before we get to the why, we must first surface the what: what is model monitoring, exactly? Machine learning model monitoring is a series of techniques used to better measure key model performance metrics and understand when issues arise in machine learning models. Areas of focus include model drift, model performance, and data quality. Model monitoring empowers ML engineers to quickly detect issues and pinpoint where to begin root cause analysis (RCA).
Model explainability is an additional layer on top of model monitoring that uncovers feature importance in a model and evaluates the why behind specific predictions.
Why Do You Need ML Monitoring in Your ML Toolbox?
Why is it important to quickly detect when issues emerge with a model? Well, that’s simple. It all boils down to time and money. Poorly performing models can negatively impact your customers’ experiences, reduce revenue, start PR disasters, perpetuate systemic bias, and much more. The quicker you are able to surface issues, the faster you can drill down to resolve them.
Where did it come from?
Models in production do not come without issues. While research environments can be controlled, models quickly become far more complex once they are served and meet the real world. Some challenges include constant data changes, rewriting the model in a new language, or pushing features into a store. Any minute change can jeopardize a model. Yet as it stands, most ML teams don’t know what’s wrong with their models until it’s too late.
Why is it important?
Models can gradually decay over time, digest outliers, over/under-index for specific cohorts, and significantly change business impact metrics.
TL;DR: models can have many problems once they’re in production environments.
Despite the multitude of problems a model can encounter, most ML teams, regardless of size, lack specific tooling to better understand their models’ performance in production, and only know something’s wrong after a series of customer complaints or a sneaking suspicion. While some organizations can afford large teams to help uncover pointed issues, engineers still lack a streamlined solution to act as a guardrail on deployed ML models.
So, what now? Well, in comes model monitoring. With the proper monitoring tool, ML engineers should be able to automatically surface issues through proactive monitoring, better understand how distributions change over time, efficiently map fast actuals across all model environments, come up with proxy metrics to estimate performance when actuals don’t exist, and go beyond global performance metrics to the cohort and local level to holistically address model performance.
The Three Components of Monitoring:
Let’s break this down a little bit more. Below, you’ll find the three main components of model monitoring and what to look for when evaluating if a monitoring tool is up to par.
Performance analysis monitoring: Daily or hourly checks on model performance, such as accuracy above 80%, RMSE, or accuracy above training
Drift monitoring: Distribution comparisons, numeric or categorical, on features, predictions, and actuals
Data quality monitoring: Real-time data checks on features, predictions, and actuals
Model Monitoring for Performance Analysis
Model performance analysis should move beyond the one-dimensional view of accuracy. While accuracy is an important metric, it often does not tell the whole story, and worse, can mask underlying issues that affect model performance. To truly surface where a model is underperforming, it’s important to look at the performance of models across various cohorts and slices of predictions. Performance analysis metrics include MAPE or MAE, recall, precision, and F1, chosen based on model type.
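To make the point about cohorts concrete, here is a minimal sketch of slicing binary-classification metrics by cohort. The function name `cohort_metrics`, the `(cohort, actual, predicted)` record layout, and the toy data are all assumptions made for illustration; they are not part of any particular monitoring product.

```python
from collections import defaultdict


def cohort_metrics(records):
    """Per-cohort precision, recall, F1, and accuracy for a binary classifier.

    `records` is an iterable of (cohort, actual, predicted) tuples with 0/1
    labels. The function name and record layout are illustrative assumptions.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for cohort, actual, predicted in records:
        if predicted == 1:
            key = "tp" if actual == 1 else "fp"
        else:
            key = "fn" if actual == 1 else "tn"
        counts[cohort][key] += 1

    results = {}
    for cohort, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        total = sum(c.values())
        # Fall back to 0.0 when a ratio is undefined (no positives in the slice).
        precision = c["tp"] / predicted_pos if predicted_pos else 0.0
        recall = c["tp"] / actual_pos if actual_pos else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[cohort] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "accuracy": (c["tp"] + c["tn"]) / total,
        }
    return results


# Toy data (made up for this example): (cohort, actual, predicted)
records = [
    ("us", 1, 1), ("us", 0, 0), ("us", 1, 1), ("us", 0, 0),
    ("eu", 1, 0), ("eu", 1, 0), ("eu", 0, 0), ("eu", 1, 1),
]
by_cohort = cohort_metrics(records)
global_accuracy = sum(a == p for _, a, p in records) / len(records)
```

On this toy data the global accuracy is 0.75, which might look acceptable on its own, while recall for the `eu` cohort is only one in three. That gap is the kind of issue a single global number can hide.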
What to Look For:
Look for monitoring solutions that provide performance metrics at a global and cohort level to automatically surface lower-performing segments along any evaluation dimension, offer granularity down to the hourly level, monitor specific slices of predictions, detect when false positives or negatives surpass a particular threshold, create proxy metrics when actuals aren’t available, and indicate when your model needs to be retrained.
Model Monitoring for Drift
While models naturally decay over time, there are also hard failures associated with data drift and model drift that can negatively impact a model’s overall performance. Models aren’t stagnant, and it’s important to ensure your model stays relevant. Measure drift to identify whether your models have grown stale, you have data quality issues, or your model is receiving adversarial inputs. Drift analysis techniques include the population stability index (PSI), Kullback-Leibler (KL) divergence, Wasserstein distance, and more.
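As an illustration of one of these techniques, the sketch below computes PSI between a baseline sample and a production sample of a numeric feature. The equal-width binning over the baseline range, the default of 10 bins, and the small `eps` used to avoid taking the log of zero are choices made for this example, not a standard; real tools may bin differently. It assumes both samples are non-empty.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between two numeric samples.

    Bins are equal-width over the range of `expected` (the baseline).
    `bins` and `eps` are illustrative defaults, not canonical values.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the log below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up samples: a baseline and a copy shifted upward by 0.5.
baseline = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in baseline]

psi_same = psi(baseline, baseline)
psi_shifted = psi(baseline, shifted)
```

Identical distributions give a PSI of zero, and the value grows as the production distribution moves away from the baseline. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, but any alerting threshold should be tuned to the feature and model at hand.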
What to Look For:
With proactive monitoring, detecting drift should be easy with automatic alerts. Look for the ability to bulk-create monitors with multiple baselines, view feature performance at a glance, access a historical view of your drift, and access the distribution view associated with your PSI measure.
Model Monitoring For Data Quality
Machine learning models are not static. They are trained on data and depend on data. So when there are problems with your data, it’s important to surface them immediately and identify how data quality maps to your model’s performance. As data flows through the inputs of a model, monitoring is used to ensure the highest possible quality data is ingested and digested to prevent model performance degradation. Utilize data quality monitoring to analyze hard failures in your data pipeline, such as missing data or cardinality shifts.
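The two hard failures named above, missing data and cardinality shifts, can be sketched as simple checks on a single categorical feature. The helper names, the use of `None` to represent a missing value, and the 5% missing-rate threshold are all assumptions for this example.

```python
def missing_rate(values):
    """Fraction of values that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def unseen_categories(baseline, production):
    """Categories present in production that never appeared in the baseline."""
    seen = {v for v in baseline if v is not None}
    return {v for v in production if v is not None} - seen


def check_feature(baseline, production, max_missing=0.05):
    """Return a list of human-readable data quality issues (empty if none).

    `max_missing` is a hypothetical threshold chosen for illustration.
    """
    issues = []
    rate = missing_rate(production)
    if rate > max_missing:
        issues.append(f"missing rate {rate:.0%} exceeds {max_missing:.0%}")
    unseen = unseen_categories(baseline, production)
    if unseen:
        issues.append(f"unseen categories: {sorted(unseen)}")
    return issues


# Made-up feature values: production has a null and a new casing of "NY".
baseline_states = ["CA", "NY", "TX", "CA"]
production_states = ["CA", None, "ny", "TX"]
issues = check_feature(baseline_states, production_states)
```

Here the production batch trips both checks: a quarter of the values are missing, and the lowercase `ny` is a category the baseline never contained, the sort of upstream formatting change that quietly raises a feature’s cardinality.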
What to Look For:
In an effective monitoring system, data quality monitors should look upstream and calibrate to exact parameters for any model, any version, and any dimension to quickly catch where features, predictions, and actuals don’t conform with expected ranges.
With the right monitoring tool, you should be able to easily create dashboards to deeply analyze and troubleshoot your models across training, validation, and production. From handling biased actuals to no ground truth, proactive monitoring should be able to handle your model at its worst. Automatically surface feature values that harm your overall model performance, chain together filters to drill down to the why, and find the root cause of your performance degradation. Learn more about Arize’s capabilities here!