An overview of machine learning model monitoring, why it’s important, how it relates to machine learning observability, and what to look for in a model monitoring solution.
What is Model Monitoring?
Machine learning (ML) model monitoring is a series of techniques used to better measure key model performance metrics and understand when issues arise in machine learning models. Areas of focus include model drift, model performance, and data quality. Model monitoring empowers ML engineers to quickly detect issues and pinpoint where to begin further analysis for fast root cause analysis (RCA).
Model explainability is an additional layer on top of model monitoring that uncovers feature importance in a model and evaluates the why behind specific predictions.
Why Do You Need Machine Learning Monitoring in Your ML Toolbox?
Why is it important to quickly detect when issues emerge with a model? Well, that’s simple. It all boils down to time and money. Poorly performing models can degrade your customers’ experiences, reduce revenue, spark PR disasters, perpetuate systemic bias, and much more. The quicker you can surface issues, the faster you can drill down and resolve them.
Where Did Model Monitoring Come From?
Machine learning models in production do not come without issues. While research environments can be controlled, served models quickly become far more complex when they meet the real world. Challenges include constantly changing data, rewriting the model in a new language, and pushing features into a feature store. Any minute change can jeopardize a model. Yet as it stands, most ML teams don’t know what’s wrong with their models until it’s too late.
Why is Model Monitoring Important?
Models can gradually decay over time, digest outliers, over/under-index for specific cohorts, and significantly change business impact metrics.
TL;DR Models Encounter Challenges in Production Environments
Despite the multitude of problems a model can undergo, ML teams of all sizes lack specific tooling to understand their model’s performance in production, and often only learn something is wrong after a series of customer complaints or a sneaking suspicion. While some organizations can afford large teams to help uncover pointed issues, engineers still lack a streamlined solution to act as a guardrail on deployed ML models.
So, what now? In comes model monitoring. With the proper monitoring tool, ML engineers should be able to automatically surface issues through proactive monitoring, better understand changes in distribution over time, quickly and efficiently map fast actuals across all model environments, construct proxy metrics to estimate actuals when none exist, and go beyond performance metrics at the global, cohort, and local level to holistically address model performance.
The Three Components of Model Monitoring:
Let’s break this down a little bit more. Below, you’ll find the three main components of model monitoring and what to look for when evaluating if a monitoring tool is up to par.
- Performance analysis monitoring: daily or hourly checks on model performance, such as accuracy above 80%, RMSE, or accuracy above training
- Drift monitoring: distribution comparisons, numeric or categorical, on features, predictions, and actuals
- Data quality monitoring: real-time data checks on features, predictions, and actuals
How Does Model Monitoring Relate to ML Observability?
We’ve said it before, and we’ll repeat it: taking a model from research to production is hard. There are many failure modes a model can encounter at any given time, and monitoring your model’s performance in production is the first step to gaining confidence in your models as you move to online environments. Machine learning monitoring is a component of ML observability, a practice used to monitor, troubleshoot, and explain your models in production and drill down to the why behind the decisions they make.
ML observability interacts with every part of the model building lifecycle. From experimentation to production, observability helps troubleshoot problems with data ingestion, training/validation, and serving workflows. While bringing a model online can be challenging, ML observability helps alleviate the burden of troubleshooting issues during the model building and serving process to ensure the highest quality model is served in production.
Model Monitoring for Performance Analysis
Model performance analysis should move beyond the one-dimensional view of accuracy. While accuracy is an important metric, it often does not tell the whole story, and worse, can mask underlying issues that affect model performance.
How To Monitor ML Model Performance:
To truly surface where a model is underperforming, it’s important to look at model performance across various cohorts and slices of predictions. Performance analysis metrics include MAPE, MAE, recall, precision, and F1 score, chosen based on model type.
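As a rough illustration of cohort-level analysis (a plain-Python sketch, not any particular tool’s API), the idea is to compute classification metrics per slice rather than one global number, so a weak segment can’t hide behind strong overall accuracy:

```python
from collections import defaultdict

def cohort_metrics(records):
    """Compute precision, recall, and F1 per cohort from
    (cohort, predicted_label, actual_label) binary records."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for cohort, pred, actual in records:
        if pred and actual:
            counts[cohort]["tp"] += 1
        elif pred and not actual:
            counts[cohort]["fp"] += 1
        elif not pred and actual:
            counts[cohort]["fn"] += 1

    metrics = {}
    for cohort, c in counts.items():
        precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[cohort] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

Two cohorts can share the same F1 for opposite reasons (one with low precision, one with low recall), which is exactly why per-cohort breakdowns matter.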
What to Look for When Monitoring for Performance Analysis:
Look for monitoring solutions that provide performance metrics at a global and cohort level to automatically surface lower-performing segments along any evaluation dimension, offer granularity down to the hourly level, monitor specific slices of predictions, detect when false positives or negatives surpass a particular threshold, create proxy metrics when actuals aren’t available, and indicate when your model needs to be retrained.
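Threshold-based alerting on false positives and negatives can be sketched minimally like this (the threshold values below are hypothetical defaults, not recommendations):

```python
def rate_alerts(preds, actuals, fpr_threshold=0.05, fnr_threshold=0.10):
    """Flag when the false positive or false negative rate of a batch
    of binary predictions exceeds a configured threshold."""
    fp = sum(1 for p, a in zip(preds, actuals) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(preds, actuals) if p == 0 and a == 1)
    negatives = sum(1 for a in actuals if a == 0)
    positives = sum(1 for a in actuals if a == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0

    alerts = []
    if fpr > fpr_threshold:
        alerts.append(f"false positive rate {fpr:.2%} exceeds {fpr_threshold:.2%}")
    if fnr > fnr_threshold:
        alerts.append(f"false negative rate {fnr:.2%} exceeds {fnr_threshold:.2%}")
    return fpr, fnr, alerts
```

A production system would evaluate this per time window and per cohort, and route alerts to paging or dashboard tooling rather than returning strings.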
Model Monitoring for Drift
Drift is a change in distribution over time, measured for model inputs, outputs, and actuals of a model.
While models naturally decay gradually over time, there are also hard failures associated with data drift and model drift that can negatively impact a model’s overall performance. Models aren’t stagnant, and it’s important to ensure your model stays relevant. Measure drift to identify whether your model has grown stale, you have data quality issues, or there are adversarial inputs to your model.
How To Monitor For Model Drift:
There are a few types of drift to monitor in your production model. Monitor for prediction drift, concept drift, data drift, and upstream drift to ensure high-performing models in production. Quantify drift using a few key analysis techniques: population stability index (PSI), Kullback–Leibler (KL) divergence, Wasserstein distance, and more.
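Of those techniques, PSI is the most common starting point. A minimal sketch (binning by the baseline distribution, with a small floor on empty bins to keep the logarithm defined) looks like this:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (expected) sample
    and a production (actual) sample of a numeric feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width bin edges derived from the baseline sample
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bucket_fractions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        total = len(values)
        # Floor avoids division by zero / log(0) for empty bins
        return [max(c / total, 1e-4) for c in counts]

    e_frac = bucket_fractions(expected)
    a_frac = bucket_fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))
```

Common rules of thumb treat PSI below 0.1 as stable and above 0.25 as significant drift, though thresholds should be tuned per feature.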
What to Look For:
With proactive monitoring, detecting drift should be easy with automatic alerts. Bulk create monitors with multiple baselines, view feature performance at a glance, access a historical view of your drift, and access the distribution view associated with your PSI measure.
Model Monitoring For Data Quality
Machine learning models are not static. They are trained on data and highly depend on data to make reliable predictions. It’s important to immediately surface data quality issues to identify how your data quality maps to your model’s performance.
How To Monitor For Data Quality:
As data flows through the inputs of a model, monitoring is used to ensure the highest possible quality data is ingested and digested to prevent model performance degradation. Utilize data quality monitoring to analyze hard failures in your data quality pipeline.
Monitor for cardinality shifts, missing data, data type mismatches, out-of-range values, and more to get down to the root cause of performance issues.
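Several of these checks can be expressed as a batch validation against an expected schema. The sketch below is illustrative only (the schema format and feature names are made up for the example), covering missing values, type mismatches, and out-of-range values:

```python
def data_quality_checks(rows, schema):
    """Validate feature rows against a simple expected schema of the form
    {feature: {"type": type, "min": x, "max": y}} ("min"/"max" optional).
    Returns a list of human-readable violations."""
    violations = []
    for i, row in enumerate(rows):
        for feature, spec in schema.items():
            value = row.get(feature)
            if value is None:
                violations.append(f"row {i}: {feature} is missing")
                continue
            if not isinstance(value, spec["type"]):
                violations.append(
                    f"row {i}: {feature} has type {type(value).__name__}, "
                    f"expected {spec['type'].__name__}")
                continue
            if "min" in spec and value < spec["min"]:
                violations.append(f"row {i}: {feature}={value} below min {spec['min']}")
            if "max" in spec and value > spec["max"]:
                violations.append(f"row {i}: {feature}={value} above max {spec['max']}")
    return violations
```

Cardinality-shift detection would sit on top of this, comparing the set of distinct categorical values in a window against the baseline rather than validating rows one at a time.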
What to Look For When Monitoring For Data Quality:
In an effective monitoring system, data quality monitors should look upstream and calibrate to exact parameters for any model, any version, and any dimension to quickly catch where features, predictions, and actuals don’t conform with expected ranges.
What does Model Monitoring Look Like In Practice?
Leverage the power of model monitoring within ML observability to improve overall model visibility. Monitor your models to maximize revenue, improve productivity, and increase trust in your ML models. Troubleshoot common use case-specific problems using Arize, the leading ML observability platform.
Common ML Monitoring Use Cases:
Learn how to set up proactive monitors for chargebacks (false negative rate) and false positive transactions for your credit card fraud model.
Troubleshoot bad data quality, drifting features, and low performing cohorts of your ad click-through rate model.
Analyze your recommendation engine model’s performance across various slices and dive into which features could cause performance degradation.
- The Who, What, Where, When, Why (and How) of Recommender Systems
- How ML Observability Helps America First Credit Union Stay a Step Ahead
- Best Practices for ML Monitoring and Observability of Demand Forecasting Models
- Best Practices In ML Observability for Click-Through Rate Models
- Best Practices In ML Observability for Customer Lifetime Value (LTV) Models
- Best Practices In ML Observability for Monitoring, Mitigating and Preventing Fraud
Model Monitoring from Arize
With the right monitoring tool, you should easily create dashboards to deeply analyze and troubleshoot your models across training, validation, and production. From handling biased actuals to missing ground truth, proactive monitoring should be able to handle your model at its worst. Automatically surface feature values that harm your overall model performance, chain together filters to drill down to the why, and find the root cause of your performance degradation.