
Model Monitoring

Learn what model monitoring is, why it’s important, how it relates to machine learning observability, and how an ML model monitoring platform works.

What is Model Monitoring?

Monitoring across ground truth, outputs, explainability, and inputs

Machine learning (ML) model monitoring is a series of techniques deployed to measure key performance metrics and understand when issues arise in machine learning models. Areas of focus include model drift, performance, and data quality. Better model monitoring empowers ML Engineers to quickly detect issues and pinpoint where to begin further analysis at a cohort-level for quick root cause analysis.

Model explainability is an additional layer to uncover feature importance in a model and evaluate the why behind specific predictions.
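One common way to estimate feature importance is permutation importance: shuffle one feature's values and measure how much a performance metric drops. As a minimal illustrative sketch (not Arize's implementation; `predict`, `metric`, and the list-of-rows data layout are assumptions for the example):

```python
import random

def permutation_importance(predict, X, y, feature_idx, metric, n_repeats=5, seed=0):
    """Average drop in `metric` when one feature column is shuffled.

    predict: callable mapping a list of feature rows to predictions.
    metric: callable metric(y_true, y_pred), higher is better.
    A large drop suggests the model relies heavily on that feature.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[:] for row in X]          # copy rows before shuffling
        col = [row[feature_idx] for row in shuffled]
        rng.shuffle(col)                          # break the feature/label link
        for row, v in zip(shuffled, col):
            row[feature_idx] = v
        drops.append(baseline - metric(y, predict(shuffled)))
    return sum(drops) / len(drops)
```

A feature the model ignores yields an importance near zero, while shuffling a decisive feature produces a visible metric drop.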

Why Do You Need Model Monitoring?

Why is it important to quickly detect when issues emerge with a model? Well, that’s simple. It all boils down to time and money. Poorly performing models can degrade your customers’ experiences, reduce revenue, spark PR disasters, perpetuate systemic bias, and much more. The quicker you are able to surface issues, the faster you can drill down and resolve them.

Where Did Model Monitoring Come From?

While research environments can be controlled, models quickly become far more complex when met with the real world. Some challenges include constantly changing data, rewriting the model in a new language, training-serving skew, or pushing features to a feature store. Any minute change can jeopardize a model. Yet as it stands, most ML teams don’t know what’s wrong with their models until it’s too late.

Why is Model Monitoring Important?

Models can gradually decay over time, digest outliers, over/under-index for specific cohorts, and significantly change business impact metrics.

TL;DR Models Encounter Challenges in Production Environments

Despite the multitude of problems a model can undergo, ML teams of all sizes lack specific tooling to understand their model’s performance in production, and often only know something’s wrong after a series of customer complaints or a sneaking suspicion. While some organizations can afford large teams to help uncover pointed issues, engineers still lack a streamlined solution to act as a guardrail on deployed ML models.

With the proper monitoring tool, ML engineers should be able to automatically surface issues to better understand how distributions change over time, map actuals quickly and intelligently across all model environments, construct proxy metrics to estimate actuals when they are unavailable, and go beyond global performance metrics to the cohort and local level to holistically address model performance.

The Three Components of Model Monitoring:

Let’s break this down a little bit more. Below, you’ll find the three main components of model monitoring and what to look for when evaluating if a monitoring tool is up to par.

Performance analysis monitoring Daily or hourly checks on model performance, such as accuracy above 80%, RMSE within bounds, or accuracy relative to training


Drift monitoring Distribution comparisons, numeric or categorical, on features, predictions, and actuals


Data quality monitoring Real-time data checks on features, predictions, and actuals


How Does Model Monitoring Relate to ML Observability?

We’ve said it before, and we’ll repeat it: taking a model from research to production is hard. There are many failure modes a model can encounter at any given time, and monitoring your model’s performance in production is the first step to gaining confidence in your models as you move to online environments. Machine learning monitoring is a component of ML observability, a practice that monitors, troubleshoots, and explains your models in production, drilling down to the why behind the decisions your models make.

ML observability interacts with every part of the model building lifecycle. From experimentation to production, observability helps troubleshoot problems with data ingestion, training/validation, and serving workflows. While bringing a model online can be challenging, ML observability helps alleviate the burden of troubleshooting issues during the model building and serving process to ensure the highest quality model is served in production.

Observability across the ML model lifecycle
ML Monitoring Resources

Why Best-Of-Breed ML Monitoring and Observability Solutions Are The Way Forward

Read more →

Beyond Monitoring: The Rise of Observability

Read more →

The Definitive Machine Learning Observability Checklist

Read more →

Machine Learning Observability 101

Read more →

Model Monitoring for Performance Analysis

Model performance analysis should move beyond the one-dimensional view of accuracy. While accuracy is an important metric, it often does not tell the whole story, and worse, can mask underlying issues that affect model performance.

How To Monitor ML Model Performance:

To truly surface where a model is underperforming, it’s important to look at the performance of models across various cohorts and slices of predictions. Performance analysis metrics include MAPE, MAE, recall, precision, and F1, chosen based on model type.

What to Look for When Monitoring for Performance Analysis:
Look for monitoring solutions that offer performance metrics at a global and cohort level to automatically surface lower-performing segments along any evaluation dimension, provide granularity down to the hourly level, monitor specific slices of predictions, detect when false positives or negatives surpass a particular threshold, create proxy metrics when actuals aren’t available, and indicate when your model needs to be retrained.
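The cohort-level view described above can be sketched in a few lines: compute the same metric globally and per cohort so a weak segment can’t hide behind a healthy average. A minimal illustrative example (the `(cohort, y_true, y_pred)` record layout is an assumption, not a specific platform’s API):

```python
from collections import defaultdict

def cohort_metrics(records):
    """Accuracy and recall per cohort for a binary classifier.

    records: iterable of (cohort, y_true, y_pred) tuples.
    A low-performing cohort can be masked by strong global accuracy,
    so metrics are computed per segment.
    """
    by_cohort = defaultdict(list)
    for cohort, y_true, y_pred in records:
        by_cohort[cohort].append((y_true, y_pred))

    metrics = {}
    for cohort, pairs in by_cohort.items():
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        correct = sum(1 for t, p in pairs if t == p)
        metrics[cohort] = {
            "accuracy": correct / len(pairs),
            # recall is undefined when a cohort has no positives
            "recall": tp / (tp + fn) if (tp + fn) else None,
        }
    return metrics
```

An alerting layer would then compare each cohort’s metrics against a threshold or a training-set baseline.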

Model Performance Analysis

Drift Monitoring

Drift is a change in distribution over time, measured for model inputs, outputs, and actuals of a model.

While models naturally decay gradually over time, there are also hard failures associated with data drift and model drift that can negatively impact a model’s overall performance. Models aren’t static, and it’s important to ensure your model stays relevant. Measure drift to identify whether your models have grown stale, whether you have data quality issues, or whether there are adversarial inputs to your model.

How To Monitor For Model Drift:

There are a few types of drift to monitor in a production model: prediction drift, concept drift, data drift, and upstream drift. Quantify drift using a few key analysis techniques: population stability index (PSI), KL divergence, Wasserstein distance, and more.
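To make the PSI technique mentioned above concrete, here is a minimal sketch assuming a simple equal-width binning scheme derived from the baseline sample (production systems typically use quantile bins and more careful edge handling):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    expected: baseline sample (e.g., training distribution).
    actual: current sample (e.g., recent production inputs).
    Bin edges come from the baseline; a small epsilon avoids log(0)
    when a bin is empty. Identical distributions yield PSI = 0.
    """
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            i = sum(1 for e in edges if x >= e)  # bin index for x
            counts[i] += 1
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A common rule of thumb treats PSI below 0.1 as stable, 0.1–0.25 as moderate drift worth investigating, and above 0.25 as significant drift.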

What to Look For:
With proactive monitoring, detecting drift should be easy with automatic alerts. Bulk create monitors with multiple baselines, view feature performance at a glance, access a historical view of your drift, and access the distribution view associated with your PSI measure.

[Diagram: ground truth feeding the model, with model drift/concept drift and data drift/feature drift occurring across the pipeline]

Data Quality Monitoring

Machine learning models are not static. They are trained on data and highly depend on data to make reliable predictions. It’s important to immediately surface data quality issues to identify how your data quality maps to your model’s performance.

How To Monitor For Data Quality:

As data flows through the inputs of a model, monitoring is used to ensure the highest possible quality data is ingested and digested to prevent model performance degradation. Utilize data quality monitoring to analyze hard failures in your data quality pipeline.

Monitor for cardinality shifts, missing data, data type mismatches, out-of-range values, and more to get to the root cause of performance issues.
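The checks listed above can be sketched as a per-record validator. This is a hypothetical helper for illustration (the schema format and field names are assumptions, not a specific platform’s API):

```python
def check_record(record, schema):
    """Return a list of data-quality violations for one inference record.

    schema maps feature name -> (expected_type, (min, max) or None).
    Flags missing values, type mismatches, and out-of-range values.
    """
    issues = []
    for name, (expected_type, value_range) in schema.items():
        value = record.get(name)
        if value is None:
            issues.append(f"{name}: missing")
            continue
        if not isinstance(value, expected_type):
            issues.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        if value_range is not None:
            lo, hi = value_range
            if not (lo <= value <= hi):
                issues.append(f"{name}: {value} outside [{lo}, {hi}]")
    return issues
```

In practice these checks run in the serving path or on batched logs, with violation counts feeding the same alerting system as performance and drift monitors.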

What to Look For When Monitoring For Data Quality:
In an effective monitoring system, data quality monitors should look upstream and be calibrated to exact parameters for any model, version, and dimension to quickly catch where features, predictions, and actuals don’t conform to expected ranges.

Data Pipeline and Quality Issues In ML Lifecycle

What Does This Look Like In Practice?

Improve overall model visibility by monitoring to maximize revenue, improve productivity, and increase trust in your ML models. Troubleshoot common use case-specific problems using Arize, the leading ML observability platform.

Common ML Monitoring Use Cases:

Learn how to set up proactive monitors for chargebacks (false negative rate) and false positive transactions for your credit card fraud model.

Optimize Fraud Model Evaluation Metrics →

Troubleshoot bad data quality, drifting features, and low performing cohorts of your ad click-through rate model.

Improve CTR Model Performance →

Analyze your recommendation engine model’s performance across various slices and dive into which features could cause performance degradation.

Enhance Recommendation System Model Performance →

And More Use Cases! →

Monitor Across Different Serving Options

Easily integrate with any serving option to simplify the production process

Model Monitoring from Arize

With the right monitoring tool, you should easily create dashboards to deeply analyze and troubleshoot your models across training, validation, and production. From biased actuals to missing ground truth, proactive monitoring should be able to handle your model at its worst. Automatically surface feature values that harm your overall model performance, chain together filters to drill down to the why, and find the root cause of your performance degradation.

Learn More About Arize’s Model Monitoring Capabilities →

Ready to level up your ML observability game?

Request a Trial