
Machine Learning Observability

Your one-stop shop for all things ML observability. This overview covers the fundamentals: the four pillars of ML observability, where it fits in the ML toolchain, and common ML observability techniques.

What is ML Observability?

ML observability is the practice of monitoring, troubleshooting, and explaining machine learning models as they move from research into production. An effective observability tool should not only automatically surface issues, but also drill down to the root cause of your ML problems and act as a guardrail for models in production.

4 Pillars of ML Observability

ML Observability in practice

- Performance Analysis: surfacing worst-performing slices
- Drift: data distribution changes over the lifetime of a model
- Data Quality: ensuring high-quality model inputs and outputs
- Explainability: attributing why a certain outcome was made

Performance Analysis

ML observability surfaces fast, actionable performance information about models deployed in production. While performance analysis techniques vary case by case depending on the model type and its real-world use, common metrics include accuracy, precision, recall, F1, MAE, and RMSE. Performance analysis in an ML observability system ensures that performance has not degraded drastically from when the model was trained or first promoted to production.
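
To make this concrete, here is a minimal sketch of surfacing worst-performing slices: compute a metric per slice of a categorical feature and sort from worst to best. The column names (`region`, `prediction`, `actual`) and the data are hypothetical stand-ins for a real production log.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical production log: one row per prediction, with the model's
# output, the ground-truth label, and a feature to slice on.
logs = pd.DataFrame({
    "region":     ["us", "us", "eu", "eu", "apac", "apac"],
    "prediction": [1, 0, 1, 1, 0, 0],
    "actual":     [1, 0, 0, 1, 1, 0],
})

# Accuracy per slice, worst performers first.
slice_accuracy = (
    logs.groupby("region")
        .apply(lambda g: accuracy_score(g["actual"], g["prediction"]))
        .sort_values()
)
print(slice_accuracy)  # the lowest-accuracy slices appear at the top
```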

Drift

ML observability encompasses drift monitoring: measuring changes in the distributions of a model's inputs, outputs, and actuals over time. Measuring drift helps identify whether your models have grown stale, whether you have data quality issues, or whether adversarial inputs are reaching your model. Detecting drift helps protect your models from performance degradation and points you toward where to begin resolution.
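
One common way to quantify distribution change is the Population Stability Index (PSI). The sketch below is a minimal implementation comparing a training baseline against production values for a single numeric feature; the bin count and the alert thresholds in the docstring are assumptions that vary by team.

```python
import numpy as np

def psi(baseline: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two samples of one feature.

    Rule of thumb (an assumption; teams tune this): PSI < 0.1 is stable,
    0.1-0.25 warrants investigation, > 0.25 suggests significant drift.
    """
    # Bin edges come from the training (baseline) distribution; the outer
    # edges are widened so out-of-range production values are still counted.
    edges = np.histogram_bin_edges(baseline, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf

    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)

    # Clip to avoid log(0) on empty bins.
    base_pct = np.clip(base_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)

    return float(np.sum((prod_pct - base_pct) * np.log(prod_pct / base_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)   # training-time distribution
prod = rng.normal(0.5, 1.0, 10_000)    # shifted production distribution
print(f"PSI: {psi(train, prod):.3f}")  # well into the investigate/alert range
```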

Data Quality

Data quality checks in an ML observability system identify hard failures in the data pipelines between training and production that can negatively impact a model's end performance. Data quality monitoring covers cardinality shifts, missing data, data type mismatches, out-of-range values, and more, making it easier to gauge model performance issues and perform root cause analysis (RCA).
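
A minimal sketch of such checks using pandas follows; the schema, the valid age range, and the training-time category set are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical batch of production inputs for a model with two features.
batch = pd.DataFrame({
    "age":     [34, 51, None, 29, 240],         # one missing, one out-of-range value
    "country": ["US", "DE", "DE", "??", "US"],  # one category unseen in training
})

# Missing data: fraction of nulls per column.
print(batch.isna().mean())

# Out-of-range values, checked against bounds observed during training.
bad_age = batch["age"].notna() & ~batch["age"].between(0, 120)
print("out-of-range ages:", batch.loc[bad_age, "age"].tolist())

# Cardinality shift: categories never seen during training.
training_categories = {"US", "DE", "FR"}
print("unseen categories:", set(batch["country"]) - training_categories)
```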

Explainability

Explainability in ML observability uncovers feature importance across training, validation, and production environments, providing the ability to introspect and understand why a model made a particular prediction. Explainability is commonly achieved with techniques such as SHAP and LIME, which help build confidence in machine-learned models and continuously improve them.
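
As a minimal sketch using the open-source `shap` package, the snippet below computes SHAP values for a toy tree model and ranks global feature importance by mean absolute SHAP value; the synthetic dataset and regressor stand in for a real production model.

```python
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for the production model; in practice, load the deployed model.
X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Global feature importance: mean absolute SHAP value per feature.
importance = np.abs(shap_values).mean(axis=0)
for idx in np.argsort(importance)[::-1]:
    print(f"feature_{idx}: importance {importance[idx]:.3f}")
```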
