- Store training, validation, and production datasets (features, predictions, actuals)
- Store performance metrics for each model version across environments
- Use any dataset as a baseline reference for monitoring production performance
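The three capabilities above can be sketched in a few lines. This is an illustrative toy, not the Arize API: the `EvalStore` class and its method names are hypothetical, and the accuracy metric stands in for whatever performance metric a team tracks.

```python
import pandas as pd

class EvalStore:
    """Toy evaluation store: features, predictions, and actuals
    logged per model version and environment."""
    def __init__(self):
        self._records = []

    def log(self, model_version, environment, df):
        # Tag each batch so any dataset can later be pulled as a baseline.
        df = df.assign(model_version=model_version, environment=environment)
        self._records.append(df)

    def dataset(self, model_version, environment):
        all_rows = pd.concat(self._records, ignore_index=True)
        mask = (all_rows["model_version"] == model_version) & \
               (all_rows["environment"] == environment)
        return all_rows[mask]

    def accuracy(self, model_version, environment):
        d = self.dataset(model_version, environment)
        return (d["prediction"] == d["actual"]).mean()

store = EvalStore()
store.log("v1", "training", pd.DataFrame(
    {"feature": [1, 2, 3, 4], "prediction": [0, 1, 1, 0], "actual": [0, 1, 0, 0]}))
store.log("v1", "production", pd.DataFrame(
    {"feature": [2, 3], "prediction": [1, 1], "actual": [1, 0]}))

# Any logged dataset can serve as the baseline for monitoring production.
baseline_acc = store.accuracy("v1", "training")
production_acc = store.accuracy("v1", "production")
```

Because every batch carries its model version and environment, comparing production performance against any stored baseline is a single lookup rather than an ad hoc data pull.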
Taking a model from research to production is hard
Gaps in machine learning observability create a fundamental disconnect between data science and ML engineering teams.
With minimal telemetry, models are often deployed into production with key questions unanswered:
Will the model work?
Pre-launch validation of model readiness and data checks
Is the model working in production?
Auto-monitoring to find the needle in the haystack
What’s wrong with the model, and why?
Root cause analysis and model explainability tools
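Pre-launch validation of the kind described above can be approximated with simple data checks. A minimal sketch using only pandas; the function name, check set, and 5% empty-value threshold are illustrative assumptions, not a specific product API.

```python
import pandas as pd

def pre_launch_checks(train_df, candidate_df, max_empty_pct=0.05):
    """Compare a launch-candidate batch against the training data."""
    issues = []
    # Schema check: every feature the model trained on must be present.
    missing = set(train_df.columns) - set(candidate_df.columns)
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    for col in train_df.columns.intersection(candidate_df.columns):
        # Percent-empty check against a configurable threshold.
        empty_pct = candidate_df[col].isna().mean()
        if empty_pct > max_empty_pct:
            issues.append(f"{col}: {empty_pct:.0%} empty")
        # Out-of-range check relative to what the model saw in training.
        lo, hi = train_df[col].min(), train_df[col].max()
        if ((candidate_df[col] < lo) | (candidate_df[col] > hi)).any():
            issues.append(f"{col}: values outside training range [{lo}, {hi}]")
    return issues

train = pd.DataFrame({"age": [18, 30, 65], "income": [20e3, 50e3, 90e3]})
candidate = pd.DataFrame({"age": [25, 120], "income": [40e3, None]})
print(pre_launch_checks(train, candidate))
```

Running the example flags two problems before launch: an `age` value outside the training range and an `income` column that is half empty.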
An ML observability solution for continuous model improvement
The ability to surface unknown issues and diagnose the root cause is what differentiates machine learning observability from traditional monitoring tools. By connecting datasets across your training, validation, and production environments in a central evaluation store, Arize enables ML teams to quickly detect where issues emerge and deeply troubleshoot the reasons behind them.
Explore the benefits of an evaluation store:
- Integrates with your feature store to track feature drift and data quality
- Integrates with your model store for a historical record of performance by model lineage
- Allows comparison of any production activity to any other model evaluation dataset (e.g., Test Set, Extreme Validation Set)
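One common way to compare production activity against a chosen baseline dataset is the Population Stability Index (PSI) over a model input or score distribution. A hedged sketch in NumPy; the bin count and the 0.2 rule-of-thumb threshold are conventional choices, not a prescribed method.

```python
import numpy as np

def psi(baseline, production, bins=10):
    """Population Stability Index between two 1-D samples."""
    # Bin both samples on edges derived from the baseline.
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    p_pct = np.histogram(production, bins=edges)[0] / len(production)
    # Floor each bucket to avoid log(0) on empty bins.
    b_pct = np.clip(b_pct, 1e-6, None)
    p_pct = np.clip(p_pct, 1e-6, None)
    return float(np.sum((p_pct - b_pct) * np.log(p_pct / b_pct)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 10_000)  # e.g., a held-out test set
prod_scores = rng.normal(0.5, 1.0, 10_000)   # shifted production traffic

drift = psi(train_scores, prod_scores)
# A common rule of thumb: PSI > 0.2 signals significant drift.
print(f"PSI = {drift:.3f}")
```

Because the baseline is just a parameter, the same check works whether the reference is the training set, a validation set, or last week's production traffic.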
Monitoring & Data Checks
- Automatically detects drift, data quality issues, and anomalous performance degradation
- Highly configurable monitors based on both common KPIs and custom metrics
- Provides a centralized view of how a model acts on data for governance and repeatability
- Validates data distributions for extreme inputs/outputs, out-of-range values, % empty, and other common quality issues
- Compares model performance across training, validation, and production environments
- Provides experimentation capabilities to test model versions
- Enables deep analysis and troubleshooting with slice & dice functionality
- Uncovers underperforming cohorts of predictions
- Leverages SHAP values to expose feature importance
- Helps you understand when it’s time to retrain a model
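Slicing performance by cohort, as described above, can be sketched with a pandas group-by. The column names and data here are illustrative; in practice the slices would come from model inputs or metadata logged alongside each prediction.

```python
import pandas as pd

# Toy prediction log: one row per production prediction.
preds = pd.DataFrame({
    "region":     ["US", "US", "US", "EU", "EU", "EU", "APAC", "APAC"],
    "prediction": [1,    0,    1,    1,    1,    0,    0,      1],
    "actual":     [1,    0,    1,    0,    0,    0,    0,      1],
})
preds["correct"] = preds["prediction"] == preds["actual"]

# Accuracy per slice, worst first: cohorts far below the overall number
# are candidates for root cause analysis or targeted retraining data.
by_slice = preds.groupby("region")["correct"].mean().sort_values()
overall = preds["correct"].mean()
print(by_slice)
print(f"overall accuracy: {overall:.2f}")
```

Here the overall accuracy looks healthy, but the group-by immediately surfaces one region performing far below the rest, which an aggregate metric alone would hide.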