Arize AI Debuts Monitoring for Unstructured Data

BERKELEY, Calif., June 30, 2022 — Arize AI, the leader in machine learning (ML) observability, debuted a groundbreaking product for embedding drift monitoring and embedding analysis today.

According to multiple estimates, 80% of data generated is unstructured audio, images, text, or video (as opposed to structured data like rows of dates, numbers, and addresses). Machine learning teams are putting this data to great use, with computer vision and natural language processing (NLP) models powering everything from self-driving cars to classifying long legal documents. Despite a decade of investment in deep learning, however, there has not been a great way to monitor these models as performance shifts in production – until now.

Now available as part of Arize’s free subscription tier, Arize for embedding analysis enables users to log models with both structured and unstructured data to Arize for monitoring. By monitoring embeddings of their unstructured data, teams can proactively identify when their unstructured data is drifting. Troubleshooting is simplified with interactive visualizations to help isolate new or emerging patterns, underlying data changes, and data quality issues.

This update is designed to tackle several common pain points of working with deep learning models: 

  • ML teams often lack visibility into what’s happening to the data when an unstructured data model is put into production. With no monitoring for drift or performance, picking up on upstream data quality issues or changes in the data is practically impossible.
  • Deep learning models are expensive to train. Since labeling is expensive, ML teams often label no more than 0.1% of their data. Once a model is in production, it often encounters new patterns that were absent from training. Left unnoticed, these new patterns lead to performance degradation.

Arize’s interactive UMAP implementation with both 2D and 3D views enables teams to quickly visualize their high dimensional data in a low dimensional space. By visualizing drift between embeddings with production data layered on top of training data, teams are able to see groupings of embeddings and easily identify patterns or data that were not present in training.
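To illustrate the general idea of comparing production embeddings against training embeddings, here is a minimal sketch that summarizes drift as the Euclidean distance between the centroids of the two sets. This is an assumption chosen for illustration, not a description of Arize’s implementation; the function names (centroid, centroid_drift, make_embeddings) and the synthetic data are hypothetical.

```python
import math
import random


def centroid(vectors):
    """Return the element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def centroid_drift(train_embeddings, prod_embeddings):
    """Euclidean distance between the centroids of two embedding sets.

    Illustrative drift score only: a larger value means the production
    embeddings have, on average, moved further from the training embeddings.
    """
    return math.dist(centroid(train_embeddings), centroid(prod_embeddings))


def make_embeddings(rng, count, dim, shift=0.0):
    """Generate synthetic embeddings (hypothetical stand-in for model output)."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
train = make_embeddings(rng, 500, 8)
prod_same = make_embeddings(rng, 500, 8)                # same distribution as training
prod_shifted = make_embeddings(rng, 500, 8, shift=1.0)  # simulated new pattern

baseline = centroid_drift(train, prod_same)
drifted = centroid_drift(train, prod_shifted)
print(f"baseline drift: {baseline:.3f}")
print(f"shifted drift:  {drifted:.3f}")
```

In this sketch, the shifted production set yields a noticeably larger score than the unshifted one. A single number like this can flag that something changed, while a 2D or 3D projection of the kind described above helps show which groups of data points are responsible.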

“It has been an amazing journey over the last year scaling Arize to track hundreds of billions of predictions a month, and we have learned a lot. We plowed many of those insights into an architecture for embedding analysis that is the first of its kind,” says Jason Lopatecki, CEO and Co-Founder of Arize. “Most teams are shipping deep learning models blind today, and this product is built to help change that.”

Aparna Dhinakaran, Arize’s Co-Founder and Chief Product Officer, concurs: “Cutting edge deep learning still relies on human labeling teams looking at around 1% of the data to help train models and – hopefully – capture what will happen in the real world. Better monitoring is needed to surface issues with models in production. Arize’s new capabilities for monitoring unstructured data promise to change the game for ML teams.”

About Arize AI
Arize AI is a machine learning observability platform that helps ML practitioners successfully take models from research to production with ease. Arize’s automated model monitoring and analytics platform helps ML teams quickly detect issues when they emerge, troubleshoot why they happened, and improve overall model performance. By connecting offline training and validation datasets to online production data in a central inference store, ML teams can streamline model validation, drift detection, data quality checks, and model performance management.

Arize AI acts as the guardrail on deployed AI, providing transparency and introspection into historically black box systems to ensure more effective and responsible AI. To learn more about Arize or machine learning observability and monitoring, visit our blog and resource hub.
