Arize Release Notes: Copilot Enhancements, Experiment Projects, and More

Sarah Welsh

Contributor

Welcome to our regular update on new releases, enhancements, and changes.

What’s New

Copilot Enhancements

Span Chat

The Copilot Span Chat skill makes getting value from spans faster and easier. Rather than spending time scrolling through and deciphering span data, teams can now:

  • Analyze spans to extract key insights
  • Ask questions to quickly understand span data
  • Run evaluations on individual spans
Span Chat Evaluation

Dashboard Widget Generator

Building dashboard plots just got way easier. The dashboard skill lets teams:

  • Create time series plots or distributions from natural language
  • Translate code (like Plotly) into ready-to-go visualizations
  • Handle ambiguous filters like “west coast states” and plot multiple widgets at once
Dashboard generator

Misc. Copilot Updates

  • We’ve revamped the main chat experience to be always accessible on the page, with an option to collapse the input bar
  • The Custom Metric skill now supports a conversational flow, making it easier for users to iterate and refine metrics dynamically

Additional Enhancements

Experiment Projects

Experiment traces for a dataset are now consolidated and can be accessed under “Experiment Projects” on the “Projects & Models” page.

Experiment projects

Multi-Class/Label Per-Class Calibration & Chart

We’ve just rolled out per-class calibration metrics and a calibration chart. Users can now see calibration scores for each class separately and view the calibration chart in one place.

  • To view per-class calibration, select calibration from the metric dropdown and choose a class
Per-class calibration
  • The calibration chart can be found under the “More Charts” tab
Calibration chart
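
For readers who want a feel for what a per-class calibration view summarizes, here is a minimal sketch in plain Python. It is not Arize's implementation: it assumes a common one-vs-rest approach in which each class's predicted probabilities are bucketed and compared with how often that class actually occurred. The function names, the sample data, and the bin count are all hypothetical.

```python
from collections import defaultdict


def per_class_calibration(probs, labels, target_class, n_bins=5):
    """Build calibration-chart bins for one class, treated one-vs-rest.

    probs:  list of dicts mapping class name -> predicted probability
    labels: list of true class names, aligned with probs
    Returns (mean predicted probability, observed frequency, count) per non-empty bin.
    """
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        score = p[target_class]
        # Clamp so that a score of exactly 1.0 falls into the last bin.
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append((score, 1.0 if y == target_class else 0.0))

    chart = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_pred = sum(s for s, _ in rows) / len(rows)
        observed = sum(hit for _, hit in rows) / len(rows)
        chart.append((mean_pred, observed, len(rows)))
    return chart


def expected_calibration_error(chart):
    """Count-weighted average gap between predicted and observed frequency."""
    total = sum(n for _, _, n in chart)
    return sum(n * abs(m - o) for m, o, n in chart) / total


# Hypothetical predictions for a two-class problem.
probs = [
    {"cat": 0.9, "dog": 0.1},
    {"cat": 0.8, "dog": 0.2},
    {"cat": 0.3, "dog": 0.7},
    {"cat": 0.1, "dog": 0.9},
]
labels = ["cat", "dog", "dog", "dog"]

chart = per_class_calibration(probs, labels, target_class="cat", n_bins=2)
ece = expected_calibration_error(chart)
for mean_pred, observed, count in chart:
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={count}")
print(f"ECE for 'cat': {ece:.3f}")
```

A well-calibrated class has predicted and observed values that track each other across bins. Running the same function once per class gives the kind of class-by-class comparison described above.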

SDK Version 7.29.0

  • Log experiments from a previously created dataframe

📚 New Content

The latest video tutorials, paper readings, ebooks, self-guided learning modules, and technical posts:

🧑‍⚖️ Agent-as-a-Judge: Evaluate Agents with Agents
🤖 LLM-as-a-Judge Evaluation for GenAI Use-Cases
🌎 Building an AI Agent that Thrives in the Real World
🛠️ AI Agent Workflows and Architectures Masterclass
🔬 Agents in the Wild: Geotab