August was a busy month, with lots of updates from the engineering team to make agent engineering easier. From previewing examples in the UI to a dedicated agent graph tab to richer charting for experiments, there is a lot to explore.
Here are some highlights of what we shipped.
Experiments & Data Visualization Improvements
The Experiments page has been redesigned with improved UX and richer charting, including new select components and diffing support on the compare page for clearer side-by-side analysis.
Average aggregate metrics are now shown in experiment headers. Usability fixes such as expandable/collapsible tables, editable experiment names, and updated column headers make workflows smoother.
Color maps and diffing functionality have also been improved, and the trace metadata now uses experiment IDs for better consistency. The experiment compare headers also feature a pinned experiment button for easier navigation.
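To make the averaged header metrics and compare-page diffing concrete, here is a minimal Python sketch of the underlying arithmetic. It is not how Arize AX computes these values; the function names, metric names, and sample numbers are all hypothetical and chosen only for illustration.

```python
from statistics import mean

def average_metrics(runs):
    """Average each metric across a list of per-example result dicts."""
    names = sorted({name for run in runs for name in run})
    return {n: mean(r[n] for r in runs if n in r) for n in names}

def diff_against_baseline(baseline, candidate):
    """Return candidate minus baseline for every metric both experiments share."""
    return {n: candidate[n] - baseline[n] for n in baseline if n in candidate}

# Hypothetical per-example results for two experiments.
baseline_runs = [
    {"accuracy": 1.0, "latency_s": 2.0},
    {"accuracy": 0.0, "latency_s": 4.0},
]
candidate_runs = [
    {"accuracy": 1.0, "latency_s": 1.0},
    {"accuracy": 1.0, "latency_s": 2.0},
]

baseline_avg = average_metrics(baseline_runs)    # what a header average would summarize
candidate_avg = average_metrics(candidate_runs)
delta = diff_against_baseline(baseline_avg, candidate_avg)  # side-by-side difference
print(delta)
```

In this sketch the baseline plays the role of a pinned experiment: every other experiment is compared against it, so a positive accuracy delta or a negative latency delta reads as an improvement.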

Tracing Updates
Experiment Traces Improvements
Speaking of experiments: the new traces page slide-over enhances the experiment tracing experience, with hover buttons now always visible, experiment traces added to the overflow menu, and search functionality added to the Experiments List Page.

Dedicated Agent Graph Tab
The tracing interface now also includes a dedicated Agent Graph tab, making it easier to visualize and explore agent interactions within traces.

Trace Interactivity Improvements
Hover states have been added for trace costs, and spans in traces are now clickable, making it easier to explore cost details and navigate through trace data.

Playground Updates
Performance Updates
Playground data loading has been improved to boost reliability and performance. Missing metrics displays for AWS Bedrock models have also been fixed, ensuring smoother and more consistent evaluation workflows.
Day Zero Support for GPT-5
Prompt Playground supports GPT-5, giving users access to the latest OpenAI model for experimentation and evaluation.
Datasets Upgrades
Dataset Filtering & REST API Updates
Datasets now support improved filtering capabilities, better column organization following semantic conventions, and expanded REST API coverage for listing datasets and examples.
Text areas on dataset example pages can now expand to full column width, and dataset filter history is now preserved.
Dataset Management Upgrades
The Datasets interface has been improved with CSV upload fixes, search capabilities on the Datasets List Page, and REST API support for dataset deletion.
Image Support for Datasets and Labeling Queues
Images are now supported in both datasets and labeling queues, with updated column groupings, clearer example ID displays, and reference tokens in headers. This release also introduces download tooltips for datasets and experiments, making it easier to export data directly from the UI.
Other Updates
Expanded Annotation Configuration Capabilities
Annotations now support up to five labels per configuration, giving teams more flexibility to capture nuanced judgments and tailor evaluation workflows to their needs.
This release also adds improved validations, clearer table views, and multiple UI and labeling queue enhancements for a smoother annotation workflow.
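As a rough illustration of what a five-label limit means for a configuration, the following Python sketch validates a hypothetical annotation config. The `AnnotationConfig` class, its fields, and the extra checks (non-empty and unique labels) are assumptions made for this example and do not reflect the actual Arize AX schema or validation rules; only the five-label cap comes from the release note.

```python
from dataclasses import dataclass, field

MAX_LABELS = 5  # from the release note: up to five labels per configuration

@dataclass
class AnnotationConfig:
    """Hypothetical annotation configuration, for illustration only."""
    name: str
    labels: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of human-readable problems; empty means valid."""
        errors = []
        if not self.labels:
            # Assumed rule, not stated in the release note.
            errors.append("at least one label is required")
        if len(self.labels) > MAX_LABELS:
            errors.append(f"at most {MAX_LABELS} labels are allowed")
        if len(set(self.labels)) != len(self.labels):
            # Assumed rule, not stated in the release note.
            errors.append("labels must be unique")
        return errors

ok = AnnotationConfig("helpfulness", ["very poor", "poor", "fair", "good", "excellent"])
too_many = AnnotationConfig("tone", ["a", "b", "c", "d", "e", "f"])

print(ok.validate())
print(too_many.validate())
```

Returning a list of problems rather than raising on the first one lets a UI surface every issue at once, which is one common way to present validation feedback.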
Alyx Copilot API Advancements
The Copilot API now supports structured output, improved frontend message parsing, and streamlined post-processing workflows, delivering a major upgrade to the AI assistant architecture.
Project UX Enhancements
The Projects page now features improved navigation and usability, with the addition of Tasks Provider to simplify task and evaluation management.
Revamped Eval and Tasks Experience
The Evals experience has been upgraded with a redesigned Tasks page and updated slideovers that make for a cleaner workflow. A save button has been added to Evals slideovers, counters in Datasets now stay up to date, and evaluators automatically refresh from datasets.
New to Arize AX? Sign up for a free account or book some time with us for a personalized walk-through of the platform.