July 2026 - Arize AX Docs

Evaluate agents with new prebuilt trajectory and session evaluators

July 1, 2026 · New · Evaluators

The evaluator gallery now includes seven new LLM-as-a-judge templates for agent trajectories and multi-turn sessions, so you can measure agent goal completion, path efficiency, and session quality without writing evaluation prompts from scratch.

New agent and session evaluators — Goal Completion, Path Efficiency, Reasoning Coherence, Session Resolution, Topic Coherence, Session Frustration, and Session Completion.
Organized by workflow — templates are grouped into Response Quality, Code Quality, Trajectory, Session, RAG, and Security so you can find the right evaluator faster.
Scope at a glance — each template shows whether it runs on a span, trace, or session.
Security evaluators for everyone — the Security template group is now available to all users.

Learn more about creating evaluators.

Capture more LLM and tool detail with new OpenInference span attributes

July 1, 2026 · New · Tracing and Sessions

Arize now stores four additional OpenInference span attributes as typed, queryable columns, so you can filter and analyze traces on richer LLM, tool, and embedding metadata.

llm.system — the system or provider that served the LLM call.
llm.finish_reason — why the model stopped generating.
tool.id — the tool-call identifier, now available as a top-level attribute.
embedding.invocation_parameters — the parameters used for embedding calls.
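As a rough illustration of what filtering on these columns can look like, the sketch below models spans as plain Python dictionaries and selects the LLM calls that stopped for a given reason. Only the four attribute keys come from the note above. The span names, the attribute values (`"stop"`, `"length"`, `"call_001"`), the JSON-string encoding of `embedding.invocation_parameters`, and the `filter_spans` helper are assumptions made for the example, not Arize's query API.

```python
import json

# Hypothetical span records. Only the four attribute keys are taken from
# the release note; every value below is illustrative.
spans = [
    {
        "name": "chat_completion",
        "attributes": {"llm.system": "anthropic", "llm.finish_reason": "stop"},
    },
    {
        "name": "chat_completion",
        "attributes": {"llm.system": "openai", "llm.finish_reason": "length"},
    },
    {
        "name": "lookup_order",
        "attributes": {"tool.id": "call_001"},
    },
    {
        "name": "embed_query",
        "attributes": {
            # Assumption: the parameters are serialized as a JSON string.
            "embedding.invocation_parameters": json.dumps({"dimensions": 256}),
        },
    },
]


def filter_spans(records, key, value):
    """Return the spans whose attribute `key` equals `value`."""
    return [s for s in records if s["attributes"].get(key) == value]


# Find LLM calls that were cut off by a length limit.
truncated = filter_spans(spans, "llm.finish_reason", "length")

# Decode the embedding parameters back into a dictionary.
embed_params = json.loads(
    spans[3]["attributes"]["embedding.invocation_parameters"]
)
```

Spans that lack the attribute are skipped rather than raising an error, which matches how a filter on a sparse column would typically behave.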

Learn more about OpenInference semantic conventions.

Run prompts and evaluations on Claude Sonnet 5

June 30, 2026 · New · Models and Integrations

Claude Sonnet 5 (native Anthropic) is now available across the Prompt Playground and LLM-as-a-judge evaluators, so you can test and evaluate your prompts on Anthropic’s latest Sonnet-tier model without leaving Arize.

Available everywhere you pick a model — select Claude Sonnet 5 in the Prompt Playground and when configuring evaluators.
Automatic cost tracking — input, output, and cache token costs are recorded for every call, so spend shows up in your usage and evaluation costs.
Adaptive reasoning — like other recent Claude models, Sonnet 5 manages its own reasoning effort, so the temperature, top-p, and top-k controls no longer apply.
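To make the cost-tracking arithmetic concrete, here is a minimal sketch of how a per-call cost could be derived from input, output, and cache token counts. The per-million-token rates, the single cache-read bucket, and the names `TokenRates`, `TokenUsage`, and `call_cost` are all hypothetical. They are not published prices and not how Arize necessarily computes spend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical prices in USD per one million tokens."""
    input: float
    output: float
    cache_read: float


@dataclass(frozen=True)
class TokenUsage:
    """Token counts reported for a single model call."""
    input: int
    output: int
    cache_read: int


def call_cost(usage: TokenUsage, rates: TokenRates) -> float:
    """Sum the cost of each token bucket at its own rate."""
    tokens_per_unit = 1_000_000
    total = (
        usage.input * rates.input
        + usage.output * rates.output
        + usage.cache_read * rates.cache_read
    )
    return total / tokens_per_unit


# Illustrative numbers only.
rates = TokenRates(input=2.0, output=10.0, cache_read=0.5)
usage = TokenUsage(input=1_000, output=500, cache_read=4_000)
cost = call_cost(usage, rates)  # (2000 + 5000 + 2000) / 1_000_000
```

Keeping each bucket's rate separate matters because cached tokens are usually priced differently from fresh input tokens, so one blended rate would misstate the spend.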

Learn more about the Prompt Playground.

Map evaluator variables to dataset columns in the Prompt Playground

June 29, 2026 · New · Playground

You can now map evaluator template variables to differently named dataset columns directly in the Prompt Playground, and Alyx can fill in those mappings for you. This lets you run an evaluator over any dataset without rewriting its template to match your column names.

Per-instance variable mappings — a Mappings control on each evaluator instance maps template variables such as {{input}} to dataset columns such as question, and warns you when a mapping is missing.
Automatic mapping with Alyx — the Align Eval flow detects column mismatches and applies the correct mappings before running.
Preview with resolved values — the preview table shows each variable → column mapping alongside the resolved values.
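The mapping behavior described above can be sketched in a few lines of Python: find the `{{variable}}` placeholders in a template, look each one up through a variable-to-column mapping, and report any variable that does not resolve to a column. This is an illustrative model only. The `resolve` and `template_variables` helpers, the same-name fallback, and the sample row are assumptions, not the Playground's implementation.

```python
import re

# Matches placeholders such as {{input}} or {{ input }}.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def template_variables(template):
    """Return the variable names in order of first appearance."""
    seen = []
    for name in VARIABLE.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def resolve(template, mappings, row):
    """Fill `template` from `row`, returning (text, missing_variables).

    Assumption: a variable without an explicit mapping falls back to a
    column with the same name; if none exists it is reported as missing
    and its placeholder is left in the text.
    """
    missing = [
        v for v in template_variables(template)
        if mappings.get(v, v) not in row
    ]

    def substitute(match):
        name = match.group(1)
        column = mappings.get(name, name)
        return str(row[column]) if column in row else match.group(0)

    return VARIABLE.sub(substitute, template), missing


template = "Question: {{input}}\nAnswer: {{output}}"
row = {"question": "What is a span?", "answer": "One unit of work in a trace."}

# Only {{input}} is mapped, so {{output}} is flagged as missing.
text, missing = resolve(template, {"input": "question"}, row)

# With both variables mapped, the template resolves completely.
full_text, none_missing = resolve(
    template, {"input": "question", "output": "answer"}, row
)
```

Returning the list of missing variables alongside the text mirrors the warning described above: the caller can surface unresolved mappings before running the evaluator instead of failing partway through.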

Learn more about aligning evals to human feedback.

Launch prebuilt agents with a guided setup wizard

June 27, 2026 · New · Agents

Agent Studio now includes a guided setup wizard that walks you through connecting, configuring, and launching a ready-made agent, so you can spin up a purpose-built agent, such as an on-call SRE or a failing-trace investigator, without assembling skills, projects, and tasks by hand.

Ready-made agents — start from templates like SRE/On-Call, Investigate Failing Traces, Fix a Bug, Incident Commander, and Cost.
Guided Connect, Configure, and Review — connect skills, point the agent at a project or repository, and launch it as a session or a recurring automation.

Learn more about Agent Studio.

Analyze experiment trends with the new Experiment Analysis view

June 25, 2026 · New · Datasets and Experiments

The redesigned Experiment Analysis view, a refined trend chart paired with a Scoreboard, is now the default charting experience on the dataset Experiments page, so you can spot performance changes across experiment runs at a glance instead of reading a flat summary line chart.

Trend chart — track how scores and metrics change across experiment runs over time.
Scoreboard — compare the key metrics for your selected experiments side by side.

Learn more about comparing experiments.

Fixes and improvements

June 25–July 1, 2026

Evaluators

Improvement: You can now start an evaluator run on a dataset directly from the slideover in a single step.

Tracing and Sessions

Improvement: Your span Input/Output and Attributes display format (Pretty, Raw, JSON) now persists across sessions.
Improvement: The Projects flyout now lists recently viewed projects with a 7-day volume sparkline, and prompt tools appear as top-level navigation items.
Improvement: Added a Clear Filters action to the tracing table empty state so you can recover from a filtered-empty view in one click.

Agents

Improvement: Automation detail slideovers now show the agent’s investigation report, with clickable references to the traces it examined.
