Evaluate agents with new prebuilt trajectory and session evaluators
July 1, 2026 · New · Evaluators

The evaluator gallery now includes seven new LLM-as-a-judge templates for agent trajectories and multi-turn sessions, so you can measure agent goal completion, path efficiency, and session quality without writing evaluation prompts from scratch.

- New agent and session evaluators — Goal Completion, Path Efficiency, Reasoning Coherence, Session Resolution, Topic Coherence, Session Frustration, and Session Completion.
- Organized by workflow — templates are grouped into Response Quality, Code Quality, Trajectory, Session, RAG, and Security so you can find the right evaluator faster.
- Scope at a glance — each template shows whether it runs on a span, trace, or session.
- Security evaluators for everyone — the Security template group is now available to all users.
Capture more LLM and tool detail with new OpenInference span attributes
July 1, 2026 · New · Tracing and Sessions

Arize now stores four additional OpenInference span attributes as typed, queryable columns, so you can filter and analyze traces on richer LLM, tool, and embedding metadata.

- `llm.system` — the system or provider that served the LLM call.
- `llm.finish_reason` — why the model stopped generating.
- `tool.id` — the tool-call identifier, now available as a top-level attribute.
- `embedding.invocation_parameters` — the parameters used for embedding calls.
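
As a rough illustration of the kind of filtering these attributes allow, the sketch below works on plain Python dictionaries. Only the four attribute keys come from this release note. The span records, their values (such as `"anthropic"` or `"max_tokens"`), and the `filter_spans` helper are hypothetical and are not part of any Arize or OpenInference API.

```python
from typing import Any, Optional

# Hypothetical exported spans; only the attribute keys come from the release note.
spans: list[dict[str, Any]] = [
    {"name": "chat", "attributes": {"llm.system": "anthropic", "llm.finish_reason": "stop"}},
    {"name": "chat", "attributes": {"llm.system": "openai", "llm.finish_reason": "max_tokens"}},
    {"name": "search", "attributes": {"tool.id": "call_001"}},
    {"name": "embed", "attributes": {"embedding.invocation_parameters": '{"dimensions": 256}'}},
]


def filter_spans(
    records: list[dict[str, Any]], key: str, value: Optional[Any] = None
) -> list[dict[str, Any]]:
    """Return spans that carry `key`, optionally requiring it to equal `value`."""
    matches = []
    for span in records:
        attrs = span.get("attributes", {})
        if key in attrs and (value is None or attrs[key] == value):
            matches.append(span)
    return matches


# LLM calls that stopped because they hit a token limit (illustrative value).
truncated = filter_spans(spans, "llm.finish_reason", "max_tokens")
# Any span that recorded a tool-call identifier.
tool_calls = filter_spans(spans, "tool.id")
print(len(truncated), len(tool_calls))
```

In the product itself the same questions would be asked through the trace filters rather than in code; the sketch only shows why having each attribute as its own column makes such queries simple.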
Run prompts and evaluations on Claude Sonnet 5
June 30, 2026 · New · Models and Integrations

Claude Sonnet 5 (native Anthropic) is now available across the Prompt Playground and LLM-as-a-judge evaluators, so you can test and evaluate your prompts on Anthropic’s latest Sonnet-tier model without leaving Arize.

- Available everywhere you pick a model — select Claude Sonnet 5 in the Prompt Playground and when configuring evaluators.
- Automatic cost tracking — input, output, and cache token costs are recorded for every call, so spend shows up in your usage and evaluation costs.
- Adaptive reasoning — like other recent Claude models, Sonnet 5 manages its own reasoning effort, so the temperature, top-p, and top-k controls no longer apply.
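
To make the cost-tracking bullet concrete, here is a minimal sketch of how a per-call cost could be derived from the three token counts. Everything in it is an assumption for illustration: the `TokenUsage` class, the `call_cost` function, and especially the rates, which are placeholders and not real Claude Sonnet 5 pricing.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for one model call (hypothetical structure)."""
    input_tokens: int
    output_tokens: int
    cache_tokens: int


# Placeholder USD rates per million tokens. These are NOT real prices.
PLACEHOLDER_RATES = {"input": 1.0, "output": 5.0, "cache": 0.1}


def call_cost(usage: TokenUsage, rates: dict[str, float] = PLACEHOLDER_RATES) -> float:
    """Sum input, output, and cache token costs for a single call."""
    per_token = {kind: rate / 1_000_000 for kind, rate in rates.items()}
    return (
        usage.input_tokens * per_token["input"]
        + usage.output_tokens * per_token["output"]
        + usage.cache_tokens * per_token["cache"]
    )


usage = TokenUsage(input_tokens=2_000, output_tokens=500, cache_tokens=10_000)
cost = call_cost(usage)
print(f"{cost:.4f}")
```

Summing this value over every call in an evaluation run would give that run's total under the same placeholder rates.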
Map evaluator variables to dataset columns in the Prompt Playground
June 29, 2026 · New · Playground

You can now map evaluator template variables to differently-named dataset columns directly in the Prompt Playground, and Alyx can fill in those mappings for you, so you can run an evaluator over any dataset without rewriting its template to match your column names.

- Per-instance variable mappings — a Mappings control on each evaluator instance maps template variables such as `{{input}}` to dataset columns such as `question`, and warns you when a mapping is missing.
- Automatic mapping with Alyx — the Align Eval flow detects column mismatches and applies the correct mappings before running.
- Preview with resolved values — the preview table shows each `variable → column` mapping alongside the resolved values.
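
The idea behind variable mappings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the Playground is implemented: the `resolve_template` function, the fallback to a same-named column, and the sample row are all hypothetical. Only the `{{input}}` variable and the `question` column come from the example above.

```python
import re

# Matches placeholders such as {{input}} or {{ input }}.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_template(
    template: str, row: dict[str, object], mappings: dict[str, str]
) -> tuple[str, list[str]]:
    """Fill {{variable}} placeholders from a dataset row.

    `mappings` maps a template variable to a dataset column. Unmapped
    variables fall back to a column with the same name (an assumption).
    Returns the rendered text and the variables that could not be resolved.
    """
    missing: list[str] = []

    def substitute(match: re.Match) -> str:
        variable = match.group(1)
        column = mappings.get(variable, variable)
        if column not in row:
            missing.append(variable)
            return match.group(0)  # leave the placeholder untouched
        return str(row[column])

    return VARIABLE.sub(substitute, template), missing


template = "Question: {{input}}\nAnswer: {{output}}"
row = {"question": "What is a span?", "answer": "A unit of work in a trace."}

# Only one variable is mapped, so the other is reported as missing.
partial_text, partial_missing = resolve_template(template, row, {"input": "question"})

# With both variables mapped, every placeholder resolves.
full_text, full_missing = resolve_template(
    template, row, {"input": "question", "output": "answer"}
)
print(partial_missing, full_missing)
```

Returning the unresolved variables alongside the rendered text mirrors the warning behavior described above: the caller can surface a missing mapping instead of silently sending a raw placeholder to the judge model.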
Launch prebuilt agents with a guided setup wizard
June 27, 2026 · New · Agents

Agent Studio now includes a guided setup wizard that walks you through connecting, configuring, and launching a ready-made agent, so you can spin up a purpose-built agent, such as an on-call SRE or a failing-trace investigator, without assembling skills, projects, and tasks by hand.

- Ready-made agents — start from templates like SRE/On-Call, Investigate Failing Traces, Fix a Bug, Incident Commander, and Cost.
- Guided Connect, Configure, and Review — connect skills, point the agent at a project or repository, and launch it as a session or a recurring automation.
Analyze experiment trends with the new Experiment Analysis view
June 25, 2026 · New · Datasets and Experiments

The redesigned Experiment Analysis view, a refined trend chart paired with a Scoreboard, is now the default charting experience on the dataset Experiments page, so you can spot performance changes across experiment runs at a glance instead of reading a flat summary line chart.

- Trend chart — track how scores and metrics change across experiment runs over time.
- Scoreboard — compare the key metrics for your selected experiments side by side.
Fixes and improvements
June 25–July 1, 2026 · Evaluators

- Improvement You can now start an evaluator run on a dataset directly from the slideover in a single step.
- Improvement Your span Input/Output and Attributes display format (Pretty, Raw, JSON) now persists across sessions.
- Improvement The Projects flyout now lists recently viewed projects with a 7-day volume sparkline, and prompt tools appear as top-level navigation items.
- Improvement Added a Clear Filters action to the tracing table empty state so you can recover from a filtered-empty view in one click.
- Improvement Automation detail slide-overs now show the agent’s investigation report, with clickable references to the traces it examined.