Release Notes
See the latest features released in Arize
Cost Tracking
July 1, 2025
You can now monitor model spend directly in Arize with native cost tracking. Supporting 60+ models and providers out of the box, this flexible feature adapts to various cost structures and team needs, making it easy to track and manage your AI spend in-platform.
Arize Database (ADB)
June 25, 2025
We’re excited to introduce Arize Database (ADB), the powerful engine behind all Arize AX instances. Built for massive scale and speed, ADB processes billions of traces and petabytes of data with high efficiency.
Its robust architecture supports real-time ingestion, bulk updates, and fast querying, powering even the heaviest AI workloads reliably. ADB has long been the unsung hero of our platform, and we’re proud to bring it to light.
Playground Views
June 25, 2025
The new Prompt Playground lets you save views including prompts, dataset selections, comparison views, messages, and model selections. You can iterate and test variations seamlessly in one environment and share optimal views with your team to accelerate prompt development and evaluation.
Prompt Learning
June 25, 2025
We’re excited to launch Prompt Learning, a new workflow in Arize to accelerate prompt iteration and evaluation. With Prompt Learning, you can:
Run prompt optimization experiments directly in Arize
Incorporate text-based judgments from humans and LLMs
Tune and compare prompt variants to systematically improve agent behavior
Agent Trajectory Evaluations
June 25, 2025
With Agent Trajectory Evaluation, you can assess the sequence of tool calls and reasoning steps your agent takes to solve a task. Key benefits:
Path Quality: See if your agent is following expected, efficient problem-solving paths.
Tool Usage Insights: Detect redundant, inefficient, or incorrect tool call patterns.
Debugging Visibility: Understand internal decision-making to resolve unexpected behaviors, even when outcomes appear correct.
More on Agent Trajectory Evaluations.
Session-level Evaluations
June 25, 2025
You can now evaluate your agents across entire sessions with new session-level evaluations, enabling deeper insight beyond trace-level metrics. Assess:
Coherence: Does the agent maintain logical consistency throughout the session?
Context Retention: Is it effectively remembering and building on prior exchanges?
Goal Achievement: Does the conversation accomplish the user’s intended outcome?
Conversational Progression: Is the agent navigating multi-step tasks in a natural, helpful way?
These evaluations help ensure your agents are effective not just at each step, but across the full journey. More information on Session-level Evaluations.
Agent and Multi-Agent Visualization
June 25, 2025
Easily inspect and debug multi-agent workflows with the new Agent Visibility feature. Alongside Traces and Spans, the new Agents tab auto-generates an interactive flowchart showing how agents, tools, and components interact step-by-step. With Agent Visibility, you can:
Visualize agent workflows end-to-end
Debug bottlenecks and errors with clarity
Link agents to traces and spans for deeper insights
Accelerate orchestration iteration and refinement
It works automatically across frameworks like Agno, Autogen, CrewAI, LangGraph, OpenAI Agents, and SmolAgents. More on agent tracing.
Alyx MCP Assistant
June 25, 2025
All Alyx skills are accessible via MCP, allowing seamless integration into your existing workflows. You can leverage the full suite of Alyx debugging and analysis tools wherever you build, without needing to switch contexts.
This means you can debug traces directly from your IDE while building in environments like Cursor, or connect through Claude Code to identify improvement areas. Refer to the video below for setting up Alyx via MCP in Cursor.
Arize Copilot v3: Alyx & Trace Troubleshooting
June 25, 2025
We're excited to introduce Alyx, a major upgrade to our Copilot assistant. You can now drop context anywhere across the app and open Copilot with the magic of Ctrl+L to instantly pull that context in for smarter, faster help.
We're also introducing Trace Troubleshooting, a new Copilot skill that lets you navigate the entire trace to pinpoint issues. Built with o3 under the hood, you can now:
@ specific spans
Use existing span skills for span questions or evals
Let Copilot traverse and diagnose like a pro
Customize the hotkey if you don't want to use Ctrl + L
New Homepage & Onboarding Experience
June 20, 2025
We’ve just rolled out a revamped onboarding flow to guide first-time users smoothly into either Tracing or Experiments.
Realtime Trace Ingestion for All Arize AX Instances
May 20, 2025
Realtime trace ingestion is now supported across all Arize AX tiers, including the free tier.
Previously, this feature was only available for enterprise AX users and within our open-source platform, Phoenix. It is now fully rolled out to all users of Arize AX.
No configuration changes are required to begin using realtime trace ingestion.
More OpenAI models in prompt playground and tasks
May 11, 2025
We've added support for more OpenAI models in prompt playground and evaluation tasks. Experiment across models and frameworks quickly.

Sleeker display of inputs and outputs on a span
May 9, 2025
We've improved the design of the span page to showcase functions, inputs, and outputs, helping you debug your traces faster!

Attribute search on traces
May 7, 2025
Now you can filter your span attributes right on the page, no more CMD+F!

Column selection in prompt playground
May 5, 2025
You can now view all of your prompt variables and dataset values directly in playground!

Latency and token counts in prompt playground
May 2, 2025
We've added latency and token counts to prompt playground runs! Currently supported for OpenAI, with more providers to come!

Major design refresh in Arize AX
We've refreshed Arize AX with polished fonts, spacing, color, and iconography throughout the whole platform.
Custom code evaluators
You can now run your own custom Python code evaluators in Arize against your data in a secure environment. Use background tasks to run any custom code, such as URL validation or keyword matching. Learn more
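To give a flavor of what a custom code evaluator can do, here is a minimal keyword-match sketch in plain Python. The function name and the exact interface Arize expects (input fields, return shape) are assumptions here, so check the custom code evaluator docs for the real signature.
# Hypothetical keyword-match evaluator; the interface Arize
# expects for code evaluators is an assumption in this sketch.
def keyword_match_evaluator(output: str) -> dict:
    """Label an output based on whether it contains required keywords."""
    required_keywords = ["refund", "policy"]
    matched = all(kw in output.lower() for kw in required_keywords)
    return {
        "label": "match" if matched else "no_match",
        "score": 1.0 if matched else 0.0,
    }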

Security audit logs for enterprise customers
Improve your compliance and policy adherence. You can now use audit logs to monitor data access in Arize. Note: This feature is completely opt-in and this tracking is not enabled unless a customer explicitly asks for it. Learn more
Larger dataset runs in prompt playground
We've increased the row limit for datasets in the playground, so you can run prompts in parallel on up to 100 examples.

Evaluations on experiments
You can now create and run evals on your experiments from the UI. Compare performance across different prompt templates, models, or configurations without code. Learn more →

Cancel running background tasks
When running evaluations using background tasks, you can now cancel them mid-flight while observing task logs. Learn more →

Improved UI for functions in prompt playground
We've made it easier to view, test, and validate your tool calls in prompt playground. Learn more →

Compare prompts side by side
Compare the outputs of a new prompt and the original prompt side-by-side. Tweak model parameters and compare results across your datasets. Learn more →

Image segmentation support for CV models
We now support logging image segmentation to Arize. Log your segmentation coordinates and compare your predictions vs. your actuals.

New time selector on your traces
We’ve made it way easier to drill into specific time ranges, with quick presets like "last 15 minutes" and custom shorthand for specific dates and times, such as 10d, 4/1 - 4/6, or 4/1 3:00am. Learn more →

Prompt hub Python SDK
Access and manage your prompts in code with support for OpenAI and VertexAI. Learn more
pip install "arize[PromptHub]"
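For a sense of the workflow, here is a rough sketch of pulling a prompt in code. The client and method names (ArizePromptClient, pull_prompt, format) are assumptions about the SDK's shape, so verify the exact API against the SDK reference.
# Hypothetical usage sketch; the names below are assumptions,
# check the prompt hub SDK reference for exact signatures.
from arize.experimental.prompt_hub import ArizePromptClient

client = ArizePromptClient(space_id=SPACE_ID, developer_key=DEVELOPER_KEY)
prompt = client.pull_prompt("support-agent-v2")
messages = prompt.format(question="How do I reset my password?")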
View task run history and errors
Get full visibility into your evaluation task runs, including when each run happened, what triggered it, and whether there were errors. Learn more →

Run evals and tasks over a date range
Easily run your online evaluation tasks over historical data.
Test online evaluation tasks in playground
Quickly debug and refine the prompts used by your online evaluators by loading them prefilled into prompt playground. Learn more →
Select metadata on the sessions page
Dynamically select the fields you want to see in your sessions view.
Labeling queues
Use Arize to annotate your data with 3rd parties. Learn more →

Expand and collapse your traces
You can now collapse rows to see more data at a glance or expand them to view more text.

Schedule your monitors
Schedule your monitors to run hourly, daily, weekly, or monthly.
Improved traces export
Specify which columns of data you'd like to export via the ArizeExportClient using the columns parameter.
from datetime import datetime
from arize.exporter import ArizeExportClient
from arize.utils.types import Environments

client = ArizeExportClient()
primary_df = client.export_model_to_df(
    columns=['context.span_id', 'attributes.llm.input'],  # <---- HERE
    space_id='',
    model_id='',
    environment=Environments.TRACING,
    start_time=datetime(2025, 3, 25),
    end_time=datetime(2025, 4, 25),
)
Create dataset from CSVs
You can now create datasets through many methods, from traces, code, manually in the UI, or CSV upload. Read more
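As a rough sketch of the CSV path in code, assuming the experimental datasets client from the Python SDK (verify the constructor and constant names against the SDK reference):
import pandas as pd
from arize.experimental.datasets import ArizeDatasetsClient
from arize.experimental.datasets.utils.constants import GENERATIVE

# Load the CSV and create a dataset from it (import path and
# constant name are assumptions based on the SDK docs).
df = pd.read_csv("my_examples.csv")
client = ArizeDatasetsClient(developer_key=DEVELOPER_KEY)
dataset_id = client.create_dataset(
    space_id=SPACE_ID,
    dataset_name="csv-upload-example",
    dataset_type=GENERATIVE,
    data=df,
)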
OTEL tracing via HTTP
Support for HTTP when sending traces to Arize! See GitHub for more info.
from arize.otel import register, Transport

tracer_provider = register(
    endpoint="https://otlp.arize.com/v1/traces",  # NEW
    transport=Transport.HTTP,  # NEW
    space_id=SPACE_ID,
    api_key=API_KEY,
    project_name="test-project-http",
)
Voice application tracing and evaluation
Audio tracing: Capture, process, and send audio data to Arize and observe your application behavior.
Evaluation: Assess how well your models identify emotional tones like frustration, joy, or neutrality.

Dashboard colors
We’ve added new ways to plot your charts, with custom colors and better UX!

Prompt hub
Manage, iterate, and deploy your prompts in one place. Version control your templates and use them across playground, tasks, and APIs. Read more
Managed code evaluators
Use our pre-built, off-the-shelf evaluators to evaluate spans without requiring requests to an LLM-as-a-Judge. These include Regex matching, JSON validation, Contains keyword, and more!
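These evaluators boil down to simple deterministic checks. As a plain-Python illustration of what each one verifies (not the evaluator implementation itself):
import json
import re

output = '{"answer": "Your refund is on the way."}'

regex_match = bool(re.search(r"refund", output))  # Regex matching
try:
    json.loads(output)  # JSON validation
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False
contains_keyword = "refund" in output  # Contains keyword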
Create experiments from playground
Quickly experiment with your prompts across your datasets. All you have to do is click "Save as experiment". Read more
Monitor alert status
See exactly how and when your monitors are triggered.

LangChain Instrumentation
Support for sessions via LangChain native thread tracking in TypeScript is now available. Easily track multi-turn conversations / threads using LangChain.js.
Analyze your spans with Copilot
Extract key insights quickly from your spans instead of trying to decipher meaning in hundreds of spans. Ask questions and run evals right in the trace view.

Generate dashboards with Copilot
Building dashboard plots just got way easier. Create time series plots and even translate code into ready-to-go visualizations.

The Custom Metric skill now supports a conversational flow, making it easier for users to iterate and refine metrics dynamically.
View your experiment traces
Experiment traces for a dataset are now consolidated under "Experiment Projects".

Multi-class calibration chart
For your multi-class ML models, you can see how your model is calibrated in one visualization.

Log experiments in Python SDK
You can now log experiment data manually using a dataframe, instead of running an experiment. This is useful if you already have the data you need, and re-running the query would be expensive. SDK Reference
arize_client.log_experiment(
    space_id=SPACE_ID,
    experiment_name="my_experiment",
    experiment_df=experiment_run_df,
    task_columns=task_columns,
    evaluator_columns={"correctness": evaluator_columns},
    dataset_name=dataset_name,
)
Create custom metrics with Copilot
Users can generate their desired metric by having Copilot translate natural language descriptions or existing code (e.g., SQL, Python) into AQL. Learn more →

Summarize embeddings with Copilot
Copilot now works for embeddings! Users can select embedding data points and Copilot will analyze them for patterns and insights. Learn more →

Local explainability support for ML models
Local Explainability is now live, providing both a table view and a waterfall-style plot for detailed, per-feature SHAP values on individual predictions. Learn more →

See experiment results over time
Visualize specific evaluations over time in dashboards. Learn more →

Function calling replay in prompt playground
Now users can follow the full function calling tutorial from OpenAI and iterate on different functions in different messages from within the Prompt Playground.

Vercel AI auto-instrumentation
Users can now ingest traces created by the Vercel AI SDK into Arize. Learn more →
Track sessions and context attributes in instrumentation
You can add metadata and context that will be picked up by all of our auto instrumentations and added to spans. Learn more →
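For example, with the OpenInference instrumentation helpers, context set in a with block is attached to every span created inside it. A minimal sketch, assuming the openinference-instrumentation package:
from openinference.instrumentation import using_attributes

# Spans created inside this block pick up the session ID,
# user ID, metadata, and tags automatically.
with using_attributes(
    session_id="session-abc-123",
    user_id="user-456",
    metadata={"env": "staging"},
    tags=["beta"],
):
    run_my_llm_app()  # hypothetical application entry point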
Easily test your online tasks and evals
Users now have the option to test a task, such as an online eval, by running it once on existing data, or to apply evaluation labels to older traces. Learn more →

Experiment filters
Users can now filter experiments based on dataset attributes or experiment results, making it easy to identify areas for improvement and track their experiment progress with more precision. Learn more →

Embedding traces
With Embeddings Tracing, you can effortlessly select embedding spans and dive straight into the UMAP visualizer, simplifying troubleshooting for your genAI applications. Learn more →

Experiments Details Visualization
Users can now view a detailed breakdown of labels for their experiments on the Experiments Details page.

Support for o1-mini and o1-preview in playground
We've added full support for all available OpenAI models in the playground, including o1-mini and o1-preview.

Improved auto-complete in playground
We've added better input variable behavior, autocompletion enhancements, support for mustache/f-string input variables, and more.
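For reference, the two variable syntaxes look like this (a plain-Python illustration of the template formats, not playground code):
# f-string style: single braces
f_string_template = "Summarize this article: {article_text}"

# mustache style: double braces
mustache_template = "Summarize this article: {{article_text}}"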
Filter history
We now store the last three filters used by a user! Users can easily access their filter history in the query filters dropdown, making it simpler to reuse filters for future queries.

Tracing quick filters
Apply filters directly from the table by hovering over the text to reveal the filter icon.

New arize-otel package
We made it way simpler to add automatic tracing to your applications! It's now just a few lines of code to use OpenTelemetry to trace your LLM application. Check out our new quickstart guide which uses our arize-otel package.
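A minimal setup looks roughly like this, mirroring the register example shown earlier (gRPC transport is the default):
from arize.otel import register

tracer_provider = register(
    space_id=SPACE_ID,
    api_key=API_KEY,
    project_name="my-llm-app",  # hypothetical project name
)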
Easily add spans to datasets
Easily add spans to a dataset from the Traces page using the "Add to Dataset" button.
