When to use this
Use agent experiments when you want to answer questions like:
- Does changing a router prompt fix tool selection across the dataset?
- How does a model swap on one expert node affect the full supervisor agent’s outputs?
- Did a new system prompt break downstream tool calls?
- How do different parameter combinations compare on the same realistic inputs?
How it works

You deploy your agent behind an HTTP endpoint
Any framework (LangGraph, CrewAI, OpenAI Agents SDK, Claude Agent SDK, or custom code) works. The only requirement is a POST endpoint that accepts JSON and returns JSON.
You register the endpoint in Arize
In Space Settings → Agents, you add an Agent Configuration: the endpoint URL, auth headers, and a JSON Schema for the request body. See Setting up your agent endpoint.
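To make the endpoint requirement concrete, here is a minimal sketch of a POST endpoint that accepts JSON and returns JSON, using only the Python standard library. The `question` and `answer` fields, the `REQUEST_SCHEMA` contents, and the `handle` and `serve` names are hypothetical placeholders chosen for illustration; your own request body and the JSON Schema you register will differ.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical JSON Schema for the request body; the field names are
# illustrative assumptions, not something Arize prescribes.
REQUEST_SCHEMA = {
    "type": "object",
    "properties": {"question": {"type": "string"}},
    "required": ["question"],
}


def handle(payload: object) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body). A stand-in for invoking a real agent."""
    if not isinstance(payload, dict):
        return 400, {"error": "request body must be a JSON object"}
    missing = [key for key in REQUEST_SCHEMA["required"] if key not in payload]
    if missing:
        return 400, {"error": f"missing required fields: {missing}"}
    # A real agent (LangGraph, CrewAI, custom code, ...) would be called here.
    return 200, {"answer": f"You asked: {payload['question']}"}


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:  # covers invalid JSON and invalid UTF-8
            status, body = 400, {"error": "request body must be valid JSON"}
        else:
            status, body = handle(payload)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Block forever, serving the agent on the given port."""
    HTTPServer(("0.0.0.0", port), AgentHandler).serve_forever()
```

Calling `serve()` starts the server. In a real deployment you would likely use your existing web framework and add authentication that matches the auth headers in the agent configuration.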
You run an experiment against a dataset
From the dataset page, pick New Experiment → Run against agent, choose the agent configuration, optionally override the config payload, and click Run.
Arize POSTs each row to your endpoint
The coordinator hydrates your request template with each dataset row and POSTs the resulting requests in parallel (with retries, timeouts, and rate limiting). Every row produces one experiment run.
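The hydrate-and-POST loop described above can be pictured with the following sketch. It is not Arize's implementation: the `{{column}}` placeholder syntax, the function names, and the retry and backoff values are all assumptions made for illustration, and rate limiting is omitted for brevity.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


def hydrate(template: Any, row: dict) -> Any:
    """Recursively swap "{{column}}" strings for that column's value in the row."""
    if isinstance(template, dict):
        return {key: hydrate(value, row) for key, value in template.items()}
    if isinstance(template, list):
        return [hydrate(value, row) for value in template]
    if isinstance(template, str) and template.startswith("{{") and template.endswith("}}"):
        return row[template[2:-2].strip()]
    return template


def http_sender(url: str, timeout_s: float = 30.0) -> Callable[[dict], dict]:
    """Build a function that POSTs a JSON body to url and returns the JSON reply."""
    def send(body: dict) -> dict:
        request = urllib.request.Request(
            url,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.loads(response.read())
    return send


def call_with_retries(send: Callable[[dict], dict], body: dict,
                      max_attempts: int = 3, backoff_s: float = 0.5) -> dict:
    """Return one run record: the request plus a response or failure details."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"request": body, "response": send(body), "error": None}
        except Exception as exc:  # HTTP error, timeout, bad JSON, ...
            last_error = f"{type(exc).__name__}: {exc}"
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    return {"request": body, "response": None, "error": last_error}


def run_experiment(rows: List[dict], template: Any,
                   send: Callable[[dict], dict], max_workers: int = 4) -> List[dict]:
    """Hydrate one request per row and send them in parallel, preserving row order."""
    bodies = [hydrate(template, row) for row in rows]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda body: call_with_retries(send, body), bodies))
```

For example, `run_experiment(rows, {"question": "{{q}}"}, http_sender("http://localhost:8000/"))` would yield one record per row, which mirrors the one-run-per-row behavior described above. The URL and the `q` column are placeholders.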
Traces link back automatically
If your agent is instrumented with Arize tracing, Arize propagates a W3C traceparent header so every span your agent emits becomes a child of the experiment-run trace. See Setting up tracing for agent experiments.
What you get
Every row of the dataset turns into one experiment run with:
- The full request body sent to your agent
- The full response body returned
- Any traces your agent emitted, nested under the experiment-run trace
- Failure details (HTTP error, timeout) for runs that didn’t complete
- Evaluator scores, if you attach evaluators
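The trace nesting listed above depends on the W3C `traceparent` header that arrives with each request. An instrumented agent normally has a tracing library read this header for it, but the sketch below shows what the header carries. It handles only the version-00 layout from the W3C Trace Context specification (`version-traceid-parentid-flags`); the `TraceParent` and `parse_traceparent` names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str    # shared by every span in the trace
    parent_id: str   # span ID of the caller; new spans become its children
    sampled: bool    # lowest bit of the trace-flags field


def parse_traceparent(header: Optional[str]) -> Optional[TraceParent]:
    """Parse a traceparent header value, returning None if it is absent or invalid."""
    if not header:
        return None
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The specification forbids version "ff" and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

Spans your agent creates with the parsed `trace_id` and with `parent_id` as their parent are what allow them to appear under the experiment-run trace.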
What you don’t need to do
- You don’t write task code in Arize. Your agent already exists; we just call it.
- You don’t move your model or data. Arize never sees your agent’s internals — only the responses your endpoint returns.
- You don’t need to be an engineer to run one. Once an engineer registers the agent configuration, anyone in the space can kick off an experiment from the UI.
Compared to other workflows
| | Playground experiment | Code experiment | Agent experiment |
|---|---|---|---|
| Tests | A single prompt | A Python function | A deployed agent endpoint |
| Where it runs | Arize-hosted | Your Python runtime | Your hosted infra |
| Who can run it | Anyone in the space | Engineers | Anyone in the space |
| Multi-step, tool use, routing | No | Yes | Yes |
| Code change required | No | Yes | No |
Next steps
Set up your agent endpoint
Register your deployed agent with Arize.
Set up tracing
Link agent traces to experiment runs via trace context propagation.
Run an experiment
Pick a dataset, run, and compare.