# Agent experiments overview

> Test deployed agents end-to-end by running a dataset against a customer-hosted agent endpoint and collecting the results as an experiment.

Agent experiments let you test a deployed agent end-to-end (routing, tool selection, multi-step orchestration) by sending every row in a dataset to your own HTTP endpoint and collecting the results as a standard experiment in Arize.

Unlike [Experiment in playground](/ax/improve/experiment-in-playground), which tests a single prompt in isolation, agent experiments exercise the **entire agent flow**. Your agent runs in your infrastructure; Arize orchestrates the dataset run, captures responses, links traces, and stores everything as comparable experiment runs.

## When to use this

Use agent experiments when you want to answer questions like:

* Does changing a router prompt fix tool selection across the dataset?
* How does a model swap on one expert node affect the full supervisor agent's outputs?
* Did a new system prompt break downstream tool calls?
* How do different parameter combinations compare on the same realistic inputs?

If the change you want to test fits inside a single prompt, [Experiment in playground](/ax/improve/experiment-in-playground) is faster. If the change spans multiple LLM calls, retrieval, tool execution, or routing, agent experiments are the right tool.

## How it works

<Frame caption="Agent experiment flow: dataset → Arize coordinator → your agent → experiment runs + traces">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/set_up_hero.avif" alt="Agent experiment flow diagram" />
</Frame>

<Steps>
  <Step title="You deploy your agent behind an HTTP endpoint">
    Any framework — LangGraph, CrewAI, OpenAI Agents SDK, Claude Agent SDK, or custom code — works. The only requirement is a `POST` endpoint that accepts JSON and returns JSON.
  </Step>

  <Step title="You register the endpoint in Arize">
    In **Space Settings → Agents**, you add an *Agent Configuration*: the endpoint URL, auth headers, and a JSON Schema for the request body. See [Setting up your agent endpoint](/ax/improve/setup-agent-endpoint).
  </Step>

  <Step title="You run an experiment against a dataset">
    From the dataset page, pick **New Experiment → Run against agent**, choose the agent configuration, optionally override the config payload, and click **Run**.
  </Step>

  <Step title="Arize POSTs each row to your endpoint">
    The coordinator hydrates your request template with each dataset row and POSTs in parallel (with retries, timeouts, and rate limiting). Every row produces one experiment run.
  </Step>

  <Step title="Traces link back automatically">
    If your agent is instrumented with Arize tracing, Arize propagates a W3C `traceparent` header so every span your agent emits becomes a child of the experiment-run trace. See [Setting up tracing for agent experiments](/ax/improve/agent-tracing-context).
  </Step>
</Steps>
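
As a rough illustration of the agent side of this flow, the sketch below stands up a `POST` endpoint that accepts JSON and returns JSON, and reads the W3C `traceparent` header if one is present. It uses only the Python standard library. The `run_agent` function, the `question` and `answer` fields, and the echoed `trace_id` are hypothetical placeholders, not part of any Arize contract; in a real agent a tracing SDK would normally consume the header for you.

```python
import json
import re
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# W3C Trace Context layout: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Return (trace_id, parent_span_id), or None if absent or malformed."""
    match = TRACEPARENT_RE.match(header or "")
    if match is None:
        return None
    _version, trace_id, parent_id, _flags = match.groups()
    return trace_id, parent_id


def run_agent(payload):
    """Hypothetical stand-in for your real agent (routing, tools, and so on)."""
    return {"answer": f"echo: {payload.get('question', '')}"}


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self._send(400, {"error": "request body must be a JSON object"})
            return
        result = run_agent(payload)
        context = parse_traceparent(self.headers.get("traceparent"))
        if context is not None:
            # Echoed for illustration only (assumption, not an Arize field).
            result["trace_id"] = context[0]
        self._send(200, result)

    def _send(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # silence per-request logging in this sketch


def post_json(url, body, headers=None):
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())


def smoke_test():
    """Start the endpoint on a free local port and send it one request."""
    server = HTTPServer(("127.0.0.1", 0), AgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        traceparent = "00-" + "ab" * 16 + "-" + "cd" * 8 + "-01"
        return post_json(url, {"question": "hi"}, {"traceparent": traceparent})
    finally:
        server.shutdown()
        server.server_close()
```

Calling `smoke_test()` locally is one way to confirm the endpoint round-trips JSON before registering it. The request body your agent actually receives is defined by the JSON Schema in your agent configuration, so the fields here would need to change to match it.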

## What you get

Every row of the dataset turns into one experiment run with:

* The full request body sent to your agent
* The full response body returned
* Any traces your agent emitted, nested under the experiment-run trace
* Failure details (HTTP error, timeout) for runs that didn't complete
* Evaluator scores, if you attach evaluators

You can then [compare experiments](/ax/improve/experiment-in-playground#compare-experiments) the same way you compare prompt-level runs.
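
To make the one-run-per-row idea concrete, here is a small local approximation of a dataset run. It is a sketch under stated assumptions, not Arize's coordinator: the `RunRecord` fields, the `{column}` placeholder syntax in `hydrate`, and the retry count are all hypothetical, and `call_agent` is any callable you supply (for example, a function that POSTs to your endpoint).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunRecord:
    """One experiment run. Field names are illustrative, not Arize's schema."""
    request_body: dict
    response_body: Optional[dict] = None
    error: Optional[str] = None


def hydrate(template: dict, row: dict) -> dict:
    """Fill "{column}" placeholders in string values from one dataset row.

    The placeholder syntax is an assumption made for this sketch.
    """
    return {
        key: value.format(**row) if isinstance(value, str) else value
        for key, value in template.items()
    }


def run_row(call_agent: Callable[[dict], dict], template: dict, row: dict,
            max_attempts: int = 3) -> RunRecord:
    """Call the agent for one row, retrying on any exception."""
    body = hydrate(template, row)
    last_error = None
    for _ in range(max_attempts):
        try:
            return RunRecord(request_body=body, response_body=call_agent(body))
        except Exception as exc:  # e.g. HTTP error or timeout
            last_error = f"{type(exc).__name__}: {exc}"
    # All attempts failed: keep the request and the failure details.
    return RunRecord(request_body=body, error=last_error)


def run_dataset(call_agent: Callable[[dict], dict], template: dict,
                rows: list, max_workers: int = 4) -> list:
    """Process rows in parallel; results come back in dataset order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda row: run_row(call_agent, template, row), rows))
```

Each returned record holds either a response or failure details, mirroring the request, response, and failure fields listed above. Traces and evaluator scores are left out because they depend on your tracing setup and on the evaluators you attach.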

## What you don't need to do

* **You don't write task code in Arize.** Your agent already exists; Arize just calls it.
* **You don't move your model or data.** Arize never sees your agent's internals, only the responses your endpoint returns and any traces your agent emits.
* **You don't need to be an engineer to run one.** Once an engineer registers the agent configuration, anyone in the space can kick off an experiment from the UI.

## Compared to other workflows

|                               | Playground experiment | Code experiment     | **Agent experiment**      |
| ----------------------------- | --------------------- | ------------------- | ------------------------- |
| Tests                         | A single prompt       | A Python function   | A deployed agent endpoint |
| Where it runs                 | Arize-hosted          | Your Python runtime | Your infrastructure       |
| Who can run it                | Anyone in the space   | Engineers           | Anyone in the space       |
| Multi-step, tool use, routing | No                    | Yes                 | Yes                       |
| Code change required          | No                    | Yes                 | No                        |

Agent experiments combine the no-code launch of Playground experiments with the multi-step realism of code experiments.

## Next steps

<CardGroup cols={3}>
  <Card title="Set up your agent endpoint" href="/ax/improve/setup-agent-endpoint">
    Register your deployed agent with Arize.
  </Card>

  <Card title="Set up tracing" href="/ax/improve/agent-tracing-context">
    Link agent traces to experiment runs via trace context propagation.
  </Card>

  <Card title="Run an experiment" href="/ax/improve/run-agent-experiments">
    Pick a dataset, run, and compare.
  </Card>
</CardGroup>
