- How OpenInference layers on top of OpenTelemetry to add AI-aware semantics
- The hierarchy of sessions, traces, and spans that organizes your telemetry
- Three ways to capture spans — auto-instrumentation, manual instrumentation, and the hybrid approach that combines them
- The common attributes every span carries, and the kind-specific attributes for the four core span kinds (LLM, chain, agent, and tool)
- How to add or override attributes on spans, including auto-instrumented spans you cannot access directly
Every runnable code block below is a complete, self-contained Python script. Save each to a .py file and run it in your venv. The same setup code (register(...) and OpenAIInstrumentor().instrument(...)) appears at the top of every block. That is intentional, so you can run any block in isolation without having to assemble pieces from earlier sections.
Initial setup
You will need an Arize AX account to run this guide. Sign up now for free if you don’t have an account.
Create a project directory and virtual environment
Create a new directory for your script and a Python virtual environment inside it.
Install libraries
Install all the dependencies you will use across the rest of the guide:
Set environment variables
The script reads three secrets from environment variables. Find your Arize Space ID and API Key on your Space Settings page:
Export them in the same shell session you will run the code from:
The OpenAI SDK reads OPENAI_API_KEY automatically; the Arize values are read by the script.
Setup tracing
Every runnable code block in this guide includes the same tracing setup at the top. It uses the arize-otel convenience function to register a tracer provider that sends spans to Arize AX, then enables the OpenAI auto-instrumentor so calls to the OpenAI SDK are traced automatically.
See The arize.otel helpers for the full set of arize.otel functions, including routing traces to multiple projects from a single app.
Save this code as a .py file and run it to verify your setup before proceeding — you should see the OpenTelemetry tracing details printed to your terminal:
Introduction to OpenInference
OpenInference is an open-source set of conventions and instrumentation libraries for tracing AI applications. It is maintained by Arize and is the standard that Arize AX uses to render LLM, tool, agent, chain, retriever, and other AI-specific spans in the trace view.
OpenInference is an extension to OpenTelemetry, not a replacement for it. It uses the standard OpenTelemetry SDK and libraries under the hood: the same TracerProvider, Tracer, Span, SpanProcessor, and Exporter you would use for any OTel-instrumented service. What OpenInference adds is:
- A set of semantic conventions that describe how to represent AI concepts (LLM calls, prompts, messages, tool invocations, retrieval, agent steps, sessions) as span attributes
- A library of auto-instrumentors for popular LLM SDKs and orchestration frameworks (OpenAI, Anthropic, Bedrock, LangChain, LlamaIndex, CrewAI, AutoGen, and many more)
- OpenTelemetry and OpenInference concepts — the full reference companion to this guide.
- OpenInference semantic conventions spec — the formal definitions for span kinds, attributes, and message structure.
- OpenInference repository — auto-instrumentors, examples, and source for every supported language and framework.
- OpenTelemetry documentation — the underlying observability framework that OpenInference builds on.
Save the code as a .py file and run it:
Then open the otel-best-practices project in Arize AX to view your traces.
Sessions, traces, and spans
Three concepts shape how OpenInference (and OpenTelemetry) organize tracing data, and they nest inside each other:
- Span: a single step in your application, such as an LLM call, a tool invocation, or a chain stage. Each span has a name, start and end timestamps, attributes, and potentially a parent; spans nest to form a tree. The root of the tree is known as the root span, and it has no parent.
- Trace: a collection of spans tied together by a shared trace_id. A trace represents one end-to-end request through your agent.
- Session: a collection of traces tied together by a shared session.id. A session is a logical grouping of traces based on a shared concept, such as multiple agent interactions to solve the same task, or the turns of a continuous conversation.
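The nesting described above can be sketched in plain Python. This is only an illustration of the data model, not how Arize AX stores or groups spans; the span records, ids, and names below are hypothetical, and only the trace_id and session.id keys come from this guide.

```python
from collections import defaultdict

# Hypothetical flat span records, shaped loosely like exported spans.
spans = [
    {"name": "ask-llm", "trace_id": "t1", "attributes": {"session.id": "chat-42"}},
    {"name": "ChatCompletion", "trace_id": "t1", "attributes": {"session.id": "chat-42"}},
    {"name": "ask-llm", "trace_id": "t2", "attributes": {"session.id": "chat-42"}},
    {"name": "ChatCompletion", "trace_id": "t2", "attributes": {"session.id": "chat-42"}},
]


def group_by_session(span_records):
    """Return {session_id: {trace_id: [span names]}} for a flat list of spans."""
    sessions = defaultdict(lambda: defaultdict(list))
    for span in span_records:
        session_id = span["attributes"].get("session.id")
        sessions[session_id][span["trace_id"]].append(span["name"])
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {sid: dict(traces) for sid, traces in sessions.items()}


grouped = group_by_session(spans)
print(grouped)
```

Here one session (chat-42) contains two traces, and each trace contains two spans, which mirrors the two-step conversation used in this section.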
Sessions
In Arize AX, start with the Sessions tab. You should see a single session that represents the entire two-step conversation.
If you select the session, a pane appears with the session details, including both steps in the conversation and the input to and output from the agent at each step. It also shows the latency (the time the agent took to run), the total number of tokens used, and the estimated cost, which is based on the LLM provider’s published token pricing where available.
Traces
Select the Traces tab. You should see both of the steps in the conversation as distinct traces, showing the input and output to the agent. The code you ran just makes a single LLM call, so the trace has the input set to what was sent to the LLM, and the output set to the response from the LLM. In a more complicated trace, the input is what was sent to the agent, and the output is the final response sent by the agent after it has completed its entire processing, including calling LLMs or tools.
If you select a trace, a pane will appear with details of the trace. It will show a tree of spans, along with latency, token counts, and estimated cost.
You can also navigate to the individual traces directly from the session view.
Spans
In the trace view you will see a tree of the spans that make up the trace. Spans are grouped into traces by having the same trace_id set on them. A trace is a tree of spans, so one span is the root span at the top of the tree, and the rest of the spans sit beneath it. The hierarchy is defined using the parent_id on the span: each child span has its parent_id set to the id of the parent span.
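As a rough sketch of how a flat list of spans becomes a tree, the following plain-Python function follows parent_id links from the root downwards. The span ids and the root span name are made up for illustration; the two-span shape (a chain span with an LLM span called ChatCompletion beneath it) matches the trace described in this section.

```python
def render_tree(span_records):
    """Return indented lines for a span tree built from parent_id links."""
    children = {}
    for span in span_records:
        children.setdefault(span["parent_id"], []).append(span)

    lines = []

    def walk(parent_id, depth):
        # Visit every span whose parent_id points at the current node.
        for span in children.get(parent_id, []):
            lines.append("  " * depth + f'{span["name"]} ({span["kind"]})')
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # The root span is the one with no parent.
    return lines


# Hypothetical span records; only the parent/child relationship matters here.
spans = [
    {"span_id": "a1", "parent_id": None, "name": "ask-llm", "kind": "CHAIN"},
    {"span_id": "b2", "parent_id": "a1", "name": "ChatCompletion", "kind": "LLM"},
]

tree = render_tree(spans)
print("\n".join(tree))
```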
In the trace we have 2 spans:
The root span is a Chain span. Chain spans are starting points for a set of related spans; you can think of them as a folder that groups spans together. In this example, the chain span isn’t really necessary. It is only there to help show a tree.
Under the root span is an LLM span called ChatCompletion. This span represents a call to an LLM, in our case OpenAI.
Against each span is an Attributes tab that has JSON containing all the attributes associated with the span, such as the input and output, number of tokens used for an LLM span, and so on. We will cover these attributes in the rest of this guide.
OpenInference defines a fixed set of span kinds:
| Span kind | Description |
|---|---|
| LLM | A call to a large language model. Captures the model, input messages, output messages, token counts, and cost. |
| CHAIN | A starting point or link between application steps. Commonly used as a parent span to group related work into a logical block. |
| AGENT | A span representing an agent’s work, typically wrapping LLM and tool spans together |
| TOOL | A call to an external tool or function, often invoked in response to a tool-use request from an LLM |
| RETRIEVER | A retrieval operation, such as fetching documents from a vector store or search index |
| EMBEDDING | A call to an embedding model |
| RERANKER | A reranking step that reorders a set of retrieved documents |
| GUARDRAIL | A safety or policy check, such as content moderation, PII detection, or input validation |
| EVALUATOR | An evaluation step that scores or judges an LLM output |
| PROMPT | A prompt definition or templating step |
| UNKNOWN | Used when no other kind applies |
Configuring sessions
Sessions are a logical grouping of traces based on a continuous set of interactions with an agent. For example, in a chatbot, the entire multi-turn conversation that a single user has with the agent would be a session. When the same user starts a brand new conversation with no previous context, or a new user starts a conversation, this would be a new session. Sessions are explicitly managed by the engineer building the agent; they are not created automatically when sending traces.
Sessions are set with the using_session function, which is used as a context manager. It sets the session id for any spans created by code that runs inside its block. using_session is one of a small family of OpenInference context managers; see OpenInference context managers for the full list (using_user, using_metadata, using_tags, using_prompt_template, using_attributes).
The following code contains a call to OpenAI inside a session. The session id is hardcoded here, so if you run this code multiple times, each run will be a new trace inside the same session.
Save and run this code:
Now run this code, which uses a different session id and so will create a new session:
Capturing spans and traces
In the examples so far you have already seen both ways that OpenInference creates spans:
- Auto-instrumentors wrap a specific library or framework (such as the OpenAI SDK, or LangChain) and emit a span for every call to that library automatically, along with spans for the different actions that the framework performs, such as tool calling. Each call to the library or framework is a separate trace.
- Manual instrumentation lets you create your own traces and spans by calling the tracer directly in your application code
Auto-instrumentors
Auto-instrumentors are libraries that instrument an SDK or framework and automatically emit spans for every call and every action the SDK or framework takes. You already set up an auto-instrumentor, OpenAIInstrumentor, in the Initial setup section.
With it enabled, every call to client.responses.create() or client.chat.completions.create() in your code creates an LLM span automatically in a new trace. If you are using a more advanced framework that handles tool calling, for example, then each call to the framework would be a new trace, with spans for the LLM and tool calls grouped under a chain span.
OpenInference provides auto-instrumentors for most popular AI SDKs and orchestration frameworks: OpenAI, Anthropic, Bedrock, LangChain, LlamaIndex, CrewAI, AutoGen, and many more. See the OpenInference repository for the full list.
If you run the code below, the auto-instrumentor will create a trace with a single LLM span. Save it as a .py file and run it:
In Arize AX, open the Capturing Spans Example session. You will see a single LLM span. The auto-instrumentor created it automatically for the client.responses.create() call, without you writing any tracing code.
Manual instrumentation
Manual instrumentation gives you full control. You call the OpenTelemetry tracer directly to start a span, set its attributes (including its OpenInference span kind), and end it when the work is done. The recommended pattern is the context-manager form, which sets the span as the active span on the OpenTelemetry context (so any spans created inside the block automatically become its children) and ends the span when the block exits.
You set the span kind through the openinference.span.kind attribute on the span. The kind controls how Arize AX renders the span (the icon and the detail view) and which set of OpenInference attributes the span is expected to carry.
The following code creates a trace with three manually-created spans: a parent data-pipeline chain span with two child spans (step-1-validate and step-2-format) nested inside. There are no LLM or tool calls — every span is created by your code. Save it as a .py file and run it:
In Arize AX, open the Capturing Spans Example session. You will see a single chain span called data-pipeline with two child chain spans (step-1-validate and step-2-format) nested underneath.
Hybrid instrumentation
The most powerful pattern is to use both approaches together. Hybrid instrumentation lets you wrap auto-instrumented calls in your own manually-created spans, so you can group SDK calls into logical units, add custom attributes, and build the trace tree that best represents your application, without losing any of the rich attributes that the auto-instrumentor captures.
Auto-instrumented spans and manually-created spans nest together naturally because they share the same OpenTelemetry context. When you open a manual span with tracer.start_as_current_span(...), it becomes the active span on the context. Any call to an auto-instrumented SDK inside that block will create its span as a child of your manual span.
You have already seen hybrid instrumentation in the Introduction to OpenInference section. The ask_llm function wraps each OpenAI call in a manually-created chain span.
The manually-created chain span is the parent; the LLM span that the OpenAI auto-instrumentor produces around client.responses.create() automatically becomes its child. That is what gives you the tree structure you saw in Arize AX when you ran the introduction example — a chain span at the top, with an LLM span nested inside.
This pattern is the typical shape of a real-world traced application: manual chain or agent spans give you the high-level structure of your business logic; auto-instrumented spans fill in the low-level detail of every SDK call you make inside them.
Save and run this hybrid instrumentation example:
In Arize AX, open the Capturing Spans Example session. You will see a chain span called manual-chain with an LLM span nested inside it. The manual span is the parent, and the auto-instrumented LLM span automatically became its child because they share the same OpenTelemetry context.
Span attributes
Every OpenInference span carries a small set of common attributes that apply regardless of the span kind, plus a kind-specific set added on top. The common attributes available on any span kind are:
| Attribute | Description |
|---|---|
| openinference.span.kind | The span kind: LLM, CHAIN, AGENT, TOOL, RETRIEVER, EMBEDDING, RERANKER, GUARDRAIL, EVALUATOR, PROMPT, or UNKNOWN. Controls how Arize AX renders the span. |
| input.value | The input to the span as a string. If the value is structured, serialize it to JSON and set input.mime_type accordingly. |
| input.mime_type | The mime type of input.value. Defaults to text/plain; set to application/json if the value is a JSON string. |
| output.value | The output from the span as a string |
| output.mime_type | The mime type of output.value. Same convention as input.mime_type. |
| metadata | A JSON dictionary of your own fields. Use it to attach domain-specific context such as user tier, feature flag, or request id. |
| session.id | Groups multiple traces into a session. Set with using_session(...) or directly via span.set_attribute(SpanAttributes.SESSION_ID, ...). |
| user.id | Identifies the user the trace belongs to. Set with using_user(...) or directly. |
| tag.tags | A list of string tags for filtering. Set with using_tags(...). |
openinference.span.kind drives the span icon and the kind-specific detail view; input.value and output.value on the root span feed the trace-level input and output preview in the Traces and Sessions tabs; session.id groups traces into sessions; metadata and tag.tags are filterable across spans.
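The input.value and input.mime_type convention from the table can be sketched as a small helper. The function name io_attributes is hypothetical and not part of OpenInference; it only builds a dictionary of attribute keys and values that you could then pass to span.set_attribute(...) one by one.

```python
import json


def io_attributes(prefix, value):
    """Build the <prefix>.value and <prefix>.mime_type attributes for a span.

    prefix is "input" or "output". Strings are kept as plain text; anything
    else is serialized to JSON, following the convention in the table above.
    """
    if isinstance(value, str):
        return {f"{prefix}.value": value, f"{prefix}.mime_type": "text/plain"}
    return {
        f"{prefix}.value": json.dumps(value),
        f"{prefix}.mime_type": "application/json",
    }


plain = io_attributes("input", "What is the weather in Tokyo?")
structured = io_attributes("output", {"city": "Tokyo", "forecast": "sunny"})
print(plain)
print(structured)
```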
The full attribute catalogue is described in OpenInference semantic conventions (the overall standard) and OpenInference span kinds (per-kind reference).
The core span types
OpenInference defines 11 span kinds, but most AI applications use just four: LLM, chain, agent, and tool. These show up in almost every real-world trace: an agent makes LLM calls, LLM calls trigger tool calls, and chain spans group the related work together. For the canonical reference of every span kind and the attributes each carries, see OpenInference span kinds.
The following example uses the OpenAI Agents SDK to produce a single trace containing all four kinds. The SDK has its own auto-instrumentor package, openinference-instrumentation-openai-agents, which emits the full set of span kinds for every agent run.
The Agents SDK has its own instrumentor, OpenAIAgentsInstrumentor. Attach it to the existing tracer provider alongside the OpenAI auto-instrumentor — the Agents SDK creates its own LLM spans, so the two cooperate without duplicating work.
The code below sets up both instrumentors, defines a simple travel assistant with two tools, then asks it a question that requires both tools to be called. The run is wrapped in a session so the trace is easy to find in Arize AX. Save and run it:
In the companion notebook, await Runner.run(...) is used because Jupyter supports top-level await. In a regular Python script, use the synchronous Runner.run_sync(...) instead, as shown above.
Open the trace in the otel-best-practices project. A single agent run produces a trace containing all four core span kinds:
- Two agent spans: an outer Agent workflow wrapper and an inner TravelAssistant
- Three chain spans: one wrapping the whole workflow plus one per agent turn
- Two tool spans, one for each call to get_weather and get_time_zone
- Four LLM spans: each agent turn produces a Responses API call from the SDK with the underlying OpenAI client call nested inside it
This is the trace shape you will see from most non-trivial agents: an outer agent/chain workflow, tool spans for each external call, and LLM spans for every model call inside.
LLM spans
LLM spans represent a call to a large language model. They capture everything you need to debug or analyze the call: the model that was used, the input messages, the output, tools, token counts, costs, and more.
The agent trace above contains several LLM spans. Select one of them, and you will see the input and output from that LLM call. Against the span in the tree you will also see the number of tokens used, the latency, and the cost. In the Attributes tab, you can see the full attributes for the span.
The relevant attributes for this example are:
LLM-specific attributes
Beyond the common attributes that any span carries (covered in the Span attributes section above), the OpenAI auto-instrumentor adds an LLM-specific set on every LLM span, all under the llm.* namespace and following the OpenInference semantic conventions:
| Attribute | Description |
|---|---|
| llm.model_name | The exact model identifier returned by OpenAI, for example gpt-5.4-mini-2026-03-17. This is the resolved snapshot version, not the alias you passed in. |
| llm.provider | The LLM provider, here openai |
| llm.system | The AI system identifier, also openai |
| llm.invocation_parameters | A JSON string of the parameters passed to the API: {"model": "gpt-5.4-mini"} |
| llm.input_messages | The messages sent to the API as a structured array. Each entry has message.role and message.content. |
| llm.output_messages | The messages returned by the API as a structured array. Each entry has message.role and a message.contents list of structured content items (with message_content.text and message_content.type). This shape is multimodal-aware: text, image, audio, and reasoning content all fit the same structure. |
| llm.token_count.prompt / .completion / .total | Token counts for the call, with detail breakdowns under llm.token_count.prompt_details.* (cache_read, input) and llm.token_count.completion_details.* (output, reasoning) |
| llm.cost.prompt / .completion / .total | Estimated cost in USD with the same _details breakdowns. Arize AX computes these from the token counts and the published pricing for the model. |
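The relationship between the token counts and the cost attributes is simple arithmetic, sketched below. This is an assumption about the general shape of the calculation, not Arize AX's actual implementation: the prices are hypothetical placeholders rather than real published pricing, and the sketch ignores the _details breakdowns such as cached or reasoning tokens.

```python
# Hypothetical prices in USD per one million tokens. These are placeholders
# for illustration only, not any provider's real pricing.
PRICE_PER_MILLION = {"prompt": 0.50, "completion": 2.00}


def estimate_cost(attributes):
    """Derive llm.cost.* style values from llm.token_count.* attributes."""
    prompt_tokens = attributes["llm.token_count.prompt"]
    completion_tokens = attributes["llm.token_count.completion"]
    prompt_cost = prompt_tokens * PRICE_PER_MILLION["prompt"] / 1_000_000
    completion_cost = completion_tokens * PRICE_PER_MILLION["completion"] / 1_000_000
    return {
        "llm.cost.prompt": prompt_cost,
        "llm.cost.completion": completion_cost,
        "llm.cost.total": prompt_cost + completion_cost,
    }


# Hypothetical token counts for a single LLM span.
span_attributes = {
    "llm.token_count.prompt": 1200,
    "llm.token_count.completion": 300,
    "llm.token_count.total": 1500,
}

costs = estimate_cost(span_attributes)
print(costs)
```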
Tool spans
Tool spans represent a call to an external tool or function, typically a function the LLM has decided to call. They capture the tool’s name, the arguments the LLM passed in, and the value the tool returned.
In the agent trace above you have two tool spans, get_weather and get_time_zone, one for each tool the agent invoked. Select one of them in the trace tree to see the tool-specific attributes: the tool name, the arguments the LLM passed in ({"city": "Tokyo"}), and the value the tool returned.
The relevant attributes for this example are:
Tool-specific attributes
Beyond the common attributes that any span carries, tool spans carry a small set of tool-specific attributes under the tool.* namespace:
| Attribute | Description |
|---|---|
| tool.id | The identifier for the result of the tool call. Corresponds to the tool_call.id emitted by the LLM, which lets Arize AX link the tool span back to the LLM call that requested it. |
| tool.name | The name of the tool |
| tool.description | The tool’s description. The LLM uses this when deciding which tool to call. |
| tool.parameters | A JSON string of the parameter values the LLM passed to the tool |
| tool.json_schema | The full JSON schema of the tool’s input, typically in OpenAI tool-calling format. Tells the LLM what shape of arguments the tool expects. |
input.value and output.value, and the tool span links back to the parent LLM span via tool.id.
Agent spans
Agent spans represent the work of an autonomous agent: the orchestration that decides when to call the LLM, when to call tools, when to call the LLM again, and when to stop. An agent span is typically the parent of the LLM, tool, and chain spans that make up the agent’s loop.
In the agent trace above, the TravelAssistant span is an agent span. Select it to see how it wraps both of the agent’s turns, with all of the LLM and tool calls nested inside it.
The relevant attributes for this example are:
The instrumentor uses graph.node.id to carry the agent’s name (TravelAssistant) rather than the convention’s agent.name. This is so Arize AX can render multi-agent systems with handoffs as a graph view. When you create agent spans manually, set agent.name (and optionally the graph.node.* attributes if you want the graph view).
Agent-specific attributes
Agent spans carry a small set of agent-specific attributes:
| Attribute | Description |
|---|---|
| agent.name | The name of the agent. Agents that perform the same logical role should share a name so you can group their traces together. |
| graph.node.id | The id of this agent’s node in an execution graph. Optional; set it when you want to visualize a multi-agent system as a graph in Arize AX. |
| graph.node.name | A human-readable name for the graph node |
| graph.node.parent_id | The id of the parent node. Leave unset for a root agent. |
graph.node.* attributes are how Arize AX renders multi-agent systems (LangGraph, AutoGen, CrewAI, etc.) as a graph view alongside the trace tree.
Chain spans
Chain spans are general-purpose grouping spans. Use a chain span when you want to group a set of related work under a single parent in the trace tree: a multi-step pipeline, one agent turn, an LLM call with pre- and post-processing, or just a logical block in your application code.
In the agent trace above the outer Agent workflow is a chain span, and each agent turn is also wrapped in a chain span called turn. Select a turn span to see how it groups the LLM and tool calls for that turn together.
The relevant attributes for this example are:
Chain-specific attributes
Chain spans have no kind-specific attributes. They rely on the common attributes covered in the Span attributes section above: openinference.span.kind, input.value, output.value, metadata, session.id, and so on. The chain span’s value is purely structural: it gives you a named parent in the trace tree, with whatever input and output you choose to attach to it.
Overriding or adding attributes
Auto-instrumentors capture the standard OpenInference attributes for every span they create, but you often want to add your own. Common reasons:
- Tag the call for filtering, for example experiment="v2-prompt" or tenant="acme"
- Attach domain metadata: user tier, request id, feature flag value
- Record a prompt template: the template string, version, and variables you used, separate from the final flattened prompt
With manually-created spans, you can call span.set_attribute(...) directly inside the with block, as you saw in the manual instrumentation example. With auto-instrumented spans you do not have direct access to the span object, but you can still enrich it by putting attributes into the OpenTelemetry context using the OpenInference context managers. The auto-instrumentor reads from that context when it creates the span. This means the attributes are applied to every span created inside the block, no matter who creates it.
The available context managers are:
| Context manager | Sets |
|---|---|
| using_session(session_id) | session.id |
| using_user(user_id) | user.id |
| using_metadata(metadata) | metadata (a JSON dictionary of your own fields) |
| using_tags(tags) | tag.tags (a list of strings) |
| using_prompt_template(template=, variables=, version=) | The llm.prompt_template.* attributes. Most useful on LLM spans. |
| using_attributes(...) | A convenience wrapper that combines all of the above into a single call |
The following example uses using_metadata to attach a domain-specific metadata dictionary. The example wraps an OpenAI call so the metadata ends up on an LLM span, but the same pattern works for any span, auto-instrumented or manual, created inside the block. Save and run it:
metadata— a JSON string containing{"user_tier": "premium", "request_source": "cookbook"}
llm.* attributes.
You can also filter spans by metadata values in the Arize AX trace view, which makes it easy to slice traces by tenant, feature flag, or any other domain dimension. In the Spans tab, set the filter to attributes.metadata.request_source = "cookbook" to only see spans created with the request_source metadata set to cookbook.
Overriding a tool attribute
You can also override attributes that an auto-instrumentor has set, or add attributes that it left out. The OpenAI Agents SDK auto-instrumentor only sets tool.name on tool spans; it does not populate tool.description. The following code redefines get_weather to set a custom tool.description attribute on the active tool span, then recreates the agent and re-runs it.
The pattern works because the auto-instrumentor opens the tool span before calling your function, so the tool span is the active span when your function body runs. Calling set_attribute(...) on it inside the function body either overrides the attribute (if the instrumentor set it) or adds it (if the instrumentor did not).
Save and run this code:
Open the trace in the otel-best-practices project (find it via the Tool Override Example session). Select the get_weather tool span and open its Attributes tab. You will now see tool.description populated with the string you set inside the function body, alongside the standard tool.name the auto-instrumentor produced.
The relevant attributes for this example are:
The get_time_zone tool span in the same trace still has only tool.name, which is a good visual confirmation that the override applies only to the span you set attributes on.
Summary
You have now seen the building blocks for tracing AI applications with OpenInference and Arize AX:
- OpenInference layers on top of OpenTelemetry to add semantic conventions and auto-instrumentors for AI-specific concepts. The standard OpenTelemetry SDK still drives everything underneath: TracerProvider, Tracer, Span, SpanProcessor, and Exporter are all unchanged.
- Spans, traces, and sessions form a hierarchy. A span is one step. A trace is a tree of spans tied together by trace_id. A session is a group of traces tied together by session.id. Sessions are how you stitch a multi-turn conversation together in Arize AX.
- There are three ways to capture spans. Auto-instrumentors wrap SDKs and emit spans for every call automatically. Manual instrumentation lets you create spans yourself with tracer.start_as_current_span(...). Hybrid instrumentation combines the two: your manual spans wrap auto-instrumented calls and become their parents in the trace tree.
- Every span carries a small set of common attributes (openinference.span.kind, input.value, output.value, metadata, session.id, user.id, and tag.tags) regardless of the kind. The openinference.span.kind attribute drives how Arize AX renders the span.
- Four span kinds cover most AI applications: LLM, chain, agent, and tool. Each adds a kind-specific set of attributes: llm.* for model name, token counts, and costs; tool.* for tool name and description; agent.name (or graph.node.id for multi-agent graphs); chain spans rely on the common attributes alone.
- You can enrich auto-instrumented spans. Use OpenInference context managers like using_session, using_metadata, using_tags, and using_prompt_template to attach attributes that the auto-instrumentor picks up via the OpenTelemetry context. You can also override or add specific attributes by grabbing the active span inside a tool function and calling set_attribute(...) directly.
Where to go next
- Read the OpenTelemetry and OpenInference concepts section of the Arize docs for the full reference companion to this guide
- Dive into the OpenInference span kinds reference for every span kind and the attributes each carries
- Read Instrumentation approaches for a deeper comparison of auto, manual, and hybrid instrumentation
- Learn how to propagate context across services or async boundaries when your app spans multiple processes
- Reduce trace volume in production with sampling
- Browse the OpenInference repository for auto-instrumentors covering Anthropic, Bedrock, LangChain, LlamaIndex, CrewAI, AutoGen, and many others
- Read the OpenInference semantic conventions spec for the source-of-truth attribute definitions