LLM tracing is the practice of recording each step an LLM application takes to handle a request, captured as a structured trace made of spans. Each span represents one operation – a model call, a retrieval, a tool call, a chain step – along with its inputs, outputs, and metadata. Strung together, the spans form the complete execution record of that one request.
The reason to trace is simple: an LLM application is rarely a single model call. A real request retrieves context, formats a prompt, calls a model, maybe calls a tool, and assembles a final answer. When the output is wrong, you need to know which of those steps caused it. LLM tracing is what turns a black-box request into a step-by-step record you can read.
Key takeaways
LLM tracing records each step of a request as spans and groups them into a trace.
A span captures one operation with its inputs, outputs, and metadata; a trace is all the spans for one request; a session groups traces for a longer interaction.
Spans come in kinds – LLM, retriever, tool, chain, agent – so the trace shows what type of work each step did.
Tracing is the data-capture layer that LLM evaluation and observability are built on.
Open standards like OpenInference and OpenTelemetry keep traces portable across tools.
What a span captures
The span is the unit of LLM tracing. A useful span records:
Inputs and outputs. For an LLM span, the prompt sent and the completion returned. For a tool span, the arguments passed and the result. For a retriever span, the query and the documents returned.
Metadata. The model and parameters, token usage, latency, and any error.
Structure. Which span this one descends from, so the trace forms a tree that shows what called what.
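As a rough illustration of those three groups of fields, here is a minimal sketch of a span record in plain Python. The `Span` class, its field names, the model name, and the token counts are all hypothetical, chosen for this example; real tracing libraries define their own schemas.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Span:
    """One operation in a request. Hypothetical schema for illustration only."""
    name: str
    kind: str                          # "llm", "retriever", "tool", "chain", or "agent"
    trace_id: str                      # shared by every span in the same request
    parent_id: Optional[str] = None    # structure: None marks the root of the tree
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    inputs: Dict[str, Any] = field(default_factory=dict)
    outputs: Dict[str, Any] = field(default_factory=dict)
    metadata: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None
    start: float = field(default_factory=time.perf_counter)
    end: Optional[float] = None

    def finish(self, outputs: Optional[Dict[str, Any]] = None,
               error: Optional[str] = None) -> None:
        """Close the span, recording its result (or error) and end time."""
        self.outputs = outputs or {}
        self.error = error
        self.end = time.perf_counter()

    @property
    def latency_ms(self) -> Optional[float]:
        """Elapsed milliseconds, or None while the span is still open."""
        return None if self.end is None else (self.end - self.start) * 1000.0


# A two-span trace: a root chain span and the LLM call it made.
trace_id = uuid.uuid4().hex
root = Span(name="answer_question", kind="chain", trace_id=trace_id,
            inputs={"question": "What is a span?"})
llm = Span(name="generate_answer", kind="llm", trace_id=trace_id,
           parent_id=root.span_id,
           inputs={"prompt": "Answer briefly: What is a span?"},
           metadata={"model": "example-model", "temperature": 0.0})
llm.finish(outputs={"completion": "A span is one recorded operation."})
llm.metadata["token_usage"] = {"prompt": 9, "completion": 7}  # illustrative numbers
root.finish(outputs={"answer": llm.outputs["completion"]})
```

The `parent_id` link is what lets a reader rebuild the tree later: the LLM span points at the chain span that called it, and the root span points at nothing.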
Span kind matters because it tells you how to read the step. Arize’s tracing docs describe span kinds for LLM applications – LLM, retriever, tool, chain, agent – so a trace is not just a flat list of events but a typed map of the request: here is where it retrieved, here is where it called the model, here is where it invoked a tool.
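To make the "typed map" idea concrete, the sketch below rebuilds a tree from parent links and labels each step with its kind. The dictionary layout, the `render_trace` helper, and the span names are assumptions made for this example, not part of any particular tracing library.

```python
from typing import Dict, List, Optional

# The five span kinds named in the text.
SPAN_KINDS = ("llm", "retriever", "tool", "chain", "agent")


def render_trace(spans: List[Dict[str, Optional[str]]]) -> str:
    """Render a flat list of spans as an indented tree, one line per span."""
    children: Dict[Optional[str], List[dict]] = {}
    for span in spans:
        if span["kind"] not in SPAN_KINDS:
            raise ValueError(f"unknown span kind: {span['kind']}")
        # Index spans by their parent so each node's children can be looked up.
        children.setdefault(span["parent_id"], []).append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append("  " * depth + f"[{span['kind']}] {span['name']}")
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # root spans have no parent
    return "\n".join(lines)


spans = [
    {"span_id": "s1", "parent_id": None, "kind": "agent", "name": "support_agent"},
    {"span_id": "s2", "parent_id": "s1", "kind": "chain", "name": "answer_with_context"},
    {"span_id": "s3", "parent_id": "s2", "kind": "retriever", "name": "search_docs"},
    {"span_id": "s4", "parent_id": "s2", "kind": "llm", "name": "draft_answer"},
    {"span_id": "s5", "parent_id": "s1", "kind": "tool", "name": "lookup_order"},
]
tree = render_trace(spans)
print(tree)
# [agent] support_agent
#   [chain] answer_with_context
#     [retriever] search_docs
#     [llm] draft_answer
#   [tool] lookup_order
```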
Spans, traces, and sessions
These three levels are worth keeping straight.
A span is one operation.
A trace is the full set of spans for one request – everything that happened from input to final output.
A session groups multiple traces for a longer interaction, like a multi-turn conversation, so you can follow a user across several requests.
This hierarchy is what lets you zoom in and out. You can look at a whole session to understand a conversation, open one trace to see a single request, or drill into one span to read the exact prompt that produced a bad answer.
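One way to picture the three levels is as nested grouping keys on the same span records. The sketch below assumes each span carries a `session_id` and a `trace_id`; those field names and the `group_by_session` helper are hypothetical.

```python
from typing import Dict, List


def group_by_session(spans: List[dict]) -> Dict[str, Dict[str, List[dict]]]:
    """Group flat span records into session -> trace -> list of spans."""
    sessions: Dict[str, Dict[str, List[dict]]] = {}
    for span in spans:
        traces = sessions.setdefault(span["session_id"], {})
        traces.setdefault(span["trace_id"], []).append(span)
    return sessions


spans = [
    {"session_id": "chat-1", "trace_id": "turn-1", "name": "search_docs"},
    {"session_id": "chat-1", "trace_id": "turn-1", "name": "draft_answer"},
    {"session_id": "chat-1", "trace_id": "turn-2", "name": "draft_answer"},
    {"session_id": "chat-2", "trace_id": "turn-3", "name": "draft_answer"},
]
sessions = group_by_session(spans)

conversation = sessions["chat-1"]     # zoomed out: every request in one conversation
one_request = conversation["turn-1"]  # one trace: all spans for a single request
one_step = one_request[0]             # zoomed in: a single span
```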
Why LLM tracing matters
Tracing is the foundation that debugging, evaluation, and observability all depend on.
For debugging agents and LLM apps, the trace is where you find the actual cause of a failure. A wrong answer is often not a model problem at all: the retrieval missed the right document, the prompt lost a key instruction, or a tool returned bad data. The trace shows you which of these happened. Without it, you are guessing from the final output.
Tracing is also the input to evaluation. You cannot score behavior you did not capture. Once traces exist, you can attach eval results to spans and sessions, which is what makes LLM observability possible: the trace records what happened, and the evals tell you whether it was good.
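The sketch below illustrates the idea of attaching eval results to captured spans and then using them to locate the failing step. The `attach_eval` and `first_failing_span` helpers, the span layout, and the two hand-written checks are assumptions for this example; a real setup would plug in its own evaluators.

```python
from typing import List, Optional


def attach_eval(span: dict, name: str, passed: bool, explanation: str) -> None:
    """Record one eval result on the span it judges."""
    span.setdefault("evals", []).append(
        {"name": name, "label": "pass" if passed else "fail", "explanation": explanation}
    )


def first_failing_span(trace: List[dict]) -> Optional[dict]:
    """Return the earliest span with a failed eval, or None if all passed."""
    for span in trace:
        if any(e["label"] == "fail" for e in span.get("evals", [])):
            return span
    return None


# A captured trace, in execution order: retrieval first, then the model call.
trace = [
    {"name": "search_docs", "kind": "retriever",
     "inputs": {"query": "refund window"},
     "outputs": {"documents": ["shipping-policy", "privacy-policy"]}},
    {"name": "draft_answer", "kind": "llm",
     "inputs": {"prompt": "Use the documents to answer: refund window?"},
     "outputs": {"completion": "Refunds are not mentioned in our policies."}},
]

# Simple stand-in checks; the expected document ID is known only for this example.
expected_doc = "refund-policy"
attach_eval(trace[0], "retrieval_hit",
            expected_doc in trace[0]["outputs"]["documents"],
            f"expected {expected_doc!r} among retrieved documents")
attach_eval(trace[1], "non_empty_completion",
            bool(trace[1]["outputs"]["completion"].strip()),
            "completion should not be blank")

culprit = first_failing_span(trace)
print(culprit["name"])  # search_docs
```

Here the final answer looks like a model failure, but the eval attached to the retriever span shows the right document never reached the prompt.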
LLM tracing vs distributed tracing vs observability
These terms sit close together, so it helps to separate them.
Distributed tracing is the general technique, from microservices, for following one request across many components using spans and a shared trace ID.
LLM tracing applies that technique to LLM applications, where the spans capture prompts, retrievals, and tool calls rather than only service calls.
LLM observability is the broader practice that uses trace data, plus evaluation and monitoring, to understand and improve an application.
Put simply: LLM tracing is the data. Observability is what you do with it.
FAQ
What is the difference between a span and a trace?
A span is one operation – a single model call, retrieval, or tool call. A trace is the full collection of spans for one request, linked into a tree that shows the order and structure of what happened.
What is a session in LLM tracing?
A session groups multiple traces for a longer interaction, such as a multi-turn conversation, so you can follow one user across several requests instead of looking at each request in isolation.
Is LLM tracing the same as LLM observability?
No. Tracing captures the step-by-step data. Observability is the broader practice of using that data, along with evaluation and monitoring, to understand and improve the application.
How is LLM tracing different from distributed tracing?
They share the same mechanics – spans, trace IDs, parent-child links. LLM tracing applies them to AI applications, so the spans capture prompts, retrieved context, and tool arguments instead of only service-level calls.
Do I need an open standard to trace LLM apps?
It is not required, but standards like OpenInference and OpenTelemetry keep your traces portable and consistent across tools, which helps keep your instrumentation from being locked to one vendor.