You’ve built a customer-service chatbot. It retrieves policy documents, sends them to an LLM, and generates answers. It works… most of the time. But when a customer gets a wrong answer — like being told they can get a refund when they can’t — you have no way to figure out why. Was the wrong document retrieved? Did the LLM ignore the context? Did the prompt not give clear enough instructions? Without visibility into what’s happening inside your app, every bug is a guessing game.

Tracing solves this by capturing every step of every request — retrieval, LLM calls, inputs, outputs, latency, token counts — so you can see exactly what happened and where things went wrong.

In this guide, you’ll instrument a simple RAG chatbot and send traces to Arize AX. By the end, you’ll have full visibility into every request your app handles.
This is Part 1 of the Arize AX Get Started series. Each guide builds on the previous one.

Before you start

We’ve prepared a companion notebook that builds the example chatbot used throughout this series. You can download it here, open it in Colab and follow along, or adapt the steps to your own application. The example app is SkyServe, an airline customer-service chatbot that answers questions about refund policies, baggage rules, rebooking procedures, and a loyalty program. It uses ChromaDB for retrieval and OpenAI for generation — a straightforward RAG setup.
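To make the moving parts concrete, here is a minimal, self-contained sketch of that RAG flow. All names and policy text below are illustrative, not the notebook's actual code: the real app queries ChromaDB for retrieval and calls OpenAI for generation, both of which are stubbed here so the shape of the pipeline is visible.

```python
# Sketch of a retrieve -> build prompt -> generate pipeline.
# Retrieval and generation are stubbed; the structure is the point.

POLICY_DOCS = {
    "refunds": ("Refundable fares may be cancelled at any time. "
                "Non-refundable fares may be cancelled within 24 hours "
                "of purchase for a full refund."),
    "baggage": "Carry-on bags are free. Checked bags may incur a fee.",
}

def retrieve(question: str) -> str:
    """Stand-in for a vector-store similarity search (ChromaDB in
    the real app): crude keyword matching against policy docs."""
    q = question.lower()
    if "refund" in q or "money back" in q:
        return POLICY_DOCS["refunds"]
    if "bag" in q:
        return POLICY_DOCS["baggage"]
    return ""

def build_prompt(question: str, context: str) -> str:
    """Combine the retrieved policy text with the user's question."""
    return (f"Answer using only the policy below.\n\n"
            f"Policy:\n{context}\n\nQuestion: {question}")

def answer(question: str) -> str:
    """Stand-in for the OpenAI chat-completion call: returns the
    constructed prompt so the flow is inspectable without an API key."""
    return build_prompt(question, retrieve(question))
```

Every step in this chain — the retrieval, the prompt construction, the generation — is something tracing will let you inspect per request.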

Step 1: Sign up and get your credentials

If you haven’t already, create a free Arize AX account and verify your email. Once you’re logged in, navigate to Settings in the left sidebar, then go to the API Keys page. You’ll need two things:
  • Space ID — shown in the “Current Space ID” section
  • API Key — click + New API Key, give it a name, and save the key somewhere safe
Settings page showing Space ID and API Keys
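A common practice (not required by Arize AX, and the environment-variable names here are just a convention, not ones the SDK reads automatically) is to keep these credentials out of source code by loading them from the environment:

```python
import os

# Hypothetical variable names; set them in your shell first, e.g.:
#   export ARIZE_SPACE_ID=...
#   export ARIZE_API_KEY=...
# The fallbacks below are placeholders, not working credentials.
space_id = os.environ.get("ARIZE_SPACE_ID", "YOUR_SPACE_ID")
api_key = os.environ.get("ARIZE_API_KEY", "YOUR_API_KEY")
```

You can then pass `space_id` and `api_key` into the `register()` call in the next step instead of pasting the raw values.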

Step 2: Instrument your app

Install the tracing packages. We use arize-otel, a lightweight wrapper around OpenTelemetry, along with the OpenAI auto-instrumentor:
pip install arize-otel openai openinference-instrumentation-openai
Now add these lines to your app, before any OpenAI calls:
from arize.otel import register, Endpoint
from openinference.instrumentation.openai import OpenAIInstrumentor

tracer_provider = register(
    space_id="YOUR_SPACE_ID",
    api_key="YOUR_API_KEY",
    project_name="skyserve-chatbot",
    endpoint=Endpoint.ARIZE,
)

OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
That’s it. Every OpenAI call your app makes will now be captured and sent to Arize AX as a trace. The project_name is how you’ll find your traces in the UI — pick something descriptive.
Arize AX supports auto-instrumentation for 30+ LLM providers and frameworks, including LangChain, LlamaIndex, Anthropic, and more. The pattern is always the same: register a tracer provider, then instrument.
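Conceptually, what an auto-instrumentor does is wrap each provider call so the call's inputs, output, and latency get recorded as a span. This toy sketch illustrates the idea only — it is not Arize AX's or OpenTelemetry's actual internals, and `fake_llm_call` is a stand-in, not a real API:

```python
import functools
import time

SPANS = []  # a real tracer exports spans to a collector instead

def traced(span_name):
    """Decorator that records one span per call of the wrapped function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            SPANS.append({
                "name": span_name,
                "input": {"args": args, "kwargs": kwargs},
                "output": result,
                "latency_ms": (time.perf_counter() - start) * 1000,
            })
            return result
        return wrapper
    return decorator

@traced("llm.completion")
def fake_llm_call(prompt):
    """Stand-in for an LLM provider call."""
    return f"Answer to: {prompt}"

fake_llm_call("Can I get a refund?")
```

This is why instrumentation must run before any OpenAI calls: the instrumentor patches the client so that subsequent calls pass through a wrapper like the one above.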

Step 3: Generate some traces

Run the companion notebook (or your own app) to send some requests through the chatbot. The notebook includes 15 sample customer questions — a mix of straightforward ones and tricky edge cases. Here are a few of the questions it sends:
questions = [
    "Can I get a refund on my Basic fare ticket I bought 3 days ago?",
    "How much does a carry-on bag cost?",
    "I'm a Gold SkyMiles member. Do I get free checked bags?",
    "My flight was delayed 5 hours. What am I entitled to?",
    "I bought a non-refundable ticket yesterday. Can I still get my money back?",
]
Some of these have nuanced answers (the non-refundable ticket bought yesterday is refundable because of the 24-hour policy). These are exactly the kinds of edge cases where chatbots get tripped up — and where tracing is most valuable.
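Generating traces is then just a matter of calling the instrumented app once per question. A hypothetical driver loop (the notebook's actual function names may differ, and `answer` here is a stub standing in for the instrumented RAG pipeline):

```python
def answer(question: str) -> str:
    """Stand-in for the instrumented RAG pipeline; once instrumentation
    is registered, each real call produces one trace in Arize AX."""
    return f"(response to: {question})"

questions = [
    "Can I get a refund on my Basic fare ticket I bought 3 days ago?",
    "How much does a carry-on bag cost?",
]

responses = [answer(q) for q in questions]
for q, r in zip(questions, responses):
    print(f"Q: {q}\nA: {r}\n")
```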

Step 4: Explore your traces in Arize AX

Open Arize AX and navigate to your skyserve-chatbot project. You’ll see a list of traces — one for each question the chatbot answered.
Traces list view showing chatbot requests
Click on any trace to expand it. You’ll see the full span tree:
  • The LLM span: What model was called, what messages were sent, what the response was, how long it took, and how many tokens were used
  • Input and output: The exact prompt that was constructed (including the retrieved context) and the response that was generated
Expanded trace showing span tree, input messages, output, and latency

Finding a problem

Look through your traces for a response that doesn’t look right. For example, find the trace for “Can I get a refund on my Basic fare ticket I bought 3 days ago?” The correct answer depends on whether the ticket is refundable or non-refundable — but the chatbot might give a generic answer. Click into the trace and look at the retrieved context: did the retrieval step pull the right policy document? Does the LLM response match what the document says?
Trace showing retrieved context alongside an imperfect response
Without tracing, you’d just know “the answer was wrong.” With tracing, you can see exactly where the breakdown happened — wrong document retrieved, correct document but LLM misinterpreted it, or the prompt didn’t give clear enough instructions.

Congratulations!

You now have full visibility into every step of your chatbot’s reasoning — what documents it retrieved, what prompt was constructed, what the LLM returned, and how long each step took. You can spot problems instantly instead of guessing.

But you’ve been manually clicking through traces to find problems. That works for 15 test questions, but your chatbot will handle hundreds or thousands of requests per day. You can’t review them all by hand. Next up: We’ll set up automated evaluations so Arize AX flags quality problems for you — no manual review required.

Next: Measure Quality Automatically

Learn more about Tracing