# Evaluating A RAG-Powered Chatbot

> This guide demonstrates how to use Arize AX to monitor and debug an LLM application using traces and spans. We use data from a chatbot built on top of the Arize AX docs ([/docs/ax](/ax)), with example queries and retrieved text, to measure how well the RAG system is working.

<Card title="Google Colab" href="https://colab.research.google.com/github/Arize-ai/tutorials/blob/main/python/llm/evaluation/llamaindex-evals.ipynb" icon="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/cookbooks/gc.png" horizontal />

In this tutorial we will:

1. Build a RAG application using LlamaIndex
2. Set up [Phoenix](https://arize.com/docs/phoenix) as a [trace collector](https://arize.com/docs/phoenix/tracing/llm-traces) for the LlamaIndex application
3. Use Phoenix's [evals library](https://arize.com/docs/phoenix/evaluation/llm-evals) to compute LLM-generated evaluations of our RAG app responses
4. Use the Arize SDK to export the traces and evaluations to Arize AX

You can read more about LLM tracing in Arize AX [here](https://arize.com/docs/ax/llm-large-language-models/llm-traces).

## Install Dependencies

Let's set up the notebook with the dependencies we need.

```python theme={null}
# Dependencies needed to build the Llama Index RAG application
!pip install -qq gcsfs llama-index-llms-openai llama-index-embeddings-openai llama-index-core

# Dependencies needed to export spans and send them to our collector: Phoenix
!pip install -qq llama-index-callbacks-arize-phoenix

# Install Phoenix to generate evaluations
!pip install -qq "arize-phoenix[evals]>7.0.0"

# Install Arize SDK with `Tracing` extra dependencies to export Phoenix data to Arize AX
!pip install -qq "arize>7.29.0"
```

## Set up Phoenix as a Trace Collector in our LLM app

To get started, launch the Phoenix app. Open it in your browser using the link printed below the cell.

```python theme={null}
import phoenix as px

session = px.launch_app()
```

Once you have started a Phoenix server, you can start your LlamaIndex application and configure it to send traces to Phoenix. To do this, configure Phoenix as the global handler:

```python theme={null}
from llama_index.core import set_global_handler

set_global_handler("arize_phoenix")
```

That's it! The LlamaIndex application we build next will send traces to Phoenix.

## Build Your Llama Index RAG Application

We start by setting your OpenAI API key. The cell prompts for the key if an `OPENAI_API_KEY` variable is not already defined in the notebook, then exports it as an environment variable.

```python theme={null}
import os
from getpass import getpass

OPENAI_API_KEY = globals().get("OPENAI_API_KEY") or getpass(
    "🔑 Enter your OpenAI API key: "
)
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
```
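The cell above looks for a variable named `OPENAI_API_KEY` in the notebook session (`globals()`) before prompting. If you would rather pick up a key that is already exported in your shell, a minimal sketch could check the process environment first. `resolve_secret` is a hypothetical helper written for this example, not part of any library.

```python
import os
from getpass import getpass


def resolve_secret(name, prompt, env=None, ask=getpass):
    """Return the secret called `name`, preferring the environment over a prompt.

    `resolve_secret` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    # Fall back to an interactive prompt when the variable is unset or empty.
    return ask(prompt)


# Example with an injected environment and a stub prompt, so nothing blocks on input.
from_env = resolve_secret("OPENAI_API_KEY", "key: ", env={"OPENAI_API_KEY": "sk-test"})
from_prompt = resolve_secret("OPENAI_API_KEY", "key: ", env={}, ask=lambda _: "typed-in")
print(from_env, from_prompt)
```

In the notebook you would call `resolve_secret("OPENAI_API_KEY", "🔑 Enter your OpenAI API key: ")` and assign the result to `os.environ["OPENAI_API_KEY"]`.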

This example uses a `RetrieverQueryEngine` over a pre-built index of the Arize AX documentation, but you can use whatever LlamaIndex application you like. Download the pre-built index of the Arize AX docs from cloud storage and instantiate your storage context.

```python theme={null}
from gcsfs import GCSFileSystem
from llama_index.core import StorageContext

file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
)
```

We are now ready to instantiate the query engine that will perform retrieval-augmented generation (RAG). A query engine is a generic LlamaIndex interface for asking questions over your data: it takes a natural language query and returns a rich response. Query engines are built on top of retrievers, and you can compose multiple query engines for more advanced capabilities.

```python theme={null}
from llama_index.llms.openai import OpenAI
from llama_index.core import (
    Settings,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding


Settings.llm = OpenAI(model="gpt-4o")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
index = load_index_from_storage(
    storage_context,
)
query_engine = index.as_query_engine()
```
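To make the retrieve-then-generate flow concrete, here is a toy sketch of what a query engine does conceptually. It is not how LlamaIndex is implemented: `ToyRetriever` and `ToyQueryEngine` are made-up names, retrieval is plain word overlap instead of embedding similarity, and the "generation" step just joins the retrieved text where a real engine would call an LLM.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyRetriever:
    """Ranks documents by how many words they share with the query."""

    def __init__(self, documents):
        self.documents = documents

    def retrieve(self, query, top_k=2):
        query_words = tokenize(query)
        scored = [(len(query_words & tokenize(doc)), doc) for doc in self.documents]
        # Keep only documents that share at least one word with the query.
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]


class ToyQueryEngine:
    """Retrieves context for a question, then builds an answer from it."""

    def __init__(self, retriever):
        self.retriever = retriever

    def query(self, question):
        context = self.retriever.retrieve(question)
        # A real engine would send the question plus context to an LLM here.
        answer = " ".join(context) if context else "No relevant context found."
        return {"question": question, "context": context, "answer": answer}


docs = [
    "Traces record the steps of an LLM application.",
    "Spans are the individual units of work inside a trace.",
    "Embeddings map text to vectors.",
]
engine = ToyQueryEngine(ToyRetriever(docs))
result = engine.query("What are spans in a trace?")
print(result["context"])
```

The shape is the part that carries over: the response bundles the answer with the context that produced it, which is what the evaluations later in this guide rely on.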

Let's test our app by asking a question about the Arize AX documentation:

```python theme={null}
response = query_engine.query(
    "What is Arize AX and how can it help me as an AI Engineer?"
)
print(response)
```

Great! Our application works!

## Use the instrumented Query Engine

We will download a dataset of questions for our RAG application to answer.

```python theme={null}
from urllib.request import urlopen
import json

queries_url = "http://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/arize_docs_queries.jsonl"
queries = []
with urlopen(queries_url) as response:
    for line in response:
        line = line.decode("utf-8").strip()
        data = json.loads(line)
        queries.append(data["query"])

queries[:5]
```
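The loop above assumes every line of the file is a JSON object with a `query` field. A slightly more defensive sketch, assuming you want to tolerate blank lines such as a trailing newline, is shown below. `read_queries` is a hypothetical helper and the sample data is made up so the example runs without a network connection.

```python
import io
import json


def read_queries(lines, field="query"):
    """Collect `field` from each non-empty JSONL line (bytes or str)."""
    queries = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        queries.append(json.loads(line)[field])
    return queries


# In-memory stand-in for the HTTP response; note the blank line in the middle.
sample = io.BytesIO(
    b'{"query": "How do I log spans?"}\n\n{"query": "What is a trace?"}\n'
)
parsed = read_queries(sample)
print(parsed)
```

Because the helper accepts any iterable of lines, the object returned by `urlopen(queries_url)` can be passed to it directly.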

We use the instrumented query engine and get responses from our RAG app.

```python theme={null}
from tqdm.notebook import tqdm

N = 10  # Sample size
qa_pairs = []
for query in tqdm(queries[:N]):
    resp = query_engine.query(query)
    qa_pairs.append((query, resp))
```
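Each response carries the retrieved context alongside the answer, so you can spot-check a pair before looking at the traces. The sketch below assumes the response exposes `source_nodes`, each with a `score` and a `get_content()` method, as LlamaIndex response objects do. `summarize_pair`, `FakeNode`, and `FakeResponse` are hypothetical names; the fakes exist only so the example runs without an API key.

```python
def summarize_pair(query, resp, max_chars=60):
    """Return the answer text and truncated retrieved chunks for one pair.

    Assumes `resp` has `source_nodes`, each with `score` and `get_content()`.
    """
    chunks = [
        {"score": node.score, "text": node.get_content()[:max_chars]}
        for node in resp.source_nodes
    ]
    return {"query": query, "answer": str(resp), "chunks": chunks}


class FakeNode:
    """Stand-in for a retrieved node with a similarity score."""

    def __init__(self, text, score):
        self._text, self.score = text, score

    def get_content(self):
        return self._text


class FakeResponse:
    """Stand-in for a query engine response."""

    def __init__(self, text, source_nodes):
        self._text, self.source_nodes = text, source_nodes

    def __str__(self):
        return self._text


fake = FakeResponse("Spans are units of work.", [FakeNode("x" * 100, 0.9)])
summary = summarize_pair("What is a span?", fake)
print(summary["answer"], len(summary["chunks"][0]["text"]))
```

With the real data, `summarize_pair(*qa_pairs[0])` would summarize the first question and its response.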

To see the questions and answers in Phoenix, open the link that was printed when we launched the Phoenix app.

## Run Evaluations on the data in Phoenix

We will use the Phoenix client to extract the data in the format the evaluations expect, and then use Phoenix's evaluators to run evaluations on our RAG application.

```python theme={null}
from phoenix.session.evaluation import get_qa_with_reference

px_client = px.Client()  # Define phoenix client
queries_df = get_qa_with_reference(
    px_client
)  # Get question, answer and reference data from phoenix
```
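Before running evaluations, it can help to confirm that the extracted dataframe has the fields the evaluators read. The sketch below assumes those fields are `input`, `output`, and `reference`, which is the shape `get_qa_with_reference` is documented to return. `missing_columns` is a hypothetical helper.

```python
# Assumed column names; check them against your Phoenix version.
REQUIRED_COLUMNS = ("input", "output", "reference")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Example with a plain list standing in for `queries_df.columns`.
gaps = missing_columns(["input", "output"])
print(gaps)
```

In the notebook, `missing_columns(queries_df.columns)` should return an empty list when spans with retrieved documents were collected.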

Next, we enable concurrent evaluations for better performance.

```python theme={null}
import nest_asyncio

nest_asyncio.apply()  # needed for concurrent evals in notebook environments
```
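The reason for this step is that notebook cells already run inside an asyncio event loop, and plain asyncio refuses to start a second loop from within a running one. `nest_asyncio.apply()` patches asyncio to allow that nesting. The sketch below reproduces the underlying error with the standard library only. The function names are made up, and it is meant to be run as a standalone script; inside a notebook the outer `asyncio.run` call would hit the same error.

```python
import asyncio


async def evaluate_one():
    """Stand-in for a single asynchronous evaluation call."""
    return "ok"


async def notebook_like_cell():
    """Runs inside an event loop, the way notebook cells do."""
    coro = evaluate_one()
    try:
        # Starting a second loop from inside a running one is rejected by asyncio.
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


outcome = asyncio.run(notebook_like_cell())
print(outcome)
```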

Then, we define our evaluators and run the evaluations.

```python theme={null}
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    QAEvaluator,
    run_evals,
)

eval_model = OpenAIModel(
    model="gpt-4o",
)
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)

hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)
```
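Each result dataframe has one row per evaluated span, so a quick summary is the share of rows per label. The sketch below assumes the results expose a `label` column and that the hallucination labels are `factual` and `hallucinated`. `label_rates` is a hypothetical helper and the example labels are made up.

```python
from collections import Counter


def label_rates(labels):
    """Return the fraction of rows carrying each label, ignoring missing values."""
    # Missing results may appear as None or NaN, so keep only string labels.
    present = [label for label in labels if isinstance(label, str)]
    total = len(present)
    if total == 0:
        return {}
    return {label: count / total for label, count in Counter(present).items()}


example_labels = ["factual", "factual", "hallucinated", None, "factual"]
rates = label_rates(example_labels)
print(rates)
```

In the notebook, `label_rates(hallucination_eval_df["label"])` and `label_rates(qa_correctness_eval_df["label"])` would give the corresponding summaries for the sampled queries.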

Finally, we log the evaluations into Phoenix.

```python theme={null}
from phoenix.trace import SpanEvaluations

px_client.log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(
        eval_name="QA_Correctness", dataframe=qa_correctness_eval_df
    ),
)
```

## Export data to Arize AX

### Get data into dataframes

We extract the spans and evals dataframes from the Phoenix client.

```python theme={null}
tds = px_client.get_trace_dataset()
spans_df = tds.get_spans_dataframe(include_evaluations=False)
spans_df.head()
```

```python theme={null}
evals_df = tds.get_evals_dataframe()
evals_df.head()
```

### Initialize Arize Client

```python theme={null}
# Note: This example uses Python SDK v7
from arize.pandas.logger import Client
```

Sign up for or log in to your Arize AX account [here](https://app.arize.com/auth/login). Find your [space ID and API key](https://arize.com/docs/ax/api-reference/arize.pandas/client) and paste them into the cell below.

<Frame>
  ![](https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/cookbooks/image-7.png)
</Frame>

```python theme={null}
SPACE_ID = globals().get("SPACE_ID") or getpass(
    "🔑 Enter your Arize AX Space ID: "
)
API_KEY = globals().get("API_KEY") or getpass("🔑 Enter your Arize AX API Key: ")

arize_client = Client(
    space_id=SPACE_ID,
    api_key=API_KEY,
)
model_id = "tutorial-tracing-llama-index-rag-export-from-phoenix"
model_version = "1.0"
```

Lastly, we use `log_spans` from the Arize client to log our spans data. If we have evaluations, we can also pass them through the optional `evals_dataframe` argument.

```python theme={null}
# Note: This example uses Python SDK v7
response = arize_client.log_spans(
    dataframe=spans_df,
    evals_dataframe=evals_df,
    model_id=model_id,
    model_version=model_version,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(
        f"❌ logging failed with response code {response.status_code}, {response.text}"
    )
else:
    print("✅ You have successfully logged your traces to Arize AX")
```
