
How To Set Up a SQL Router Query Engine for Effective Text-To-SQL


Amber Roberts

Machine Learning Engineer

This article was co-authored by Dustin Ngo

Large language model (LLM) applications are being deployed by an increasing number of companies to power everything from code generation to improved summarization of customer service calls.

One area where LLMs with in-context learning show promise is text-to-SQL, or generating SQL queries from natural language. Achieving reliable results is often tricky in practice, given how difficult it is to understand what is happening under the hood in retrieval augmented generation (RAG) workflows.

LLM tracing is crucial to get the full picture of what is going on. This tutorial and accompanying notebook walk through an example of text-to-SQL tracing that uses LlamaIndex, a table of cameras, and a vector index built from a Wikipedia article, with a router deciding whether to answer each query with SQL or with embeddings.

Background

LlamaIndex provides high-level APIs that enable users to build powerful applications in a few lines of code. However, it can be challenging to understand what is going on under the hood and to pinpoint the cause of issues. Phoenix makes your LLM applications observable by visualizing the underlying structure of each call to your query engine and surfacing problematic `spans` of execution based on latency, token count, or other evaluation metrics.

This tutorial covers how to:

  • Build a query engine that uses both a SQL retriever and a VectorStoreIndex using LlamaIndex
  • Record trace data in OpenInference tracing format using the global arize_phoenix handler
  • Observe how a more complex LlamaIndex application might perform retrieval

ℹ️ Running this tutorial requires an OpenAI API key.

Install Dependencies and Import Libraries

!pip install "arize-phoenix[evals]" "openai>=1" gcsfs nest-asyncio "llama-index>=0.10.3" "llama-index-core" "openinference-instrumentation-llama-index>=1.0.0" "llama-index-callbacks-arize-phoenix>=0.1.2" "llama-index-readers-wikipedia" "sqlalchemy" wikipedia
import os
from getpass import getpass

import openai
import pandas as pd
import phoenix as px
import wikipedia
from llama_index.core import Document, Settings, set_global_handler
from llama_index.core.indices import VectorStoreIndex
from llama_index.core.query_engine import NLSQLTableQueryEngine, RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
from llama_index.core.utilities.sql_wrapper import SQLDatabase
from llama_index.llms.openai import OpenAI
from sqlalchemy import (
   create_engine,
   text,
)

pd.set_option("display.max_colwidth", 1000)

Launch Phoenix

You can run Phoenix in the background to collect trace data emitted by any LlamaIndex application that has been instrumented with the OpenInferenceTraceCallbackHandler. Phoenix supports LlamaIndex’s one-click observability, which will automatically instrument your LlamaIndex application! You can consult our integration guide for a more detailed explanation of how to instrument your LlamaIndex application.

Launch Phoenix and follow the instructions in the cell output to open the Phoenix UI (the UI should be empty because we have yet to run the LlamaIndex application).

session = px.launch_app()

Enable Phoenix tracing within LlamaIndex by setting arize_phoenix as the global handler. This will mount Phoenix’s OpenInferenceTraceCallback as the global handler. Phoenix uses OpenInference traces, an open-source standard for capturing and storing LLM application traces that enables LLM applications to seamlessly integrate with LLM observability solutions such as Phoenix.

set_global_handler("arize_phoenix")

Configure Your OpenAI API Key

Set your OpenAI API key if it is not already set as an environment variable.

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
   openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key

Prepare Reference Data

First, we download a dataset that contains technical details of various digital cameras and convert it into an in-memory SQL database. This dataset is provided by Kaggle and more details can be found here.

camera_info = pd.read_parquet(
   "https://storage.googleapis.com/arize-phoenix-assets/datasets/structured/camera-info/cameras.parquet"
)
engine = create_engine("sqlite:///:memory:", future=True)
camera_info.to_sql("cameras", engine, index=False)

We can test out our in-memory database by running a sample query.

with engine.connect() as connection:
   result = connection.execute(text("SELECT * FROM cameras LIMIT 5")).all()

   for row in result:
       print(row)

Next, for more general queries about digital cameras, we’ll download the Wikipedia page on Digital Cameras using the wikipedia SDK. We will convert this document into a LlamaIndex VectorStoreIndex.

# load the Digital Camera wikipedia page
page = wikipedia.page(pageid=52797)
doc = Document(id_=page.pageid, text=page.content)


vector_indices = []
vector_index = VectorStoreIndex.from_documents([doc])
vector_indices.append(vector_index)

How To Build the LlamaIndex Application

Let’s use a simple RouterQueryEngine using multiple query engine tools. We will either route to the SQL retriever or the vector index built over the “Digital Camera” Wikipedia page.

Settings.llm = OpenAI(temperature=0.0, model="gpt-4")

sql_database = SQLDatabase(engine, include_tables=["cameras"])

sql_query_engine = NLSQLTableQueryEngine(
   sql_database=sql_database,
   tables=["cameras"],
)
sql_tool = QueryEngineTool.from_defaults(
   query_engine=sql_query_engine,
   description=(
       "Useful for translating a natural language query into a SQL query over"
       " a table containing technical details about specific digital camera models: Model,"
       " Release date, Max resolution, Low resolution, Effective pixels, Zoom wide (W),"
       " Zoom tele (T), Normal focus range, Macro focus range, Storage included,"
       " Weight (inc. batteries), Dimensions, Price"
   ),
)

vector_query_engines = [index.as_query_engine() for index in vector_indices]
vector_tools = []
for query_engine in vector_query_engines:
   vector_tool = QueryEngineTool.from_defaults(
       query_engine=query_engine,
       description="Useful for answering generic questions about digital cameras.",
   )
   vector_tools.append(vector_tool)

By defining two different tools, we can build a LlamaIndex query engine that uses an LLM to choose between generating a custom SQL query to answer a question and pulling context from a vector store index, as in a basic RAG use case.

query_engine = RouterQueryEngine(
   selector=LLMSingleSelector.from_defaults(),
   query_engine_tools=([sql_tool] + vector_tools),
)

Make Queries and Use Phoenix to View Spans

response = query_engine.query("What is the most expensive digital camera?")
print(str(response))

Questions that ask for specific details about a camera will be routed to the SQL retriever to get context for the response. The LLM-generated SQL can be seen in a Phoenix span! This SQL is then parsed and run against our database to produce an answer to the question.

Screenshot: Phoenix trace view showing the spans for the SQL retrieval query engine.
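You can also check the generated SQL programmatically rather than only in the Phoenix UI. The snippet below is a minimal sketch, assuming the response metadata exposes the SQL under the "sql_query" key used by NLSQLTableQueryEngine in recent LlamaIndex releases; the exact key can vary between versions.

# Inspect the response returned above; recent NLSQLTableQueryEngine versions
# attach the generated SQL to the response metadata (the key name may vary).
metadata = response.metadata or {}
generated_sql = metadata.get("sql_query")
if generated_sql:
    print("LLM-generated SQL:", generated_sql)
else:
    print("No 'sql_query' key found; inspect the span in the Phoenix UI instead.")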

response = query_engine.query("Tell me about the history of digital camera sensors.")
print(str(response))

More general queries are routed to the vector index and will use Wikipedia to help answer the question.

Screenshot: Phoenix trace view showing the vector store index retrieval span.
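To double-check which Wikipedia passages were retrieved, you can also inspect the response’s source nodes directly. This is a small sketch using LlamaIndex’s standard Response.source_nodes attribute; the text and scores should match what the retrieval span shows in Phoenix.

# Print the Wikipedia chunks the vector index retrieved, with similarity scores
for source_node in response.source_nodes:
    print(f"score: {source_node.score}")
    print(source_node.node.get_content()[:200])
    print("---")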

Conclusion

While text-to-SQL shows a lot of promise as a use case to retrieve context from a structured dataset, the implementation can lead to inconsistent results. The SQL tool description heavily influences how reliably the router picks the right tool to use and whether or not the LLM produces valid SQL.

Under the hood, a lot can go wrong, and LLM tracing and LLM observability are crucial to finding failure points and acting on them quickly.

LLM Traces and the accompanying OpenInference Tracing specification are designed to be a category of telemetry data that is used to understand the execution of LLMs and the surrounding application context. This is especially useful when understanding the behavior of more complex RAG applications that might make use of multiple context retrieval strategies, allowing us to quickly identify bad responses and drill down to the part of the RAG application that didn’t behave as expected.
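If you want to analyze traces outside the UI, Phoenix can also export the collected spans as a pandas DataFrame. Here is a minimal sketch, assuming the px.Client().get_spans_dataframe() helper available in current arize-phoenix releases; the exact columns depend on your version.

# Export all collected spans for offline analysis, e.g. sorting by latency or token counts
spans_df = px.Client().get_spans_dataframe()
print(f"Collected {len(spans_df)} spans")
spans_df.head()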

For more details on Phoenix, LLM tracing, and LLM evals, check out our documentation.