Legacy Evaluator: This evaluator is from phoenix-evals 1.x and will be removed in a future version. You can migrate the template to a custom evaluator as shown below.
Example of a Question:
How many artists have names longer than 10 characters?
Example Query Generated:
SELECT COUNT(ArtistId)
FROM artists
WHERE LENGTH(Name) > 10
The goal of the SQL generation evaluation is to determine whether the generated SQL correctly answers the question asked.
SQL Eval Template
You are tasked with determining if the SQL generated appropriately answers a given
instruction taking into account its generated query and response.
<data>
<instruction>
{question}
</instruction>
<reference_query>
{query_gen}
</reference_query>
<response>
{response}
</response>
</data>
Your response should be a single word: either "correct" or "incorrect".
You must assume that the db exists and that columns are appropriately named.
You must take into account the response as additional information to determine the
correctness.
"correct" indicates that the SQL query correctly solves the instruction.
"incorrect" indicates that the SQL query does not correctly solve the instruction.
Running an SQL Generation Eval
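Before calling the evaluator, it can help to preview how the three placeholders (`{question}`, `{query_gen}`, `{response}`) end up in the prompt. The sketch below is only an illustration: it uses Python's built-in `str.format` as a stand-in for whatever substitution the library performs internally, and `TEMPLATE_DATA_SECTION` and `render_prompt` are hypothetical names, not part of phoenix-evals.

```python
# Shortened copy of the template's <data> section, for illustration only.
TEMPLATE_DATA_SECTION = """<data>
<instruction>
{question}
</instruction>
<reference_query>
{query_gen}
</reference_query>
<response>
{response}
</response>
</data>"""


def render_prompt(template: str, question: str, query_gen: str, response: str) -> str:
    """Fill the three placeholders.

    Assumption: str.format stands in for the library's own substitution logic.
    """
    return template.format(question=question, query_gen=query_gen, response=response)


rendered = render_prompt(
    TEMPLATE_DATA_SECTION,
    question="How many artists have names longer than 10 characters?",
    query_gen="SELECT COUNT(ArtistId) FROM artists WHERE LENGTH(Name) > 10",
    response="42",
)
print(rendered)
```

Printing the rendered text makes it easy to confirm that every key passed to the evaluator matches a placeholder name in the template.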
from phoenix.evals import ClassificationEvaluator
from phoenix.evals.llm import LLM
SQL_EVAL_TEMPLATE = """You are tasked with determining if the SQL generated appropriately answers a given
instruction taking into account its generated query and response.
<data>
<instruction>
{question}
</instruction>
<reference_query>
{query_gen}
</reference_query>
<response>
{response}
</response>
</data>
You must assume that the db exists and that columns are appropriately named.
You must take into account the response as additional information to determine the correctness.
"correct" means the SQL query correctly answers the instruction.
"incorrect" means the SQL query does not correctly answer the instruction."""
sql_evaluator = ClassificationEvaluator(
    name="sql_generation",
    prompt_template=SQL_EVAL_TEMPLATE,
    model=LLM(provider="openai", model="gpt-4o"),
    choices={"incorrect": 0, "correct": 1},
)
result = sql_evaluator.evaluate({
    "question": "How many artists have names longer than 10 characters?",
    "query_gen": "SELECT COUNT(ArtistId) FROM artists WHERE LENGTH(Name) > 10",
    "response": "42"
})
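The `response` field is meant to carry the result of executing the generated query. As a rough sketch of where that value might come from, the example below runs the query against an in-memory SQLite database using Python's standard `sqlite3` module. The table contents and the `run_generated_query` helper are assumptions made up for illustration; a real pipeline would query its own database.

```python
import sqlite3


def run_generated_query(conn: sqlite3.Connection, query: str) -> str:
    """Execute the generated SQL and return its rows as a string for the `response` field."""
    rows = conn.execute(query).fetchall()
    return str(rows)


# Hypothetical sample data; a real setup would connect to the actual database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO artists (Name) VALUES (?)",
    [("AC/DC",), ("Alanis Morissette",), ("Led Zeppelin",), ("Queen",)],
)

query_gen = "SELECT COUNT(ArtistId) FROM artists WHERE LENGTH(Name) > 10"
response = run_generated_query(conn, query_gen)
conn.close()

# Dictionary in the shape the evaluator's template expects.
eval_input = {
    "question": "How many artists have names longer than 10 characters?",
    "query_gen": query_gen,
    "response": response,
}
print(eval_input["response"])  # → [(2,)]
```

Passing the actual query output, rather than a hand-written value, gives the judge model the same evidence a user would see when deciding whether the SQL answered the question.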