Log experiment
Some teams have complex experiment pipelines and may need to run experiments remotely. Those teams can still log their experiment results to Arize via log_experiment to keep a record of experiments for tracking and comparison.
Steps to log an experiment
1. Store the experiment results in a dataframe
We will be logging an example experiment with three pieces of information:
result is the output of the LLM pipeline.
correctness is the evaluation of the experiment, stored in the label, score, and explanation_text columns.
example_id is the dataset row ID, which is needed to map the results to the specific dataset row with inputs and expected outputs. It is added to the dataframe in step 2.
import pandas as pd

# Example DataFrame: one row per experiment run
experiment_run_df = pd.DataFrame(
    {
        "result": [
            "The telephone was invented by **Alexander Graham Bell**.",
            "The invention of the light bulb is commonly attributed to **Thomas Edison**",
        ],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
        "explanation_text": [
            "This statement is accurate because Alexander Graham Bell is credited with inventing the telephone.",
            "This statement is inaccurate; others like Humphry Davy and Joseph Swan made earlier versions of the light bulb.",
        ],
    }
)
2. Define column mappings
This code sets up mappings that link each dataset example to example_id, the LLM output to result, and the evaluator outputs to label, score, and explanation (stored here in the explanation_text column).
from arize.experimental.datasets.experiments.types import (
    ExperimentTaskResultColumnNames,
    EvaluationResultColumnNames,
)

# Define column mappings for the LLM task id and example output
task_cols = ExperimentTaskResultColumnNames(
    example_id="example_id", result="result"
)

# Define column mappings for evaluator
evaluator_cols = EvaluationResultColumnNames(
    label="label",
    score="score",
    explanation="explanation_text",
)
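# Map each experiment row to its dataset row ID via example_id.
# `dataset` is assumed here to be the dataset loaded as a DataFrame with an "id" column.
# So, the first example uses result = "The telephone was invented by **Alexander Graham Bell**."
experiment_run_df["example_id"] = dataset["id"]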
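Assigning dataset["id"] directly relies on pandas index alignment: values are matched by index label, not by row position. The sketch below illustrates the difference, assuming the experiment rows are in the same order as the dataset rows. The dataset frame, its index, and the IDs are hypothetical stand-ins and are not taken from Arize.

```python
import pandas as pd

# Hypothetical stand-in for the dataset DataFrame; only the "id" column matters here.
dataset = pd.DataFrame(
    {"id": ["ex_001", "ex_002"]},
    index=[10, 11],  # a non-default index, e.g. left over from filtering
)

experiment_run_df = pd.DataFrame({"result": ["output A", "output B"]})

# Direct assignment aligns on index labels. Labels 0 and 1 do not exist
# in dataset's index, so every example_id ends up as NaN.
misaligned = experiment_run_df.copy()
misaligned["example_id"] = dataset["id"]

# Positional assignment ignores the index and pairs rows in order.
aligned = experiment_run_df.copy()
aligned["example_id"] = dataset["id"].to_numpy()

print(misaligned)
print(aligned)
```

If both frames use the default 0..n-1 index, the direct assignment and the positional one give the same result.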
3. Log the experiment
Log the experiment to Arize, passing the task column mapping and the evaluator column mapping, keyed by the evaluator name correctness.
arize_client.log_experiment(
    space_id=ARIZE_SPACE_ID,
    experiment_name="my_experiment",
    experiment_df=experiment_run_df,
    task_columns=task_cols,
    evaluator_columns={"correctness": evaluator_cols},
    dataset_name="inventions-dataset",
)
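Before calling log_experiment, it can help to confirm that the dataframe contains every column named in the mappings and that no example_id is missing. The following is a minimal sketch using only pandas. The helper find_problems and the sample IDs are hypothetical and not part of the Arize SDK, and the checks reflect assumptions about what a well-formed experiment dataframe looks like rather than documented Arize validation rules.

```python
import pandas as pd


def find_problems(df, required_columns, id_column="example_id"):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if id_column in df.columns and df[id_column].isna().any():
        problems.append(f"{id_column} has null values")
    return problems


# Column names mirror the mappings defined in step 2.
required = ["example_id", "result", "label", "score", "explanation_text"]

ready_df = pd.DataFrame(
    {
        "example_id": ["ex_001", "ex_002"],  # hypothetical IDs
        "result": ["output A", "output B"],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
        "explanation_text": ["reason A", "reason B"],
    }
)
ok_problems = find_problems(ready_df, required)

# A deliberately broken frame: no score column and one missing example_id.
broken_df = ready_df.drop(columns=["score"]).assign(example_id=[None, "ex_002"])
broken_problems = find_problems(broken_df, required)

print(ok_problems)
print(broken_problems)
```

Returning a list of problems instead of raising on the first one makes it possible to report every issue in a single pass.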