Log experiment

Some teams run experiments remotely or through complex pipelines of their own. They can still log the results to Arize with log_experiment, which keeps a record of each experiment for tracking and comparison.

Steps to log an experiment

1. Store the experiment results in a dataframe

We will log an example experiment with the following columns:

  • result is the output of the LLM pipeline.

  • label, score, and explanation_text hold the output of the correctness evaluator: the evaluation label, a numeric score, and an explanation.

  • example_id is the dataset row ID, which is needed to map each result to the specific dataset row with its inputs and expected outputs. It is added in step 2.

import pandas as pd

# Example DataFrame: one row per dataset example
experiment_run_df = pd.DataFrame(
    {
        "result": [
            "The telephone was invented by **Alexander Graham Bell**.",
            "The invention of the light bulb is commonly attributed to **Thomas Edison**"
        ],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
        "explanation_text": [
            "This statement is accurate because Alexander Graham Bell is credited with inventing the telephone.",
            "This statement is inaccurate; others like Humphry Davy and Joseph Swan made earlier versions of the light bulb.",
        ],
    }
)

2. Define column mappings

This code defines the column mappings: example_id identifies the dataset example, result holds the LLM output, and label, score, and explanation_text hold the evaluator's label, score, and explanation.

from arize.experimental.datasets.experiments.types import (
    ExperimentTaskResultColumnNames,
    EvaluationResultColumnNames,
)

# Define column mappings for the dataset example ID and the LLM task output
task_cols = ExperimentTaskResultColumnNames(
    example_id="example_id", result="result"
)

# Define column mappings for evaluator
evaluator_cols = EvaluationResultColumnNames(
    label="label",
    score="score",
    explanation="explanation_text",
)

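# Copy the dataset row IDs into the example_id column.
# `dataset` is assumed here to be the dataset loaded as a pandas DataFrame with an "id" column.
# The assignment aligns rows by index, so the first example gets
# result = "The telephone was invented by **Alexander Graham Bell**."
experiment_run_df["example_id"] = dataset["id"]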
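
The assignment above relies on pandas index alignment, so a dataset with a non-default index can leave example_id partly empty. The following is a minimal sketch of a positional alternative, assuming the dataset is available as a pandas DataFrame with an id column. The helper attach_example_ids and its arguments are hypothetical names for illustration, not part of the Arize SDK.

```python
import pandas as pd


def attach_example_ids(experiment_df: pd.DataFrame, dataset_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of experiment_df with example_id taken from dataset_df["id"] by row position.

    Hypothetical helper: assumes row i of experiment_df was produced
    from row i of dataset_df.
    """
    if len(experiment_df) != len(dataset_df):
        raise ValueError(
            f"row count mismatch: {len(experiment_df)} results vs {len(dataset_df)} dataset rows"
        )
    ids = dataset_df["id"]
    if ids.isna().any() or ids.duplicated().any():
        raise ValueError("dataset ids must be present and unique")

    out = experiment_df.copy()
    # to_numpy() drops the index, so values are assigned by position
    # instead of being aligned on index labels.
    out["example_id"] = ids.to_numpy()
    return out


# Illustrative data: the dataset index deliberately does not start at 0.
dataset_df = pd.DataFrame({"id": ["row-a", "row-b"]}, index=[10, 11])
results_df = pd.DataFrame({"result": ["first output", "second output"]})

with_ids = attach_example_ids(results_df, dataset_df)
print(with_ids["example_id"].tolist())  # ['row-a', 'row-b']
```

With a plain column assignment, the same mismatched index would have produced missing values in example_id rather than an error.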

3. Log the experiment

Log the experiment to Arize, passing the task column mapping and the evaluator column mapping keyed by the evaluator name, correctness.

# arize_client and ARIZE_SPACE_ID are assumed to be defined earlier in your setup
arize_client.log_experiment(
    space_id=ARIZE_SPACE_ID,
    experiment_name="my_experiment",
    experiment_df=experiment_run_df,
    task_columns=task_cols,
    evaluator_columns={"correctness": evaluator_cols},
    dataset_name="inventions-dataset",
)
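
Before logging, it can help to confirm that every column named in the mappings actually exists in the DataFrame. The sketch below uses only pandas. The function missing_columns and the sample data are hypothetical, and the list of required names simply mirrors the mappings from step 2.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required: list) -> list:
    """Return the names in `required` that are not columns of df, in order."""
    return [name for name in required if name not in df.columns]


# Illustrative frame that has not yet been given an example_id column.
run_df = pd.DataFrame(
    {
        "result": ["some output"],
        "label": ["correct"],
        "score": [1],
        "explanation_text": ["why the output is correct"],
    }
)

# Column names referenced by the task and evaluator mappings in step 2.
required = ["example_id", "result", "label", "score", "explanation_text"]

print(missing_columns(run_df, required))  # ['example_id']
```

An empty list means every mapped column is present.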
