Log experiment results via SDK

Logging experiments to Arize when you already have all the data

A notebook showing how to log LLM outputs and evaluation results to Arize

If you already have all the data for an experiment and want to log it to the Arize UI without creating a task or re-running it, you can use our log_experiment function (SDK definition).

You map the columns holding your LLM outputs and evaluation results to the rows of your dataset, then log the results to track your experiments in the Arize UI.

Create a dataset

We'll use a sample dataframe for an LLM that solves math problems, with input and expected_output columns for each equation.

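import pandas as pd
from arize.experimental.datasets import ArizeDatasetsClient
from arize.experimental.datasets.utils.constants import GENERATIVE

# Set up the Arize client (API_KEY and SPACE_ID are your Arize credentials, defined elsewhere)
arize_client = ArizeDatasetsClient(api_key=API_KEY)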

dataset_df = pd.DataFrame(
    {"input": ["1+1", "1+2"], "expected_output": ["2", "3"]}
)

dataset_name = "log-experiments-example"

dataset_id = arize_client.create_dataset(
    space_id=SPACE_ID,
    dataset_name=dataset_name,
    dataset_type=GENERATIVE,
    data=dataset_df,
)
dataset = arize_client.get_dataset(space_id=SPACE_ID, dataset_id=dataset_id)

Log experiment

We will log an experiment with the following columns:

  • example_id is the dataset row ID, which is needed to map each result to the specific dataset row with its input and expected output.

  • result is the output of the LLM pipeline.

  • label, score, and explanation_text hold the evaluation results, which are logged under the evaluator name correctness.

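from arize.experimental.datasets.experiments.types import (
    EvaluationResultColumnNames,
    ExperimentTaskResultColumnNames,
)

# Example dataframe with experiment results and evaluations: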
experiment_run_df = pd.DataFrame(
    {
        "result": ["2", "4"],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
        "explanation_text": [
            "1+1 added is 2, which is correct",
            "1+2 added is 4, which is incorrect",
        ],
    }
)

# Attach each dataset row's ID as example_id. The assignment aligns on the dataframe
# index, so the first example (1+1) gets result = 2 and the second (1+2) gets result = 4.
experiment_run_df["example_id"] = dataset["id"]

# Define column mappings for the example ID and the LLM task output
task_cols = ExperimentTaskResultColumnNames(
    example_id="example_id", result="result"
)

# Define column mappings for evaluator from the dataframe above
evaluator_cols = EvaluationResultColumnNames(
    label="label",
    score="score",
    explanation="explanation_text",
)

# Log the experiment to Arize; the evaluation columns are logged under the evaluator name "correctness".
arize_client.log_experiment(
    space_id=SPACE_ID,
    experiment_name="my_experiment",
    experiment_df=experiment_run_df,
    task_columns=task_cols,
    evaluator_columns={"correctness": evaluator_cols},
    dataset_name=dataset_name,
)

After running log_experiment, you should see the experiment and its correctness evaluations in the experiments UI!
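
The example above assigns example_id by row order, which only works if the results dataframe is in the same order as the dataset. Below is a minimal sketch of an alternative that joins on the input column instead. It uses plain pandas with a stand-in dataframe in place of the one returned by get_dataset; the "id" values and the attach_example_ids helper are hypothetical and not part of the Arize SDK.

```python
import pandas as pd

# Stand-in for the dataframe returned by get_dataset (the "id" values are made up).
# The rows are deliberately in a different order than the results below.
dataset = pd.DataFrame(
    {
        "id": ["row-b", "row-a"],
        "input": ["1+2", "1+1"],
        "expected_output": ["3", "2"],
    }
)

# Experiment results, keyed by the input each result was produced for.
experiment_run_df = pd.DataFrame(
    {
        "input": ["1+1", "1+2"],
        "result": ["2", "4"],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
    }
)


def attach_example_ids(results, dataset_df, key="input"):
    """Add an example_id column by joining results to dataset rows on `key`."""
    ids = dataset_df[[key, "id"]].rename(columns={"id": "example_id"})
    # validate="one_to_one" raises if `key` is duplicated on either side.
    merged = results.merge(ids, on=key, how="left", validate="one_to_one")
    missing = merged["example_id"].isna()
    if missing.any():
        unmatched = merged.loc[missing, key].tolist()
        raise ValueError(f"No dataset row found for: {unmatched}")
    return merged


experiment_run_df = attach_example_ids(experiment_run_df, dataset)
```

Joining on a key rather than on position means a reordered or partially evaluated results dataframe still maps to the right dataset rows, and any result without a matching row fails loudly before anything is logged.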
