You can now run Experiments using the Phoenix JS client! Use Experiments to test different iterations of your applications over a set of test cases, then evaluate the results.
This release includes:
Native tracing of tasks and evaluators
Async concurrency queues
Support for any evaluator (including bring your own evals)
import { createClient } from "@arizeai/phoenix-client";
import {
  asEvaluator,
  runExperiment,
} from "@arizeai/phoenix-client/experiments";
import type { Example } from "@arizeai/phoenix-client/types/datasets";
import { Factuality } from "autoevals";
import OpenAI from "openai";

const phoenix = createClient();
const openai = new OpenAI();

/** Your AI Task */
const task = async (example: Example) => {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: JSON.stringify(example.input, null, 2) },
    ],
  });
  return response.choices[0]?.message?.content ?? "No response";
};

await runExperiment({
  dataset: "dataset_id",
  experimentName: "experiment_name",
  client: phoenix,
  task,
  evaluators: [
    asEvaluator({
      name: "Factuality",
      kind: "LLM",
      evaluate: async (params) => {
        const result = await Factuality({
          output: JSON.stringify(params.output, null, 2),
          input: JSON.stringify(params.input, null, 2),
          expected: JSON.stringify(params.expected, null, 2),
        });
        return {
          score: result.score,
          label: result.name,
          explanation: (result.metadata?.rationale as string) ?? "",
          metadata: result.metadata ?? {},
        };
      },
    }),
  ],
});
Available in Phoenix 9.0.0+
The Phoenix v9.0.0 release brings major updates to annotation support, along with a host of other improvements.
Until now, Phoenix supported only one annotation of a given type on each trace. That limit has been removed, allowing you to capture multiple values of an annotation label on each span.
In addition, we've added:
API support for annotations - create, query, and update annotations through the REST API (see the sketch after this list)
Additional support for code evaluations as annotations
Support for arbitrary metadata on annotations
Annotation configurations to structure your annotations within and across projects
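To give a feel for the REST surface, here is a minimal sketch of logging a single span annotation over HTTP. It assumes a local Phoenix server on port 6006 and a v1 span-annotations endpoint that accepts a data array; treat the exact route and field names as assumptions and check the API reference for the current payload shape.
// Sketch: log one span annotation via the REST API (route and fields assumed).
const annotation = {
  span_id: "your_span_id",            // the span to annotate
  name: "correctness",                // multiple values per label are now allowed
  annotator_kind: "CODE",             // e.g. HUMAN, LLM, or CODE
  result: { label: "correct", score: 1, explanation: "Matches the reference answer" },
  metadata: { reviewer: "ci-pipeline" }, // arbitrary metadata is supported
};

const response = await fetch("http://localhost:6006/v1/span_annotations", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ data: [annotation] }),
});
console.log(response.status);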
Now you can create custom global and per-project data retention policies to remove traces after a certain window of time, or based on the number of traces. Additionally, you can now view your disk usage on the Settings page of Phoenix.
We've added hotkeys to Phoenix!
You can now use j and k to quickly page through your traces, and e and n to add annotations and notes - you never have to lift your hands off the keyboard again!
Available in Phoenix 10.5+
Phoenix v10.5.0 now natively supports DeepSeek and xAI models in Playground. Previous versions of Phoenix supported these as custom model endpoints; that process has been streamlined so both providers are available from the main Playground dropdown.
We’ve added a Python auto-instrumentation library for the Google GenAI SDK. This enables seamless tracing of GenAI workflows with full OpenTelemetry compatibility. Traces can be exported to any OpenTelemetry collector.
pip install openinference-instrumentation-google-genai
For more details on setting up the tracing integration, see the Phoenix documentation.
Additionally, the Google GenAI instrumentor works with Span Replay in Phoenix, enabling deep trace inspection and replay for more effective debugging and observability.
Big thanks to Harrison Chu for his contributions.
We've added a host of new methods to the JS client (a usage sketch follows the list):
getExperiment - allows you to retrieve an Experiment to view its results, and run evaluations on it
evaluateExperiment - allows you to evaluate previously run Experiments using LLM as a Judge or Code-based evaluators
createDataset - allows you to create Datasets in Phoenix using the client
appendDatasetExamples - allows you to append additional examples to a Dataset
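Below is a minimal sketch of how these methods can fit together: create a dataset, append more examples to it, then retrieve a previously run Experiment and run an additional code-based evaluation on it. The option shapes and the example IDs are assumptions for illustration; consult the client documentation for the exact signatures.
import { createClient } from "@arizeai/phoenix-client";
import {
  createDataset,
  appendDatasetExamples,
} from "@arizeai/phoenix-client/datasets";
import {
  asEvaluator,
  evaluateExperiment,
  getExperiment,
} from "@arizeai/phoenix-client/experiments";

const phoenix = createClient();

// Create a dataset, then append more examples later (option names assumed).
const dataset = await createDataset({
  client: phoenix,
  name: "qa-dataset",
  description: "Question/answer pairs for regression testing",
  examples: [
    { input: { question: "What is Phoenix?" }, output: { answer: "An LLM observability platform" } },
  ],
});

await appendDatasetExamples({
  client: phoenix,
  dataset,
  examples: [
    { input: { question: "What is an Experiment?" }, output: { answer: "A run of a task over a dataset" } },
  ],
});

// Retrieve a previously run Experiment and evaluate it after the fact.
const experiment = await getExperiment({ client: phoenix, experimentId: "experiment_id" });

await evaluateExperiment({
  client: phoenix,
  experiment,
  evaluators: [
    asEvaluator({
      name: "contains-answer",
      kind: "CODE",
      evaluate: async ({ output, expected }) => ({
        // Simple code-based check: does the task output mention the expected answer?
        score: String(output).includes(String(expected?.answer ?? "")) ? 1 : 0,
        label: "contains-answer",
        explanation: "Checks whether the task output contains the expected answer",
        metadata: {},
      }),
    }),
  ],
});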