Phoenix Inferences

Observability for all model types (LLM, NLP, CV, Tabular)

Overview

Phoenix Inferences lets you observe the performance of your model by visualizing all of the model's inferences in one interactive UMAP view.

This visualization can be used during exploratory data analysis (EDA) to understand model drift, find low-performing clusters, uncover retrieval issues, and export data for retraining / fine tuning.

Quickstart

The following Quickstart can be executed in a Jupyter notebook or Google Colab.

We will begin by logging just a training set, then add a production set for comparison.

Step 1: Install and load dependencies

Use pip or conda to install arize-phoenix.

!pip install arize-phoenix

import phoenix as px

Step 2: Prepare model data

Phoenix visualizes data taken from a pandas dataframe, where each row of the dataframe encompasses all the information about one inference (including feature values, prediction, metadata, etc.).

For this Quickstart, we will show an example of visualizing the inferences from a computer vision model. See example notebooks for all model types here.

Let’s begin by working with the training set for this model.

Download the dataset and load it into a pandas dataframe.

import pandas as pd

train_df = pd.read_parquet(
    "http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_training.parquet"
)

Preview the dataframe with train_df.head() and note that each row contains all the data specific to this CV model for each inference.

train_df.head()

Step 3: Define dataset Schema

Before we can log this dataset, we need to define a Schema object to describe this dataset.

The Schema object informs Phoenix of the fields that the columns of the dataframe should map to.

Here we define a Schema to describe our particular CV training set:

# Define Schema to indicate which columns in train_df should map to each field
train_schema = px.Schema(
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_action",
    actual_label_column_name="actual_action",
    embedding_feature_column_names={
        "image_embedding": px.EmbeddingColumnNames(
            vector_column_name="image_vector",
            link_to_data_column_name="url",
        ),
    },
)

Important: The fields used in a Schema will vary depending on the model type that you are working with.

For examples of how a Schema is defined for other model types (NLP, tabular, LLM-based applications), see the example notebooks under Embedding Analysis and Structured Data Analysis.
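A Schema only names columns, so a misspelled column name is easy to miss until the dataset is built. The following is a minimal sketch of a pre-flight check, assuming the column names used in the training Schema above. `missing_columns` is a hypothetical helper written for this example and is not part of Phoenix.

```python
def missing_columns(dataframe_columns, schema_columns):
    """Return the schema column names that are absent from the dataframe."""
    present = set(dataframe_columns)
    return [name for name in schema_columns if name not in present]


# Column names referenced by the training Schema in this Quickstart.
schema_columns = [
    "prediction_ts",
    "predicted_action",
    "actual_action",
    "image_vector",
    "url",
]

# Stand-in for list(train_df.columns); the last name is deliberately misspelled.
dataframe_columns = [
    "prediction_ts",
    "predicted_action",
    "actual_action",
    "image_vector",
    "ulr",
]

missing = missing_columns(dataframe_columns, schema_columns)
print(missing)
```

In a notebook, passing `list(train_df.columns)` as the first argument would run the same check against the real dataframe.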

Step 4: Wrap into Dataset object

Wrap your train_df and schema train_schema into a Phoenix Dataset object:

train_ds = px.Dataset(dataframe=train_df, schema=train_schema, name="training")

Step 5: Launch Phoenix!

We are now ready to launch Phoenix with our Dataset!

Here, we are passing train_ds as the primary dataset, as we are only visualizing one dataset (see Step 6 for adding additional datasets).

session = px.launch_app(primary=train_ds)

Running this will fire up a Phoenix visualization. Follow the instructions in the output to view Phoenix in a browser, or in-line in your notebook:

🌍 To view the Phoenix app in your browser, visit https://x0u0hsyy843-496ff2e9c6d22116-6060-colab.googleusercontent.com/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix

You are now ready to observe the training set of your model!

✅ Checkpoint A.

Optional - try the following exercises to familiarize yourself more with Phoenix:

Click on image_embedding under the Embeddings section to enter the UMAP projector view
Select a point on the time series where the model accuracy is <0.78, and see the embedding visualization below update to include only points from the selected timeframe
Select the cluster with the lowest accuracy from the list of automatic clusters generated by Phoenix
- Note that Phoenix automatically generates clusters for you on your data using a clustering algorithm called HDBSCAN (more information: https://docs.arize.com/phoenix/concepts/embeddings-analysis#clusters)
Change the colorization of your plot - e.g. select Color By 'correctness' and 'dimension'
Describe in words an insight you've gathered from this visualization
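The accuracy and 'correctness' values used in these exercises can also be approximated outside the UI. The sketch below uses pandas on made-up rows; the column names follow the Quickstart's training set, while the action labels and timestamps are invented for illustration. It assumes accuracy is simply the share of inferences whose predicted label equals the actual label, which may differ in detail from what Phoenix displays.

```python
import pandas as pd

# Hypothetical toy inferences; column names follow the Quickstart's training set.
df = pd.DataFrame(
    {
        "prediction_ts": pd.to_datetime(
            [
                "2023-01-01 08:00",
                "2023-01-01 17:00",
                "2023-01-02 09:00",
                "2023-01-02 21:00",
            ]
        ),
        "predicted_action": ["running", "sitting", "running", "dancing"],
        "actual_action": ["running", "running", "running", "dancing"],
    }
)

# An inference counts as correct when the predicted label matches the actual label.
df["correct"] = df["predicted_action"] == df["actual_action"]

# Accuracy per calendar day is the mean of the boolean correctness flag.
daily_accuracy = df.groupby(df["prediction_ts"].dt.date)["correct"].mean()
print(daily_accuracy)
```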

Discuss your answers in our community!

Step 6 (Optional): Add a comparison dataset

In order to visualize drift, conduct A/B model comparisons, or in the case of an information retrieval use case, compare inferences against a corpus, you will need to add a comparison dataset to your visualization.

We will continue on with our CV model example above, and add a set of production data from our model to our visualization.

This will allow us to analyze drift and conduct A/B comparisons of our production data against our training set.

a) Prepare production dataset

prod_df = pd.read_parquet(
    "http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_production.parquet"
)

prod_df.head()

b) Define model schema

Note that this schema differs slightly from our train_schema above, as our prod_df does not have a ground truth column!

prod_schema = px.Schema(
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_action",
    embedding_feature_column_names={
        "image_embedding": px.EmbeddingColumnNames(
            vector_column_name="image_vector",
            link_to_data_column_name="url",
        ),
    },
)

When do I need a different schema?

In general, if both datasets you are visualizing have identical schemas, you can reuse the Schema object.

However, there are often differences between the schema of a primary and reference dataset. For example:

Your production set does not include any ground truth, but your training set does.
Your primary dataset is the set of prompt-responses in an LLM application, and your reference is your corpus.
Your production data has differing timestamps between all inferences, but your training set does not have a timestamp column.
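When two datasets differ only in which optional columns are present, the Schema arguments can be derived from the columns instead of being written out twice. This is a minimal sketch under that assumption; `schema_fields` is a hypothetical helper, and the column names are the ones used in this Quickstart.

```python
def schema_fields(columns):
    """Build Schema keyword arguments from the columns a dataframe actually has."""
    fields = {"prediction_label_column_name": "predicted_action"}
    # Ground truth is optional: include it only when the column exists.
    if "actual_action" in columns:
        fields["actual_label_column_name"] = "actual_action"
    # Timestamps are optional as well.
    if "prediction_ts" in columns:
        fields["timestamp_column_name"] = "prediction_ts"
    return fields


train_fields = schema_fields(["prediction_ts", "predicted_action", "actual_action"])
prod_fields = schema_fields(["prediction_ts", "predicted_action"])

print(train_fields)
print(prod_fields)
```

The resulting dictionary could then be unpacked into the constructor, for example `px.Schema(**prod_fields)`, alongside any embedding column definitions.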

Read more about comparison dataset Schemas here: How many schemas do I need?

c) Wrap into Dataset object

prod_ds = px.Dataset(dataframe=prod_df, schema=prod_schema, name="production")

d) Launch Phoenix with both Datasets!

This time, we will include both train_ds and prod_ds when calling launch_app.

session = px.launch_app(primary=prod_ds, reference=train_ds)

Which dataset should I set as `reference` and as `primary`?

Select the dataset that you want to use as the baseline as your reference, and the dataset you'd like to actively evaluate as your primary.

In this case, the training set is our baseline, against which we want to gauge the behavior of our production data (e.g. to evaluate drift).

For more information, see Which dataset is which?
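One simple way to quantify how far a primary dataset has moved from its reference is the Euclidean distance between the centroids (mean vectors) of their embeddings. The sketch below uses NumPy on made-up two-dimensional vectors. Whether Phoenix computes its drift metric in exactly this way is an assumption here, and `centroid_distance` is a hypothetical helper.

```python
import numpy as np


def centroid_distance(primary_vectors, reference_vectors):
    """Euclidean distance between the mean embedding of each dataset."""
    primary_centroid = np.mean(primary_vectors, axis=0)
    reference_centroid = np.mean(reference_vectors, axis=0)
    return float(np.linalg.norm(primary_centroid - reference_centroid))


# Made-up embeddings: the reference centroid is (1, 0), the primary centroid is (4, 4).
reference = np.array([[0.0, 0.0], [2.0, 0.0]])
primary = np.array([[4.0, 4.0], [4.0, 4.0]])

drift = centroid_distance(primary, reference)
print(drift)  # 5.0
```

With real data, the vectors would come from the embedding column of each dataframe (here, `image_vector`), stacked into a two-dimensional array.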

Once again, enter your Phoenix app with the new link generated by your session. e.g.

🌍 To view the Phoenix app in your browser, visit https://x0u0hsyy845-496ff2e9c6d22116-6060-colab.googleusercontent.com/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix

You are now ready to conduct comparative Root Cause Analysis!

✅ Checkpoint B.

Optional - try the following exercises to familiarize yourself more with Phoenix:

Click into image_embedding under the Embeddings listing to enter the UMAP projector
Select a point on the time series where there is high drift (hint: as given by Euclidean Distance), and see the datapoints from the time selection being rendered below
While colorizing the data by 'Dataset', select the datapoints with the lasso tool where there exists only production data (hint: this is a set of data that has emerged in prod, and is a cause for the increase in drift!)
Export the selected cluster from Phoenix
Describe in words the process you went through to understand increased drift in your production data

Discuss your answers in our community!

Step 7 (Optional): Export data

Once you have identified datapoints of interest, you can export this data directly from the Phoenix app for further analysis, or to incorporate these into downstream model retraining and finetuning flows.

See more on exporting data: https://docs.arize.com/phoenix/~/changes/v6Zhm276x8LlKmwqElIA/how-to/export-your-data#exporting-embeddings

Step 8 (Optional): Enable production observability with Arize

Once your model is ready for production, you can add Arize to enable production-grade observability. Phoenix works in conjunction with Arize to enable end-to-end model development and observability.

With Arize, you will additionally benefit from:

Being able to publish and observe your models in real-time as inferences are being served, and/or via direct connectors from your table/storage solution
Scalable compute to handle billions of predictions
Ability to set up monitors & alerts
Production-grade observability
Integration with Phoenix for model iteration to observability
Enterprise-grade RBAC and SSO
Experiment with infinite permutations of model versions and filters

Create your free account and see the full suite of Arize features.

Where to go from here?

Read more about Embeddings Analysis: https://docs.arize.com/phoenix/~/changes/v6Zhm276x8LlKmwqElIA/concepts/embeddings-analysis

Questions?

Join the Phoenix Slack community to ask questions, share findings, provide feedback, and connect with other developers.

Schemas and Datasets

Learn the foundational concepts of the Phoenix API and Application

This section introduces datasets and schemas, the starting concepts needed to use Phoenix.

For comprehensive descriptions of phoenix.Dataset and phoenix.Schema, see the API reference.
For tips on creating your own Phoenix datasets and schemas, see the how-to guide.

Datasets

A Phoenix dataset is an instance of phoenix.Dataset that contains three pieces of information:

The data itself (a pandas dataframe)
A schema (a phoenix.Schema instance) that describes the columns of your dataframe
A dataset name that appears in the UI

For example, if you have a dataframe prod_df that is described by a schema prod_schema, you can define a dataset prod_ds with

prod_ds = px.Dataset(prod_df, prod_schema, "production")

If you launch Phoenix with this dataset, you will see a dataset named "production" in the UI.

How many datasets do I need?

You can launch Phoenix with zero, one, or two datasets.

With no datasets, Phoenix runs in the background and collects trace data emitted by your instrumented LLM application. With a single dataset, Phoenix provides insights into model performance and data quality. With two datasets, Phoenix compares your datasets and gives insights into drift in addition to model performance and data quality, or helps you debug your retrieval-augmented generation applications.

Which dataset is which?

Your reference dataset provides a baseline against which to compare your primary dataset.

To compare two datasets with Phoenix, you must select one dataset as primary and one to serve as a reference. As the name suggests, your primary dataset contains the data you care about most, perhaps because your model's performance on this data directly affects your customers or users. Your reference dataset, in contrast, is usually of secondary importance and serves as a baseline against which to compare your primary dataset.

Very often, your primary dataset will contain production data and your reference dataset will contain training data. However, that's not always the case; you can imagine a scenario where you want to check your test set for drift relative to your training data, or use your test set as a baseline against which to compare your production data. When choosing primary and reference datasets, it matters less where your data comes from than how important the data is and what role the data serves relative to your other data.

Corpus Dataset (Information Retrieval)

A corpus dataset differs from the model datasets only in that it needs a separate schema, because it has a different set of columns than the model data. See the schema section for more details.

Schemas

A Phoenix schema is an instance of phoenix.Schema that maps the columns of your dataframe to fields that Phoenix expects and understands. Use your schema to tell Phoenix what the data in your dataframe means.

For example, if you have a dataframe containing Fisher's Iris data that looks like this:

sepal_length  sepal_width  petal_length  petal_width  target      prediction
7.7           3.0          6.1           2.3          virginica   versicolor
5.4           3.9          1.7           0.4          setosa      setosa
6.3           3.3          4.7           1.6          versicolor  versicolor
6.2           3.4          5.4           2.3          virginica   setosa
5.8           2.7          5.1           1.9          virginica   virginica

your schema might look like this:

schema = px.Schema(
    feature_column_names=[
        "sepal_length",
        "sepal_width",
        "petal_length",
        "petal_width",
    ],
    actual_label_column_name="target",
    prediction_label_column_name="prediction",
)
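For tabular data with many features, the list of feature columns can be derived from the dataframe rather than typed by hand. The sketch below rebuilds two rows of the Iris data above with pandas and treats every column that is not a label as a feature. That rule is an assumption that fits this example and would need adjusting for dataframes with ID, timestamp, or tag columns.

```python
import pandas as pd

# Two rows from the Iris data above.
iris_df = pd.DataFrame(
    {
        "sepal_length": [7.7, 6.2],
        "sepal_width": [3.0, 3.4],
        "petal_length": [6.1, 5.4],
        "petal_width": [2.3, 2.3],
        "target": ["virginica", "virginica"],
        "prediction": ["versicolor", "setosa"],
    }
)

# Every column that is not a label is treated as a feature.
label_columns = {"target", "prediction"}
feature_column_names = [c for c in iris_df.columns if c not in label_columns]
print(feature_column_names)
```

The resulting list could be passed as `feature_column_names` when constructing the schema.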

How many schemas do I need?

Usually one, sometimes two.

Each dataset needs a schema. If your primary and reference datasets have the same format, then you only need one schema. For example, if you have dataframes train_df and prod_df that share an identical format described by a schema named schema, then you can define datasets train_ds and prod_ds with

train_ds = px.Dataset(train_df, schema, "training")
prod_ds = px.Dataset(prod_df, schema, "production")

Sometimes, you'll encounter scenarios where the formats of your primary and reference datasets differ. For example, you'll need two schemas if:

Your production data has timestamps indicating the time at which an inference was made, but your training data does not.
Your training data has ground truth (what we call actuals in Phoenix nomenclature), but your production data does not.
A new version of your model has a differing set of features from a previous version.

In cases like these, you'll need to define two schemas, one for each dataset. For example, if you have dataframes train_df and prod_df that are described by schemas train_schema and prod_schema, respectively, then you can define datasets train_ds and prod_ds with

train_ds = px.Dataset(train_df, train_schema, "training")
prod_ds = px.Dataset(prod_df, prod_schema, "production")
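A quick way to decide between one schema and two is to compare the column names of the two dataframes. The following is a rough heuristic, not a Phoenix feature: identical column sets suggest a shared schema may work, while any difference suggests two are needed. `schema_differences` and the column lists are hypothetical.

```python
def schema_differences(primary_columns, reference_columns):
    """Return the columns unique to each side, as two sorted lists."""
    primary, reference = set(primary_columns), set(reference_columns)
    return sorted(primary - reference), sorted(reference - primary)


# Stand-ins for list(train_df.columns) and list(prod_df.columns).
train_columns = ["predicted_action", "actual_action", "image_vector", "url"]
prod_columns = ["prediction_ts", "predicted_action", "image_vector", "url"]

only_in_prod, only_in_train = schema_differences(prod_columns, train_columns)
needs_two_schemas = bool(only_in_prod or only_in_train)

print(only_in_prod, only_in_train, needs_two_schemas)
```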

Schema for Corpus Dataset (Information Retrieval)

A corpus dataset, containing documents for information retrieval, typically has a different set of columns than those found in the model data from either production or training, and requires a separate schema. Below is an example schema for a corpus dataset with three columns: the id, text, and embedding for each document in the corpus.

# Map the corpus dataframe's columns to the fields Phoenix expects
corpus_schema = px.Schema(
    id_column_name="id",
    document_column_names=px.EmbeddingColumnNames(
        vector_column_name="embedding",
        raw_data_column_name="text",
    ),
)
corpus_ds = px.Dataset(corpus_df, corpus_schema)
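For reference, a corpus dataframe matching the three-column layout described above might look like the sketch below. The rows are made up, and the two sanity checks (unique ids, equal-length embeddings) are reasonable precautions rather than requirements stated in this guide.

```python
import pandas as pd

# Hypothetical corpus with the id, text, and embedding columns described above.
corpus_df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "text": ["first document", "second document", "third document"],
        "embedding": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
    }
)

# Each document should have its own id and an embedding of the same length.
ids_are_unique = corpus_df["id"].is_unique
embedding_lengths = corpus_df["embedding"].map(len).unique().tolist()

print(ids_are_unique, embedding_lengths)
```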

Application

Phoenix runs as an application that can be viewed in a web browser tab or within your notebook as a cell. To launch the app, simply pass one or more datasets into the launch_app function:

session = px.launch_app(prod_ds, train_ds)
# or just one dataset
session = px.launch_app(prod_ds)
# or with a corpus dataset
session = px.launch_app(prod_ds, corpus=corpus_ds)

The application provides a landing page that is populated with your model's schema (e.g. the features, tags, predictions, and actuals). This gives you a statistical overview of your data as well as links into the embeddings details views for analysis.
