Delayed Actuals and Tags

Connect model predictions to delayed ground truth data

What Are Delayed (Latent) Actuals

Depending on your use case, ground truth may arrive well after your model makes a prediction. Ground truth data that arrives this way is called delayed (or latent) actuals.

When you send delayed actuals, Arize automatically connects each actual to the earlier prediction that shares its prediction ID.

Sending Delayed Actuals

The Arize joiner matches delayed actuals with predictions already in the Arize platform. To use it, upload your actuals data with the same prediction_id as the corresponding prediction.

Joiner Cadence & Lookback

The Arize joiner runs automatically every day at 05:00 UTC. It joins delayed actuals to their corresponding predictions for up to 14 days after the prediction was received. This is supported for all data upload methods.

Each run joins the actuals sent during the previous day, from 00:00 UTC to 23:59 UTC.

The Arize support team can extend the 14-day lookback window and increase the joiner cadence upon request. Reach out to support@arize.com for help.
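The timing rules above can be sketched in a few lines of Python. This is only an illustration of the schedule as described here, not Arize's implementation: the function names are hypothetical, and it assumes the 14-day lookback is measured from when the prediction was received to when the actual was received.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

LOOKBACK_DAYS = 14  # default from the text; support can extend it


def join_window(run_time: datetime) -> Tuple[datetime, datetime]:
    """Return the [start, end) span of actuals covered by one joiner run.

    Assumes a run covers the whole previous UTC day.
    """
    run_day = run_time.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return run_day - timedelta(days=1), run_day


def is_joinable(prediction_received: datetime, actual_received: datetime) -> bool:
    """True if the actual arrived within LOOKBACK_DAYS of the prediction."""
    delay = actual_received - prediction_received
    return timedelta(0) <= delay <= timedelta(days=LOOKBACK_DAYS)


# A run on 2022-12-02 at 05:00 UTC covers actuals sent on 2022-12-01 (UTC).
run = datetime(2022, 12, 2, 5, 0, tzinfo=timezone.utc)
start, end = join_window(run)
print(start.isoformat(), end.isoformat())
```

Under these assumptions, an actual sent on December 1 for a prediction received on November 20 would be joined, while one for a prediction received on November 1 would fall outside the default lookback.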

Joiner Requirements

| Field | Description |
| --- | --- |
| prediction_id | (required) A prediction's unique identifier. The actual's prediction_id must match its corresponding prediction to join the data |
| actual_score / actual_label (for ranking models only: relevance_label) | (required) The ground truth values of your model. The use of score and label varies based on model type |
| model_id | (required) When sending delayed actuals, specify the model_id in your schema to match your actuals to the correct model |

For ranking models, upload delayed actuals using file/table upload via GraphQL or the SDK. Native UI upload support is coming soon. Reach out to support@arize.com for help and questions.
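Before uploading, it can help to check each actuals record against the requirements above. The following is a minimal sketch, assuming records are plain dictionaries keyed by the field names listed; the helper name is hypothetical and the check is not part of the Arize SDK.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("prediction_id", "model_id")
# Ground truth fields named in the requirements; which one applies
# depends on the model type (relevance_label is for ranking models).
GROUND_TRUTH_FIELDS = ("actual_score", "actual_label", "relevance_label")


def missing_requirements(record: Dict[str, object]) -> List[str]:
    """Return the problems that would keep this record from joining."""
    problems = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if record.get(name) in (None, "")
    ]
    if not any(record.get(name) is not None for name in GROUND_TRUTH_FIELDS):
        problems.append("missing ground truth value")
    return problems


ok = {"prediction_id": "1a2b", "model_id": "sample-model-1", "actual_label": "fraud"}
bad = {"prediction_id": "1a2b", "actual_label": "fraud"}
print(missing_requirements(ok))   # []
print(missing_requirements(bad))  # ['missing model_id']
```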

Example Joins By Upload Method

To send delayed actuals via GCS, AWS S3, Azure Blob Storage, Google BigQuery, or Snowflake, configure separate data ingestion jobs for predictions and actuals. We recommend naming the job prefixes so it is clear which job contains predictions and which contains actuals.

gs://bucket1/click-thru-rate/prediction/
├── 11-19-2022.parquet
├── 11-20-2022.parquet
└── 11-21-2022.parquet

gs://bucket1/click-thru-rate/actuals/
├── 12-1-2022.parquet # same prediction id column, model, and space as the corresponding prediction
├── 12-2-2022.parquet
└── 12-3-2022.parquet

When defining the schemas for these two data ingestion jobs, make sure the prediction ID, model name, and space in the actuals job match those of the corresponding predictions. Once both jobs are configured, Arize automatically recognizes and syncs new prediction and actual data. To confirm that new data has arrived, view it in the 'Dataset' tab.
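One way to catch mismatched IDs before the ingestion jobs run is to compare the two sets of rows locally. The sketch below assumes the rows have already been loaded into lists of dictionaries (for example, from the parquet files) and that the ID column is named prediction_id; the helper is hypothetical and not part of Arize.

```python
from typing import Dict, Iterable, List


def unmatched_actual_ids(
    predictions: Iterable[Dict[str, object]],
    actuals: Iterable[Dict[str, object]],
    id_column: str = "prediction_id",
) -> List[object]:
    """Return IDs in the actuals rows with no matching prediction row."""
    known = {row[id_column] for row in predictions}
    seen = set()
    unmatched = []
    for row in actuals:
        pid = row[id_column]
        # Report each unmatched ID once, in the order first seen.
        if pid not in known and pid not in seen:
            seen.add(pid)
            unmatched.append(pid)
    return unmatched


prediction_rows = [{"prediction_id": "a1"}, {"prediction_id": "a2"}]
actual_rows = [
    {"prediction_id": "a1", "actual_label": "click"},
    {"prediction_id": "zz", "actual_label": "no_click"},
]
print(unmatched_actual_ids(prediction_rows, actual_rows))  # ['zz']
```

An empty result means every actual has a prediction to join to. Whether a given pair joins still depends on the lookback window.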

To log delayed actuals using the Python SDK, point prediction_id_column_name in the actuals schema at a column whose values match the prediction IDs you sent earlier. Arize then identifies the join and matches the data automatically.

from arize.utils.types import Schema

# First, log predictions
prediction_schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="prediction_label",
    ...
)

# Later, log actuals
actuals_schema = Schema(
    prediction_id_column_name="prediction_id",  # IDs must match the predictions above
    actual_label_column_name="actual_label",
    ...
)

To log delayed actuals using the Python Single Record SDK, send the actual with the same prediction_id as its corresponding prediction. Arize then identifies the join and matches the data automatically.

from arize.utils.types import Environments, ModelTypes

# `arize` is an already-initialized single-record client (arize.api.Client);
# `features` and `tags` are dictionaries defined elsewhere.

# Log the features and prediction
response = arize.log(
    prediction_id='plED4eERDCasd9797ca34',
    model_id='sample-model-1',
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    model_version='v1',
    prediction_timestamp=1618590882,
    features=features,
    prediction_label=('Fraud', 0.4),
    tags=tags,
)

# Later, log the actual with the same prediction_id and model_id
actual_response = arize.log(
    prediction_id='plED4eERDCasd9797ca34',
    model_id='sample-model-1',
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    actual_label=('Fraud', 1),
    tags=tags,
)

Updating Previously Sent Actuals

You can update previously sent actuals for an existing prediction, as long as the prediction is still within the lookback window (14 days by default).

For example, suppose a user first sends Arize a prediction:

"prediction_id": "1a2b"
"prediction_score": 0.78

Then sends an initial ground truth label:

"prediction_id": "1a2b"
"actual_label": "fraud"

Then sends an updated ground truth label (e.g., if the ground truth label can change over time):

"prediction_id": "1a2b"
"actual_label": "not_fraud"

The final record for the prediction will be:

"prediction_id": "1a2b"
"prediction_score": 0.78
"actual_label": "not_fraud"
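The update behavior in this example amounts to "the latest value wins" for each field. A minimal sketch of that rule, using plain dictionaries and a hypothetical apply_events helper (an illustration of the outcome, not how Arize stores data):

```python
from typing import Dict, Iterable


def apply_events(events: Iterable[Dict[str, object]]) -> Dict[str, Dict[str, object]]:
    """Fold events that share a prediction_id into one record each.

    Later events overwrite fields set by earlier ones.
    """
    records: Dict[str, Dict[str, object]] = {}
    for event in events:
        record = records.setdefault(str(event["prediction_id"]), {})
        record.update(event)
    return records


events = [
    {"prediction_id": "1a2b", "prediction_score": 0.78},
    {"prediction_id": "1a2b", "actual_label": "fraud"},
    {"prediction_id": "1a2b", "actual_label": "not_fraud"},
]
final = apply_events(events)
print(final["1a2b"])
# {'prediction_id': '1a2b', 'prediction_score': 0.78, 'actual_label': 'not_fraud'}
```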

Delayed Tags

In addition to delayed actuals, you can send delayed tags to Arize, either together with delayed actuals or on their own. When sending tags with delayed actuals, your data should include a prediction_id, your delayed actuals, and your delayed tag columns. When sending only delayed tags, your data should include a prediction_id and your delayed tag columns.

# Delayed tags sent together with delayed actuals
actuals_and_tags_schema = Schema(
    prediction_id_column_name="prediction_id",
    actual_label_column_name="actual_label",
    tag_column_names=["tag1", "tag2"]
)

# Delayed tags sent on their own
tags_only_schema = Schema(
    prediction_id_column_name="prediction_id",
    tag_column_names=["tag1", "tag2"]
)

These delayed tags can either be new tags for a prediction (i.e., the original tag value was not sent with the prediction), or they can be updated values for existing tags.

For example, suppose a user sends Arize a prediction with these tags:

"location": "New York"
"month": "January"

Next, they send one new tag and one update to an existing tag:

"location": "San Francisco"
"platform": "web"

The final prediction will have the following tag values:

"location": "San Francisco"
"month": "January"
"platform": "web"
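The result above behaves like a dictionary merge in which delayed values take precedence. A short sketch, with merge_tags as a hypothetical helper used only for illustration:

```python
from typing import Dict


def merge_tags(existing: Dict[str, str], delayed: Dict[str, str]) -> Dict[str, str]:
    """Return a new dict where delayed tags add to or overwrite existing ones."""
    return {**existing, **delayed}


original_tags = {"location": "New York", "month": "January"}
delayed_tags = {"location": "San Francisco", "platform": "web"}
final_tags = merge_tags(original_tags, delayed_tags)
print(final_tags)
# {'location': 'San Francisco', 'month': 'January', 'platform': 'web'}
```

Tags that the delayed upload does not mention, such as month here, keep their original values.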

Measure Model Performance

Arize calculates performance metrics only on predictions that have actuals. Once your join is reflected in Arize, you can use performance metrics and the 'Performance Tracing' tab for those predictions.

Until delayed actuals arrive, use drift as a proxy for model performance to measure and monitor model health.
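To see why metrics cover only joined predictions, consider computing accuracy locally on a mix of joined and unjoined records. This is a sketch under stated assumptions: accuracy stands in for whatever metric you track, the field names are illustrative, and the function is hypothetical rather than Arize's own calculation.

```python
from typing import Dict, List, Optional, Tuple


def accuracy_on_joined(
    records: List[Dict[str, Optional[str]]],
) -> Tuple[Optional[float], float]:
    """Return (accuracy over records with actuals, share of records with actuals)."""
    joined = [r for r in records if r.get("actual_label") is not None]
    coverage = len(joined) / len(records) if records else 0.0
    if not joined:
        return None, coverage  # nothing to score until actuals arrive
    correct = sum(r["prediction_label"] == r["actual_label"] for r in joined)
    return correct / len(joined), coverage


records = [
    {"prediction_label": "fraud", "actual_label": "fraud"},
    {"prediction_label": "not_fraud", "actual_label": "fraud"},
    {"prediction_label": "fraud", "actual_label": None},  # actual not yet received
    {"prediction_label": "not_fraud", "actual_label": "not_fraud"},
]
accuracy, coverage = accuracy_on_joined(records)
print(accuracy, coverage)
```

Here three of the four predictions have actuals, so accuracy is computed over those three only. Tracking the coverage figure alongside the metric shows how much of the traffic the metric actually reflects.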

