Cost Tracking

Cost tracking is essential for understanding and managing the spend of your LLM-powered applications. Arize provides a flexible, powerful, and easy-to-configure system to track model usage costs across providers and model variants — whether you're using default pricing or defining custom rates.

Overview

Arize’s cost tracking enables you to:

  • Monitor LLM spend across use cases, models, and providers.

  • Track granular costs by token type (e.g., prompt, completion, audio, cache).

  • Customize pricing to reflect discounted rates, fine-tuned models, or non-standard providers.

We currently support 63 default cost configurations for common model and provider combinations, making it easy to get started with no setup required in many cases.

If there is a default you'd like us to add, reach out to support@arize.com.

How Cost Tracking Works

Cost tracking works by ingesting token usage metrics and applying the correct cost configuration based on the model and provider.

Lookup Logic

To determine cost:

  1. We extract the model name from your trace using the following fallback order:

    • llm.model_name (Primary)

    • llm.invocation_parameters.model (Fallback 1)

    • metadata.model (Fallback 2)

  2. Optionally, if you provide a provider, we’ll match that as well (e.g., differentiating OpenAI vs Azure OpenAI for gpt-4).

  3. Each token type (e.g., prompt, completion, audio) is matched against the configuration, and its cost is calculated on a per-million-token (1M) basis.
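The fallback order above can be sketched as a small helper. This is an illustrative sketch, not Arize's actual implementation; the attribute names are the OpenInference keys from this page, and `extract_model_name` is a hypothetical function.

```python
def extract_model_name(span_attributes):
    """Return the model name using the documented fallback order."""
    for key in (
        "llm.model_name",                   # primary
        "llm.invocation_parameters.model",  # fallback 1
        "metadata.model",                   # fallback 2
    ):
        if span_attributes.get(key):
            return span_attributes[key]
    return None  # no model name found; no cost can be applied

attrs = {"llm.invocation_parameters.model": "gpt-4"}
print(extract_model_name(attrs))  # -> gpt-4
```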

Configuring Cost Models

There are three ways to set up your cost configuration:

1. Use a Default (Zero Setup)

If your model and provider match a default, Arize automatically applies the correct pricing — no action needed.

2. Customize a Default

Want to tweak an existing config (e.g., apply discounts)? Just:

  • Go to Cost Tracking > Configuration

  • Click Options > Clone on a default config

  • Edit fields like token type cost or provider name

3. Create from Scratch

To define your own model config:

  • Click Add New

  • Enter the model name (required)

  • Optionally enter the provider

  • Specify cost per 1 million tokens for each token type

  • Assign each token type to Prompt or Completion
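A from-scratch configuration conceptually carries the fields listed above. The dictionary below is purely illustrative (the field names are hypothetical, and real configs are created in the Arize UI, not via this structure):

```python
# Hypothetical shape of a from-scratch cost configuration.
# Rates are examples only; real values are set in Cost Tracking > Configuration.
custom_config = {
    "model_name": "my-finetuned-gpt-4",  # required
    "provider": "openai",                # optional
    "token_costs": {
        # token type -> cost per 1 million tokens, assigned to Prompt or Completion
        "prompt": {"cost_per_1m": 25.0, "kind": "prompt"},
        "completion": {"cost_per_1m": 50.0, "kind": "completion"},
        "prompt_details.cache_read": {"cost_per_1m": 12.5, "kind": "prompt"},
    },
}
```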

Regex Matching

By default, Arize matches the model name exactly using the anchored pattern ^MODELNAME$. You can modify this pattern for broader or partial matches.
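For example, loosening the anchored default lets one config cover dated model snapshots. The patterns below are illustrative:

```python
import re

# Default behavior: exact, anchored match on the model name.
exact = re.compile(r"^gpt-4$")
# Broadened pattern (illustrative): also matches variants like dated snapshots.
broad = re.compile(r"^gpt-4.*$")

print(bool(exact.match("gpt-4")))       # True
print(bool(exact.match("gpt-4-0613")))  # False
print(bool(broad.match("gpt-4-0613")))  # True
```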

Cost configs are saved at the organization level.

Supported Token Types and Semantic Conventions

You can send any token types using OpenInference semantic conventions. Below are the supported fields:

Prompt Tokens

  • Prompt (includes all input subtypes to the LLM): llm.token_count.prompt

  • Prompt Details: llm.token_count.prompt_details

  • Audio: llm.token_count.prompt_details.audio

  • Image: llm.token_count.prompt_details.image

  • Cache Input: llm.token_count.prompt_details.cache_input

  • Cache Read: llm.token_count.prompt_details.cache_read

  • Cache Write: llm.token_count.prompt_details.cache_write

Completion Tokens

  • Completion (includes all output subtypes from the LLM): llm.token_count.completion

  • Audio: llm.token_count.completion_details.audio

  • Reasoning: llm.token_count.completion_details.reasoning

  • Image: llm.token_count.completion_details.image

Total Tokens (Optional)

  • llm.token_count.total

Custom Token Types

You can also define custom token types under either prompt_details or completion_details. Just make sure to:

  • Use semantic naming

  • Include a matching token type and cost in your configuration

Each token type you send will have a cost calculated, provided a matching token type is defined in your configuration.
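The per-million-token arithmetic can be sketched as follows. The rate table and `span_cost` helper are hypothetical and for illustration only; token types without a configured rate simply accrue no cost:

```python
# Illustrative USD rates per 1 million tokens, keyed by OpenInference field name.
COST_PER_1M = {
    "llm.token_count.prompt": 30.0,
    "llm.token_count.completion": 60.0,
}

def span_cost(token_counts):
    """Sum cost over token types that have a configured rate."""
    return sum(
        count * COST_PER_1M[field] / 1_000_000
        for field, count in token_counts.items()
        if field in COST_PER_1M  # unmatched token types get no cost
    )

counts = {
    "llm.token_count.prompt": 1_200,       # 1,200 * $30 / 1M = $0.036
    "llm.token_count.completion": 400,     #   400 * $60 / 1M = $0.024
    "llm.token_count.prompt_details.audio": 100,  # no rate configured
}
print(round(span_cost(counts), 6))  # 0.06
```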

Using Cost

Once cost tracking is enabled, you can analyze and explore cost data at various levels of detail within the Arize UI to better understand and manage your LLM application spend.

Filtering and Monitoring

All cost attributes are available throughout the platform and can be used to:

  • Filter traces or spans where cost exceeds a defined threshold

  • Create monitors for high-cost traces or model behavior anomalies

  • Build dashboards based on specific token types or cost groupings (e.g., prompt, audio, image)

Trace-Level Visualization

At the trace level, Arize aggregates cost across all LLM spans in the trace. This provides a complete view of how much it cost to serve a given request end-to-end, which is especially useful for understanding the impact of complex flows or multi-stage prompts.

Span-Level Visualization

You can also inspect cost at the individual span level, including a breakdown by token type. This allows you to:

  • Pinpoint expensive steps in the LLM pipeline

  • Analyze the relative contribution of different token categories (e.g., reasoning, cache, image)
