Know what every request costs. A single agent might chain five LLM calls — without cost tracking, you’re guessing what you’re spending. Arize AX calculates cost for every span, aggregates it at the trace level, and lets you filter, monitor, and optimize from there.
What is Cost Tracking?
Arize AX calculates the cost of every LLM call in your traces — at the trace level (total cost of a request) and at the span level (cost of each individual LLM call). Use it to:
- Spot which requests or agents are expensive and why
- Track spend across models and providers over time
- Catch cost spikes before they become budget problems
- Compare cost/quality tradeoffs between different models
Arize includes default cost configurations for common models (GPT-4o, Claude, Gemini, Mistral, and more), making it easy to get started with no setup required in many cases.
Token Tracking
Arize AX tracks token usage via standard OpenInference attributes on your LLM spans:
| Attribute | Description |
|---|---|
| llm.token_count.prompt | Number of tokens in the prompt |
| llm.token_count.completion | Number of tokens in the completion |
| llm.token_count.total | Total number of tokens (prompt + completion) |
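As a minimal sketch, the three standard attributes can be built from the raw counts; the helper function below is illustrative (not part of any Arize SDK), but the attribute names are the OpenInference conventions listed above.

```python
# Illustrative helper: build the standard OpenInference token-count
# attributes for an LLM span. Attribute names are the conventions
# from the table above; the function itself is a stand-in for however
# your tracer sets span attributes.

def token_count_attributes(prompt_tokens: int, completion_tokens: int) -> dict:
    return {
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        # Total is prompt + completion.
        "llm.token_count.total": prompt_tokens + completion_tokens,
    }

attrs = token_count_attributes(prompt_tokens=1200, completion_tokens=350)
print(attrs["llm.token_count.total"])  # 1550
```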
Cost is calculated based on these token counts and the cost configuration for the model. The system supports multiple token types for detailed cost breakdowns:
| Token Type | Category | Description |
|---|---|---|
| input | Prompt | Regular input tokens |
| cache | Prompt | Cached prompt tokens |
| cache_read | Prompt | Cache read tokens |
| cache_write | Prompt | Cache write tokens |
| cache_input | Prompt | Cached input tokens |
| output | Completion | Regular output tokens |
| reasoning | Completion | Reasoning tokens (e.g., o1/o3 models) |
| audio | Both | Audio tokens |
Cost configs also support tiered pricing — volume-based pricing where cost per token changes based on total token count thresholds.
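To make tiered pricing concrete, here is a small sketch of volume-based pricing where the per-token rate drops once the total token count crosses a threshold. The tier boundaries and rates are made up for illustration; real thresholds come from your cost config.

```python
# Illustrative tiered (volume-based) pricing: the per-token rate
# changes once the token count crosses a threshold. Tier boundaries
# and rates below are hypothetical.

def tiered_cost(tokens, tiers):
    """tiers: list of (upper_bound_exclusive, usd_per_1m_tokens),
    with the last bound set to infinity."""
    cost = 0.0
    previous_bound = 0
    for bound, rate_per_1m in tiers:
        if tokens <= previous_bound:
            break
        tokens_in_tier = min(tokens, bound) - previous_bound
        cost += tokens_in_tier * rate_per_1m / 1_000_000
        previous_bound = bound
    return cost

# Hypothetical: first 128k tokens at $2.50/1M, the rest at $1.25/1M.
tiers = [(128_000, 2.50), (float("inf"), 1.25)]
print(round(tiered_cost(200_000, tiers), 4))  # 0.41
```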
These token counts are the raw inputs to every cost calculation.
How Cost Tracking Works
When a span is received, Arize AX determines cost as follows:
- If the span already includes cost attributes (set by the client), those values are used as-is.
- Otherwise, the system looks up a cost configuration by matching llm.model_name and llm.provider.
- The matching config’s per-token rates are applied to the span’s token counts.
- Cost configs are cached with a 10-minute TTL for performance.
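The resolution order above can be sketched in a few lines of Python. The config table, span representation, and function below are illustrative (this is not Arize's actual implementation), but they mirror the described behavior: client-supplied cost wins, then a (model, provider) config lookup with a 10-minute cache, then per-token rates applied to the span's counts.

```python
import time

# Hypothetical config table: (model_name, provider) -> USD per 1M tokens.
COST_CONFIGS = {
    ("gpt-4o", "openai"): {"prompt": 2.50, "completion": 10.00},
}
_cache = {}
CACHE_TTL_SECONDS = 600  # configs are cached with a 10-minute TTL

def resolve_cost(span):
    # 1. If the client already set cost attributes, use them as-is.
    if "llm.cost.total" in span:
        return span["llm.cost.total"]
    # 2. Otherwise look up a config by (model, provider), via the cache.
    key = (span.get("llm.model_name"), span.get("llm.provider"))
    cached = _cache.get(key)
    if cached and time.time() - cached[1] < CACHE_TTL_SECONDS:
        config = cached[0]
    else:
        config = COST_CONFIGS.get(key)
        _cache[key] = (config, time.time())
    if config is None:
        return None  # no match: no cost is calculated
    # 3. Apply the config's per-token rates to the span's token counts.
    prompt = span.get("llm.token_count.prompt", 0)
    completion = span.get("llm.token_count.completion", 0)
    return (prompt * config["prompt"] + completion * config["completion"]) / 1_000_000

span = {
    "llm.model_name": "gpt-4o",
    "llm.provider": "openai",
    "llm.token_count.prompt": 1_000,
    "llm.token_count.completion": 500,
}
print(resolve_cost(span))  # 0.0075
```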
Cost attributes on spans:
| Attribute | Description |
|---|---|
| llm.cost.prompt | Total prompt cost |
| llm.cost.completion | Total completion cost |
| llm.cost.total | Total cost |
| llm.cost.prompt_details.* | Cost breakdown by prompt token type |
| llm.cost.completion_details.* | Cost breakdown by completion token type |
Set Up Cost Tracking
1. Use a Default (Zero Setup)
If your model and provider match a default, Arize automatically applies the correct pricing — no action needed.
2. Customize a Default
To tweak an existing config (e.g., apply discounts):
- Go to Settings > Cost Tracking > Configuration
- Click Options > Clone on a default config
- Edit fields like token type cost or provider name
3. Create from Scratch
To define your own model config:
- Click Add New
- Enter the model name (required)
- Optionally enter the provider
- Specify cost per 1 million tokens for each token type
- Assign each token type to Prompt or Completion
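The steps above map to a config with a small set of fields. The dict below sketches one plausible shape — the field names are illustrative, not an Arize API — showing a required model name, an optional provider, and per-1M-token rates with each token type assigned to Prompt or Completion.

```python
# Hypothetical shape of a from-scratch cost config, mirroring the
# fields in the setup steps above. Field names are illustrative.

custom_config = {
    "model_name": "my-finetuned-model",  # required
    "provider": "openai",                # optional
    "token_types": [
        {"name": "input", "category": "Prompt", "usd_per_1m_tokens": 3.00},
        {"name": "cache_read", "category": "Prompt", "usd_per_1m_tokens": 0.75},
        {"name": "output", "category": "Completion", "usd_per_1m_tokens": 12.00},
    ],
}

# Every token type carries a category and a per-1M-token rate.
for token_type in custom_config["token_types"]:
    print(token_type["name"], token_type["category"], token_type["usd_per_1m_tokens"])
```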
Cost configs are saved at the organization level.
Using Cost Data
Once configured, cost data is available across the platform.
Filtering and Monitoring
All cost attributes are available throughout the platform and can be used to:
- Filter traces or spans where cost exceeds a defined threshold
- Create monitors for high-cost traces or model behavior anomalies
- Build dashboards based on specific token types or cost groupings
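For offline analysis of exported spans, the same threshold filtering is a one-liner; the span dicts and threshold below are illustrative (in-platform, you would filter on llm.cost.total directly in the UI or a monitor).

```python
# Illustrative client-side filtering of exported spans by cost.
# Span shapes and the threshold are hypothetical.

spans = [
    {"name": "plan", "llm.cost.total": 0.002},
    {"name": "retrieve", "llm.cost.total": 0.011},
    {"name": "answer", "llm.cost.total": 0.048},
]

THRESHOLD = 0.01  # USD; pick a value that fits your budget
expensive = [s["name"] for s in spans if s["llm.cost.total"] > THRESHOLD]
print(expensive)  # ['retrieve', 'answer']
```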
Trace-Level Visualization
At the trace level, Arize aggregates cost across all LLM spans in the trace. This provides a complete view of how much it cost to serve a given request end-to-end.
Span-Level Visualization
You can also inspect cost at the individual span level, including a breakdown by token type. This allows you to:
- Pinpoint expensive steps in the LLM pipeline
- Analyze the relative contribution of different token categories (e.g., reasoning, cache, image)
Lookup Logic
To determine cost:
- We extract the model name from your trace using the following fallback order:
  - llm.model_name (primary)
  - llm.invocation_parameters.model (fallback 1)
  - metadata.model (fallback 2)
- Optionally, if you provide a provider, we'll match that as well (e.g., differentiating OpenAI vs Azure OpenAI for gpt-4).
- Each token type (e.g., prompt, completion, audio) is matched against the configuration, and the cost is calculated per million tokens (1M token unit basis).
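The fallback order can be sketched directly; the flat-dict span representation below is illustrative, but the key order mirrors the lookup logic described above.

```python
# Sketch of the model-name fallback order: llm.model_name first,
# then llm.invocation_parameters.model, then metadata.model.
# The flat-dict span representation is illustrative.

FALLBACK_ORDER = [
    "llm.model_name",                   # primary
    "llm.invocation_parameters.model",  # fallback 1
    "metadata.model",                   # fallback 2
]

def extract_model_name(span):
    for key in FALLBACK_ORDER:
        value = span.get(key)
        if value:
            return value
    return None

print(extract_model_name({"metadata.model": "gpt-4o-mini"}))  # gpt-4o-mini
print(extract_model_name({"llm.model_name": "gpt-4o",
                          "metadata.model": "ignored"}))      # gpt-4o
```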
Important: Cost is not retroactive. To track costs, you must configure pricing before ingesting traces.
Supported Token Types and Semantic Conventions
You can send any token types using OpenInference semantic conventions. Below are the supported fields:
Prompt Tokens
| Token Type | Field Name |
|---|---|
| Prompt (Includes all input subtypes to LLM) | llm.token_count.prompt |
| Prompt Details | llm.token_count.prompt_details |
| Audio | llm.token_count.prompt_details.audio |
| Image | llm.token_count.prompt_details.image |
| Cache Input | llm.token_count.prompt_details.cache_input |
| Cache Read | llm.token_count.prompt_details.cache_read |
| Cache Write | llm.token_count.prompt_details.cache_write |
Completion Tokens
| Token Type | Field Name |
|---|---|
| Completion (Includes all output subtypes from LLM) | llm.token_count.completion |
| Audio | llm.token_count.completion_details.audio |
| Reasoning | llm.token_count.completion_details.reasoning |
| Image | llm.token_count.completion_details.image |
Total Tokens (Optional)
llm.token_count.total
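Putting the fields together, a span with cached prompt tokens and reasoning output tokens might carry the attribute set below. The numbers are illustrative; the attribute names are the conventions listed in the tables above.

```python
# Example attribute set for a span with cached prompt tokens and
# reasoning output tokens. Counts are illustrative; names follow
# the OpenInference conventions above.

attributes = {
    "llm.token_count.prompt": 4_000,
    "llm.token_count.prompt_details.cache_read": 3_000,
    "llm.token_count.completion": 900,
    "llm.token_count.completion_details.reasoning": 600,
    "llm.token_count.total": 4_900,  # optional: prompt + completion
}

# Sanity check: detail counts should not exceed their parent totals.
assert attributes["llm.token_count.prompt_details.cache_read"] <= attributes["llm.token_count.prompt"]
print(attributes["llm.token_count.total"])  # 4900
```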
Custom Token Types
You can also define custom token types under either prompt_details or completion_details. Just make sure to:
- Use semantic naming
- Include a matching token type and cost in your configuration
Each token type you send will have a cost calculated, provided a matching token type is defined in your configuration.
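That matching rule can be sketched as a simple rate lookup; the config, token-type names, and rates below are hypothetical.

```python
# Sketch of custom token-type matching: a token type only accrues
# cost when a matching type exists in the config. Names and rates
# are hypothetical.

config_rates_per_1m = {"input": 3.00, "output": 12.00, "my_custom_type": 1.50}

def detail_cost(token_type, count):
    rate = config_rates_per_1m.get(token_type)
    if rate is None:
        return 0.0  # no matching token type in the config -> no cost
    return count * rate / 1_000_000

print(detail_cost("my_custom_type", 10_000))  # matched: costed
print(detail_cost("unmatched_type", 10_000))  # unmatched: 0.0
```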
Next step
Group multi-turn conversations together with sessions.