Documentation Index
Fetch the complete documentation index at: https://arize-ax.mintlify.dev/docs/llms.txt
Use this file to discover all available pages before exploring further.
RBAC (GA)
April 13, 2026
Role-Based Access Control is now generally available on Arize AX. Define custom roles, scope permissions to spaces and resources, and manage who can create, read, update, and delete platform resources from a single, auditable surface.
- Resource Restrictions & Role Bindings: Restrict access to specific resources and bind roles to users or service accounts as the building blocks of fine-grained access control.
- API Key Permissions: New `USER_KEY_CREATE`, `SERVICE_KEY_READ`, and `SERVICE_KEY_UPDATE` permissions, plus developer permission checks on API key mutations, give admins precise control over who can mint and rotate keys.
- User Creation Hardening: GraphQL API key creation has been removed in favor of REST flows, and a permissions toggle on user creation makes default access explicit at onboarding time.
Alyx Improvements
April 4–30, 2026
- Floating Alyx Button: A persistent Alyx launcher follows you across the platform, with a pulse indicator that highlights when context-relevant suggestions are available.
- Alyx History Menu: Pick up where you left off—recent Alyx conversations are now grouped in a dedicated history menu so you can resume threads without losing context.
- Alyx Writes Code Evaluators: Describe what you want to measure and Alyx drafts the code evaluator, wires up parameters, and previews results before saving.
- Auto-Fix Variable Mappings on Eval & Task Forms: A new Alyx button on evaluator and task creation pages auto-detects and fixes broken variable mappings using actual column data—Alyx can also be auto-triggered to repair mappings without leaving the form.
- Run Experiments from Datasets: Trigger experiment runs through Alyx directly from the dataset page—no need to navigate to the experiments surface.
- Unify Query Filters with Alyx: Translate natural language into structured query filters across tracing, sessions, and evaluations from one consistent Alyx flow.
- Eval Hub Alyx: Alyx is now embedded throughout Eval Hub, including a dedicated assistant on the evaluator detail page and a “Start with Alyx” prompt on empty states.
- Datasets Empty State: New Alyx entry point on the datasets page guides you through creating your first dataset and configuring evaluations.
- Materialized Column Awareness: Alyx is now aware of materialized input and output columns when filtering, returning more accurate results on projects that use derived attributes.
- Integrations Slideover: AI integrations now surface in a slideover within Alyx chat so you can view connection details without leaving the conversation.
- Claude Models in Canada: Anthropic Claude models are now enabled for Alyx in the Canada region for customers in that data residency.
- Trace Aggregations: Alyx can now aggregate numeric values from child spans in traces, grouped by attributes in the root span. Ask questions like “compute average token usage by router type” or “calculate total cost by user email” and get instant results.
- Auto-Fix Column Mappings: Alyx automatically detects and fixes broken evaluator column mappings using actual column data. It discovers eval-task pairs needing fixes, verifies data coverage, previews sample values for semantic fit, and applies full mapping updates while preserving correct mappings.
- Auto-Select Preview Spans: After fixing evaluator variable mappings, the preview panel now automatically selects the relevant span with data for each mapped column—no manual selection needed.
- Playground Onboarding: New “Start with Alyx” action on the Playgrounds list opens a new playground and guides you through setting up a customer support bot with datasets, prompts, evaluations, and experiments.
- Dataset Listings: Alyx is now available directly on the dataset listings page, making it easier to get AI-powered assistance when working with datasets.
- Custom Metrics Creation: Ask Alyx to create custom metrics using natural language. Alyx generates the query, shows a confirmation drawer for you to review and approve the name and query, and applies it automatically.
- Annotation Config Creation: Create and configure annotation configs through Alyx using natural language commands.
- Trace and Session Eval Entry Points: New entry points for creating evaluations directly from traces and sessions, streamlining evaluation workflows.
- Improved Error Messages: Alyx now shows clear, user-facing error messages when internal model provider errors occur, instead of generic failure states.
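The trace-aggregation queries above can be pictured as a group-by over child-span values keyed by a root-span attribute. A minimal sketch, not the Alyx implementation; the span shapes and attribute names below are invented for illustration:

```python
from collections import defaultdict

def aggregate_child_spans(traces, group_attr, value_key, agg="avg"):
    """Group numeric child-span values by an attribute on each trace's root span."""
    groups = defaultdict(list)
    for trace in traces:
        key = trace["root"]["attributes"].get(group_attr)
        for span in trace["children"]:
            if value_key in span:
                groups[key].append(span[value_key])
    if agg == "avg":
        return {k: sum(v) / len(v) for k, v in groups.items()}
    return {k: sum(v) for k, v in groups.items()}  # "sum"

traces = [
    {"root": {"attributes": {"router_type": "semantic"}},
     "children": [{"token_count": 120}, {"token_count": 80}]},
    {"root": {"attributes": {"router_type": "keyword"}},
     "children": [{"token_count": 40}]},
]
print(aggregate_child_spans(traces, "router_type", "token_count"))
# {'semantic': 100.0, 'keyword': 40.0}
```

A question like "compute average token usage by router type" reduces to exactly this shape of query.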
Evaluator Improvements
April 4–30, 2026
- Combined Eval & Task Form: Creating an evaluator and its task is now a single, streamlined form—configure the evaluator, define the task, and save in one pass instead of two.
- Test Code Eval on Example: A new “Test on example” button runs your code evaluator against a single example so you can validate logic before scoring at scale.
- All Variables in Eval Template: Evaluator templates now accept all available variables, removing prior caps on the number of inputs you can wire into a prompt.
- Optimization Direction “None”: Adds a `none` option to evaluator optimization direction for evaluators where higher or lower scores aren’t inherently better.
- Save Evaluator Version on Run: Running an experiment now records the exact evaluator version used, making historical comparisons reproducible.
- Eval Metadata in Tracing Details: Evaluator metadata—name, version, score, explanation—is now surfaced inline in trace span details for at-a-glance triage.
- Streamlined Eval Task Menu: The evaluator tasks menu has been reorganized for faster access to common operations.
- List Evals in Tracing Tasks Button: The tracing toolbar now shows the active evaluators on a project, so you can confirm what’s scoring before drilling in.
- Code Evals in Eval Hub: Code evaluators are now first-class citizens in Eval Hub—create code evaluators (template or custom), version them, and reuse across tasks and experiments. Update an evaluator to save a new version, just like template evaluators.
- Custom Code Evals, Revamped: Live validation surfaces issues in your code block before you submit. Custom evals now support parameters (`self.param_name`) and evaluate params, bringing them to parity with template evaluators.
- Column Mapping Preview: The same preview panel from template evaluators is now available for code evals, with clear warnings for missing mappings, unresolved columns, and valid state.
- Code Evals in Playground: Select code evaluators from Eval Hub directly in Prompt Playground to score experiment runs.
- Optimization Direction: Evaluators now support optimization direction configuration, letting you specify whether higher or lower scores are better for your evaluation criteria.
- Manual Mode Access: In manual mode, evaluators are now accessible from the home page in both onboarding and normal views.
- Playground Eval Config: Configure evaluators directly in the playground with classification choices, custom scores, and explanation toggles, and save configurations to the Eval Hub with version history.
- Task from Span: Create evaluator tasks directly from trace spans in the slideover, keeping the UI in sync with in-flight and newly created evaluators.
- Task and Eval Trace Links: Evaluation feedback tokens now include “Task Logs” and “Eval Trace” buttons for direct navigation to active online tasks or completed evaluation traces.
- Template Evaluations: Experiment runner now supports template evaluations with classification output as structured JSON including label, explanation, and score.
- Hide Null Outputs: New toggle in experiments to hide rows where all experiment outputs are null, defaulting to on. State persists via URL.
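The structured classification output described above (label, explanation, score) can be consumed defensively on the client side. A hedged sketch, assuming a JSON string in that shape; the function and its checks are illustrative, not the platform's parser:

```python
import json

def parse_eval_output(raw: str) -> dict:
    """Parse and sanity-check a classification eval result.
    Expects the label/explanation/score shape described above."""
    result = json.loads(raw)
    for key in ("label", "explanation", "score"):
        if key not in result:
            raise ValueError(f"missing field: {key}")
    if not isinstance(result["score"], (int, float)):
        raise TypeError("score must be numeric")
    return result

raw = '{"label": "correct", "explanation": "Answer matches reference.", "score": 1.0}'
print(parse_eval_output(raw)["label"])  # correct
```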
Annotation Queues
April 4–30, 2026
- Records-Per-Queue Caps: Set a cap on the number of records a single annotation queue can hold, preventing unbounded growth and runaway labeling costs.
- Duplicate & Capacity Surfacing: When adding records to a queue, the UI now surfaces the count of duplicates skipped and any capacity restrictions hit, so reviewers know exactly what landed.
- Manual Annotation Submission: Reviewers can now submit annotations manually from the queue without going through the full assignment flow.
- CSV / JSONL Download: Export annotation queue records to CSV or JSONL for offline review or downstream pipelines.
- User-ID-to-Email Rename: Annotation column names that previously used opaque user IDs are now rendered with email addresses for readability.
- Buffered Annotation Updates: User annotation updates are now buffered through the existing ingestion path for more reliable persistence under load.
- Preselected Configs from Annotation Columns: When configuring annotations on a dataset, the UI now preselects configs based on existing annotation columns to avoid redundant setup.
Datasets & Experiments
April 4–30, 2026
- Sortable Example Columns: Sort by any column on the dataset examples table, including custom annotation columns and metrics.
- Span Dataset Metrics Bar: Span dataset versions now show a metrics stats bar at the top for instant visibility into volume, scores, and drift.
- Expand & Collapse Dataset Rows: Long dataset rows can now be expanded inline to inspect full input/output payloads without leaving the table.
- Image Hover Preview: Image cells in experiment slideovers now load a larger preview on hover, making it easier to spot visual regressions across runs.
- Nested JSON Preview: Nested JSON values in dataset cells render with collapsible structure instead of a flat string blob.
- CSV Integer Support: CSV uploads now correctly preserve integer types instead of coercing them to floats or strings.
- Avg. Latency on Experiments: Experiments now show average latency alongside score and cost columns, with consistent eval-experiment formatting across the table.
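Type-preserving CSV parsing of the kind described in “CSV Integer Support” can be sketched as follows; a simplified illustration, not the platform's actual coercion rules:

```python
import csv
import io

def coerce(value: str):
    """Keep integer-looking values as int, decimals as float, everything else as str."""
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value

def read_rows(text: str):
    """Parse CSV text into dicts with per-cell type coercion."""
    return [{k: coerce(v) for k, v in row.items()}
            for row in csv.DictReader(io.StringIO(text))]

rows = read_rows("id,score,note\n7,0.92,ok\n8,1,great\n")
print(rows[0])  # {'id': 7, 'score': 0.92, 'note': 'ok'}
```

Note that `"1"` stays an `int` rather than being widened to `1.0`, which is the distinction the changelog entry is about.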
Tracing & Sessions
April 4–30, 2026
- Multi-Span Admission Path: Span/trace/session evaluation tasks can now ingest and score multi-span queries through a unified admission path, with continuous grouping and sizing for higher throughput.
- Sessions Columns & Expansion: The sessions tab now supports custom columns and per-row expand/collapse for quick session-level triage.
- Allow Double Quotes in Auto-Add Filter: Auto-add filters now accept double-quoted values, fixing a class of “no results” bugs on string filters.
- Auto Granularity in Monitor Charts: Monitor metric charts default to a granularity that returns data, eliminating empty time-series on first load.
Webhooks & Events
April 4–30, 2026
- Webhooks UI: Configure, test, and manage webhooks from a dedicated UI with payload previews and delivery history.
- Test Webhooks Button: Send a synthetic event to your webhook endpoint with a single click to verify auth and payload shape before going live.
- ES-Powered Events Page: The Webhooks panel has been replaced with an Events page backed by Elasticsearch, supporting full-text search, structured filters, and faster lookups across delivery history.
Playground
April 4–30, 2026
- Prompt Switcher: Switch between prompts in the playground without losing your current context, making side-by-side comparison faster.
- PNG Image Display: PNG image inputs now render directly in the playground without requiring a data-URI prefix.
- Reset Last-Used LLM: The “last used LLM” is no longer pinned in local storage, so a fresh playground always defaults to your space’s preferred model.
- Default Integration per Space: Set a default AI integration per space from Evals or Playground via a new modal, so prompts run against the right provider out of the box.
Data Fabric
April 4–30, 2026
- Delta Lake Format: Specify Delta Lake as both an input format and an output destination for Data Fabric jobs.
Models & Integrations
April 4–30, 2026
- Anthropic Opus 4.7: Claude Opus 4.7 is now available across Alyx, Playground, and evaluators.
- OpenAI o3 & o4-mini: OpenAI’s o3 and o4-mini reasoning models are now selectable for prompts and evaluations.
- GPT-5.4 Family: Added `gpt-5.4-nano-2026-03-17`, `gpt-5.4-mini`, and `gpt-5.4-mini-2026-03-17` to the OpenAI provider.
- GPT-5.5 & GPT-5.5-Pro: Latest OpenAI flagship and pro variants are now supported.
CLI & Skills
April 4–30, 2026
- Available skills — install and use Arize skills with Cursor, Claude Code, and other coding agents.
- AX CLI overview — install the CLI, manage profiles, and run commands from your terminal.
Session Replay
March 30–April 1, 2026
Session replay is now available to all users with privacy-first design for enterprise accounts. All text and input values are masked by default on enterprise production accounts, while filter inputs and Alyx chat remain visible for debugging. No masking is applied on dev/local environments or free/pro accounts.
Monitor Improvements
March 20–25, 2026
- Historical Runs Visualization: Monitor runs now display metric values and thresholds as line charts overlaid on traffic volume bars, with toggle between live and alert data, granularity selectors, and audit log markers on the timeline.
- Webhook Notifications: Monitors now support webhook notifications with structured JSON payloads on status transitions. Payloads include monitor details, status, threshold, metric values, and links. Delivery is tracked with events and attempts tables for observability, and auth tokens are encrypted at rest.
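A monitor webhook consumer can expect a structured JSON body along these lines. The field names below are assumptions for illustration, not the documented Arize AX payload schema:

```python
import json

def build_monitor_event(monitor_name, status, threshold, value, link):
    """Build a hypothetical status-transition payload: monitor details,
    status, threshold, observed metric value, and a deep link."""
    return {
        "monitor": monitor_name,
        "status": status,          # e.g. "triggered" or "resolved"
        "threshold": threshold,
        "metric_value": value,
        "link": link,
    }

event = build_monitor_event("p95-latency", "triggered", 2.0, 2.7,
                            "https://app.example.com/monitors/p95-latency")
body = json.dumps(event)
print(body)
```

On the receiving side, comparing `metric_value` against `threshold` lets you re-derive why the transition fired.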
SDKs & REST APIs
March 20–April 2, 2026
New SDK clients and REST API endpoints for managing platform resources programmatically:
- Annotation Queues (Python & JavaScript SDKs): Complete annotation queue management in both SDKs with queue CRUD operations, record management, annotation submission, and record assignment. Supports both ALL and RANDOM assignment methods.
- Tasks API (Python & JavaScript SDKs): Comprehensive evaluation tasks support in both SDKs with task CRUD operations, task run management including trigger, list, get, and cancel. Python SDK includes a wait helper with configurable polling and timeout.
- Name-or-ID Resolution (Python & JavaScript SDKs): Reference resources by human-readable name instead of opaque base64 IDs across both SDKs. All resource parameters accept either name or ID with automatic resolution via API lookup, providing a consistent experience across both SDKs.
- Spans API with Annotations: The Spans API now returns annotations and evaluations in structured form, including user email lookup for user annotations.
- List Spans (JavaScript SDK): The JavaScript SDK now supports listing spans with filtering and pagination capabilities.
- Custom Max Past Years: SDK timestamp validation now supports a custom `max_past_years` override via parameter or environment variable, enabling long-term historical data ingestion for on-prem deployments.
CLI Commands
March 20–April 2, 2026
New command groups and capabilities for the `ax` CLI:
- New Command Groups: Six new command groups added: `ax evaluators` for evaluator and version management, `ax tasks` for evaluation task operations including wait-for-run, `ax api-keys` for API key lifecycle management, `ax ai-integrations` for managing OpenAI, Azure, Bedrock, Vertex AI, Anthropic, and custom providers, `ax prompts` for full prompt lifecycle with versions and labels, and `ax roles` for role management.
- Name Filters: Added a `--name`/`-n` option to all list commands that support it, including ai-integrations, annotation-configs, datasets, evaluators, projects, prompts, and tasks. Filter by case-insensitive substring.
- Stdin Pipe Support: Read data from stdin for dataset and experiment creation with format auto-detection supporting JSON array, JSONL, and CSV.
- Classification Config: Configure classification evaluators from the terminal with `--classification-choices` for label-to-score mappings, `--direction` for optimization direction, and `--data-granularity` for evaluation scope.
- Agent Skills Install: Install agent skills through the CLI, with both interactive and non-interactive flows.
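The stdin format auto-detection can be approximated with a small heuristic like the following; a simplified sketch, not the CLI's actual detector:

```python
import json

def detect_format(text: str) -> str:
    """Guess whether piped input is a JSON array, JSONL, or CSV."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        return "json-array"
    first_line = stripped.splitlines()[0]
    try:
        json.loads(first_line)  # one valid JSON document per line => JSONL
        return "jsonl"
    except json.JSONDecodeError:
        return "csv"            # fall back to CSV for delimited text

print(detect_format('[{"a": 1}]'))          # json-array
print(detect_format('{"a": 1}\n{"a": 2}'))  # jsonl
print(detect_format("a,b\n1,2\n"))          # csv
```

A real detector would also sniff delimiters and handle edge cases (e.g. a CSV whose header happens to parse as JSON), but the branching order above captures the idea.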
Tracing Improvements
March 26–April 2, 2026
- Trace Export via Alyx: Export traces directly through Alyx with natural language commands. Ask Alyx to export specific trace data for analysis, reporting, or integration with other tools.
- Sessions Metrics Bar: New metrics bar for sessions provides at-a-glance visibility into key session statistics and performance indicators.
- Linkable Trace Views: Share specific trace views with colleagues using direct links. Each trace view now has a unique URL for easy collaboration and reference.
- Saved Views Enhancements: Copy existing saved views to create new variations quickly, and start time is now included as a default column in saved trace views.
- Disable External Tracing: Administrators can now disable external tracing for spaces that should only accept internally generated traces.
Dashboard & Visualization
March 26–April 2, 2026
- Cost Formatting: Dashboard charts and pivot tables automatically detect LLM cost dimensions and apply currency formatting, with `$`-prefixed values on axes, full precision in tooltips, and Y-axis labels defaulting to “Cost ($)”.
- Pivot Table Cardinality: Non-numeric dimensions in pivot tables now show cardinality and count metrics, making it easier to analyze categorical data distribution.
- Global Custom Metrics: Custom metrics are now accessible from the global navigation under the “Observe” section for improved discoverability.
- Preview Variables: Navigate to latest data and select columns directly in the preview variables panel for faster workflow.
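The axis-versus-tooltip split in the cost formatting above can be sketched like this; the rounding rules are illustrative, not the dashboard's exact behavior:

```python
def format_axis_cost(value: float) -> str:
    """Compact $-prefixed axis label: two decimals for dollar-scale values,
    four for sub-dollar LLM costs."""
    return f"${value:,.2f}" if value >= 1 else f"${value:.4f}"

def format_tooltip_cost(value: float) -> str:
    """Full-precision tooltip value."""
    return f"${value}"

print(format_axis_cost(1234.5))      # $1,234.50
print(format_axis_cost(0.00342))     # $0.0034
print(format_tooltip_cost(0.00342))  # $0.00342
```

The point of the two functions is the one in the changelog entry: axes stay readable while tooltips keep every digit of a per-token cost.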
Model & Integration Updates
March 26–27, 2026
- Gemini 3.1 Models: Added Gemini 3.1 Pro Preview and Flash Lite Preview to the Vertex AI provider.
- OTLP JSON Support: The OTLP HTTP endpoint now accepts the `application/json` content type in addition to protobuf, making it easier to test with curl and integrate with languages that lack strong protobuf support.
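With JSON now accepted, a minimal OTLP/JSON trace export body can be built and POSTed with plain curl. A sketch following the OTLP JSON encoding; the IDs, timestamps, service name, and endpoint are placeholders:

```python
import json

# Minimal OTLP/JSON trace export body (one resource, one span).
body = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "demo-app"}}
        ]},
        "scopeSpans": [{
            "spans": [{
                "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",  # 16-byte hex
                "spanId": "051581bf3cb55c13",                   # 8-byte hex
                "name": "GET /health",
                "kind": 2,
                "startTimeUnixNano": "1700000000000000000",
                "endTimeUnixNano": "1700000001000000000",
            }]
        }],
    }]
}
payload = json.dumps(body)
print(payload[:40])
# Send with, e.g.:
#   curl -X POST <endpoint>/v1/traces \
#        -H "Content-Type: application/json" -d @body.json
```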
Annotation Improvements
March 25–31, 2026
- Queue Record Deletion: Delete annotation queue records individually or in bulk with new management capabilities.
- Accessibility Enhancements: Visual accessibility improvements for annotation queues including zebra striping with alternating background colors, bold titles, and increased spacing between configs.
Space Management
March 27–April 2, 2026
- Space Creation Limits: Space creation now respects account tier limits with proper gating and user feedback when limits are reached.
- Spaces API Migration: The Spaces REST API has been migrated to a faster backend with cursor-based pagination and improved org and name filtering.