Alyx Improvements
March 20–April 3, 2026

- Trace Aggregations: Alyx can now aggregate numeric values from child spans in traces, grouped by attributes in the root span. Ask questions like “compute average token usage by router type” or “calculate total cost by user email” and get instant results.
- Auto-Fix Column Mappings: Alyx automatically detects and fixes broken evaluator column mappings using actual column data. It discovers eval-task pairs needing fixes, verifies data coverage, previews sample values for semantic fit, and applies full mapping updates while preserving correct mappings.
- Auto-Select Preview Spans: After fixing evaluator variable mappings, the preview panel now automatically selects the relevant span with data for each mapped column—no manual selection needed.
- Playground Onboarding: New “Start with Alyx” action on the Playgrounds list opens a new playground and guides you through setting up a customer support bot with datasets, prompts, evaluations, and experiments.
- Dataset Listings: Alyx is now available directly on the dataset listings page, making it easier to get AI-powered assistance when working with datasets.
- Custom Metrics Creation: Ask Alyx to create custom metrics using natural language. Alyx generates the query, shows a confirmation drawer for you to review and approve the name and query, and applies it automatically.
- Annotation Config Creation: Create and configure annotation configs through Alyx using natural language commands.
- Trace and Session Eval Entry Points: New entry points for creating evaluations directly from traces and sessions, streamlining evaluation workflows.
- Improved Error Messages: Alyx now shows clear, user-facing error messages when internal model provider errors occur, instead of generic failure states.
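As a rough sketch of what the trace aggregation feature above computes, here is a minimal Python version. The trace and span shapes, field names, and function name are illustrative only, not the actual Alyx implementation.

```python
from collections import defaultdict

def aggregate_traces(traces, group_by_attr, value_key, agg="avg"):
    """Aggregate a numeric value from child spans, grouped by a root-span attribute.

    `traces` is a list of dicts: {"root": {...attributes...}, "spans": [{...}, ...]}.
    This shape is hypothetical, chosen only to show the grouping logic.
    """
    groups = defaultdict(list)
    for trace in traces:
        key = trace["root"].get(group_by_attr)
        for span in trace["spans"]:
            if value_key in span:
                groups[key].append(span[value_key])
    if agg == "sum":
        return {k: sum(v) for k, v in groups.items()}
    # default: average
    return {k: sum(v) / len(v) for k, v in groups.items()}
```

A question like “average token usage by router type” then maps to `aggregate_traces(traces, "router", "tokens")`, and “total cost by user email” to `aggregate_traces(traces, "user_email", "cost", agg="sum")`.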
Session Replay
March 30–April 1, 2026

Session replay is now available to all users, with a privacy-first design for enterprise accounts. All text and input values are masked by default on enterprise production accounts, while filter inputs and Alyx chat remain visible for debugging. No masking is applied on dev/local environments or free/pro accounts.

Evaluator Improvements
March 23–April 2, 2026

- Code Evals in Eval Hub: Code evaluators are now first-class citizens in Eval Hub. Create code evaluators (template or custom), version them, and reuse them across tasks and experiments. Update an evaluator to save a new version, just like template evaluators.
- Custom Code Evals, Revamped: Live validation surfaces issues in your code block before you submit. Custom evals now support parameters (`self.param_name`) and evaluate params, bringing them to parity with template evaluators.
- Column Mapping Preview: The same preview panel from template evaluators is now available for code evals, with clear warnings for missing mappings, unresolved columns, and valid state.
- Code Evals in Playground: Select code evaluators from Eval Hub directly in Prompt Playground to score experiment runs.
- Optimization Direction: Evaluators now support optimization direction configuration, letting you specify whether higher or lower scores are better for your evaluation criteria.
- Manual Mode Access: Evaluators are now accessible from the home page in both onboarding and normal views in manual mode.
- Playground Eval Config: Configure evaluators directly in the playground with classification choices, custom scores, and explanation toggles, and save configurations to the Eval Hub with version history.
- Task from Span: Create evaluator tasks directly from trace spans in the slideover, keeping the UI in sync with in-flight and newly created evaluators.
- Task and Eval Trace Links: Evaluation feedback tokens now include “Task Logs” and “Eval Trace” buttons for direct navigation to active online tasks or completed evaluation traces.
- Template Evaluations: Experiment runner now supports template evaluations with classification output as structured JSON including label, explanation, and score.
- Hide Null Outputs: New toggle in experiments to hide rows where all experiment outputs are null, defaulting to on. State persists via URL.
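A consumer of the structured classification output described above might validate it roughly like this. The field names (label, explanation, score) come from the note itself, but the exact schema and types are assumptions.

```python
import json

def parse_classification_result(raw: str) -> dict:
    """Parse a template-evaluation classification result.

    Assumes the structured-JSON shape mentioned in the release note:
    {"label": ..., "explanation": ..., "score": ...}. This is a sketch,
    not a documented schema.
    """
    result = json.loads(raw)
    for field in ("label", "explanation", "score"):
        if field not in result:
            raise ValueError(f"missing field: {field}")
    # normalize score to float for downstream comparison
    result["score"] = float(result["score"])
    return result
```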
Monitor Improvements
March 20–25, 2026
- Historical Runs Visualization: Monitor runs now display metric values and thresholds as line charts overlaid on traffic volume bars, with toggle between live and alert data, granularity selectors, and audit log markers on the timeline.
- Webhook Notifications: Monitors now support webhook notifications with structured JSON payloads on status transitions. Payloads include monitor details, status, threshold, metric values, and links. Delivery is tracked with events and attempts tables for observability, and auth tokens are encrypted at rest.
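A receiver of the monitor webhooks above will see a structured JSON payload on each status transition. As a hedged sketch of what such a payload might contain, based only on the fields listed in the note (the field names and nesting here are hypothetical, not the documented schema):

```python
import json
from datetime import datetime, timezone

def build_monitor_webhook_payload(monitor_id, name, old_status, new_status,
                                  threshold, metric_value, link):
    """Build an example status-transition payload.

    All field names are illustrative; consult the webhook documentation
    for the actual payload schema.
    """
    return {
        "monitor": {"id": monitor_id, "name": name, "link": link},
        "transition": {"from": old_status, "to": new_status},
        "threshold": threshold,
        "metric_value": metric_value,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```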
SDKs & REST APIs
March 20–April 2, 2026

New SDK clients and REST API endpoints for managing platform resources programmatically:

- Annotation Queues (Python & JavaScript SDKs): Complete annotation queue management in both SDKs, with queue CRUD operations, record management, annotation submission, and record assignment. Supports both ALL and RANDOM assignment methods.
- Tasks API (Python & JavaScript SDKs): Comprehensive evaluation task support in both SDKs, with task CRUD operations and task run management (trigger, list, get, and cancel). The Python SDK includes a wait helper with configurable polling and timeout.
- Name-or-ID Resolution (Python & JavaScript SDKs): Reference resources by human-readable name instead of opaque base64 IDs. All resource parameters in both SDKs accept either a name or an ID, with automatic resolution via API lookup.
- Spans API with Annotations: The Spans API now returns annotations and evaluations in structured form, including user email lookup for user annotations.
- List Spans (JavaScript SDK): The JavaScript SDK now supports listing spans with filtering and pagination capabilities.
- Custom Max Past Years: SDK timestamp validation now supports a custom `max_past_years` override via parameter or environment variable, enabling long-term historical data ingestion for on-prem deployments.
CLI Commands
March 20–April 2, 2026

New command groups and capabilities for the `ax` CLI:
- New Command Groups: Six new command groups added: `ax evaluators` for evaluator and version management, `ax tasks` for evaluation task operations including wait-for-run, `ax api-keys` for API key lifecycle management, `ax ai-integrations` for managing OpenAI, Azure, Bedrock, Vertex AI, Anthropic, and custom providers, `ax prompts` for the full prompt lifecycle with versions and labels, and `ax roles` for role management.
- Name Filters: Added a `--name`/`-n` option to all list commands that support it, including ai-integrations, annotation-configs, datasets, evaluators, projects, prompts, and tasks. Filters by case-insensitive substring.
- Stdin Pipe Support: Read data from stdin for dataset and experiment creation, with format auto-detection supporting JSON array, JSONL, and CSV.
- Classification Config: Configure classification evaluators from the terminal with `--classification-choices` for label-to-score mappings, `--direction` for optimization direction, and `--data-granularity` for evaluation scope.
- Agent Skills Install: Install agent skills through the CLI, with both interactive and non-interactive options.
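The stdin format auto-detection mentioned above (JSON array, JSONL, CSV) can be sketched roughly as follows. The detection order and fallback behavior are assumptions; the real CLI may use different heuristics.

```python
import csv
import io
import json

def parse_records(text: str) -> list[dict]:
    """Auto-detect JSON array, JSONL, or CSV input and return a list of records.

    A simplified sketch of plausible detection logic, not the CLI's
    actual implementation.
    """
    stripped = text.strip()
    if stripped.startswith("["):
        # a leading bracket signals a JSON array
        return json.loads(stripped)
    lines = [line for line in stripped.splitlines() if line.strip()]
    try:
        # JSONL: every non-empty line is its own JSON object
        return [json.loads(line) for line in lines]
    except json.JSONDecodeError:
        pass
    # fall back to CSV with a header row
    reader = csv.DictReader(io.StringIO(stripped))
    return [dict(row) for row in reader]
```

Piped input such as `cat data.csv | ax datasets create ...` would then be normalized to the same record list regardless of source format.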
Tracing Improvements
March 26–April 2, 2026

- Trace Export via Alyx: Export traces directly through Alyx with natural language commands. Ask Alyx to export specific trace data for analysis, reporting, or integration with other tools.
- Sessions Metrics Bar: New metrics bar for sessions provides at-a-glance visibility into key session statistics and performance indicators.
- Linkable Trace Views: Share specific trace views with colleagues using direct links. Each trace view now has a unique URL for easy collaboration and reference.
- Saved Views Enhancements: Copy existing saved views to create new variations quickly, and start time is now included as a default column in saved trace views.
- Disable External Tracing: Administrators can now disable external tracing for spaces that should only accept internally generated traces.
Dashboard & Visualization
March 26–April 2, 2026

- Cost Formatting: Dashboard charts and pivot tables automatically detect LLM cost dimensions and apply currency formatting, with `$`-prefixed values on axes, full precision in tooltips, and Y-axis labels auto-defaulting to “Cost ($)”.
- Pivot Table Cardinality: Non-numeric dimensions in pivot tables now show cardinality and count metrics, making it easier to analyze categorical data distributions.
- Global Custom Metrics: Custom metrics are now accessible from the global navigation under the “Observe” section for improved discoverability.
- Preview Variables: Navigate to latest data and select columns directly in the preview variables panel for faster workflow.
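The cost detection and formatting described above can be approximated like this. Both the dimension-name heuristic and the precision choices are illustrative guesses, not the dashboard's actual rules.

```python
def is_cost_dimension(name: str) -> bool:
    """Heuristic detection of LLM cost dimensions (illustrative only)."""
    lowered = name.lower()
    return "cost" in lowered or lowered.endswith("_usd")

def format_cost_axis_value(value: float) -> str:
    """$-prefixed, two-decimal axis label; tooltips would keep full precision."""
    return f"${value:,.2f}"
```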
Model & Integration Updates
March 26–27, 2026

- Gemini 3.1 Models: Added Gemini 3.1 Pro Preview and Flash Lite Preview to the Vertex AI provider.
- OTLP JSON Support: The OTLP HTTP endpoint now accepts the `application/json` content type in addition to protobuf, making it easier to test with curl and to integrate with languages that lack strong protobuf support.
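For testing the JSON endpoint by hand, a minimal OTLP/JSON trace payload can be built like this. The field names follow the OpenTelemetry protobuf JSON mapping; the service and scope names are placeholders.

```python
import json

def minimal_otlp_trace(trace_id: str, span_id: str, name: str,
                       start_ns: int, end_ns: int) -> str:
    """Build a minimal OTLP/JSON payload suitable for POSTing to the
    OTLP HTTP traces path with Content-Type: application/json."""
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "curl-test"}}
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-test"},
                "spans": [{
                    "traceId": trace_id,        # 32 lowercase hex chars
                    "spanId": span_id,          # 16 lowercase hex chars
                    "name": name,
                    "kind": 1,                  # SPAN_KIND_INTERNAL
                    # 64-bit nanosecond timestamps are JSON-encoded as strings
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }
    return json.dumps(payload)
```

The resulting string can be sent with curl using `-H "Content-Type: application/json"` against the endpoint's standard OTLP traces path.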
Annotation Improvements
March 25–31, 2026

- Queue Record Deletion: Delete annotation queue records individually or in bulk with new management capabilities.
- Accessibility Enhancements: Visual accessibility improvements for annotation queues including zebra striping with alternating background colors, bold titles, and increased spacing between configs.
Space Management
March 27–April 2, 2026

- Space Creation Limits: Space creation now respects account tier limits, with proper gating and user feedback when limits are reached.
- Spaces API Migration: The Spaces REST API has been migrated to a faster backend with cursor-based pagination and improved org and name filtering.
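Clients of the cursor-paginated Spaces API above typically drain the listing with a loop like this. Here `fetch_page` is a hypothetical stand-in for the actual API call, assumed to return a page of items plus the next cursor (None on the last page).

```python
def list_all_spaces(fetch_page, page_size=100):
    """Drain a cursor-paginated listing.

    `fetch_page(cursor, limit)` is a placeholder for the real Spaces API
    call; this only illustrates the cursor-following pattern.
    """
    items, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor, page_size)
        items.extend(page)
        if cursor is None:
            return items
```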