Our Vision for Alyx

The Vision: Claude Code and Cursor for AI Engineering

When Claude Code and Cursor revolutionized software development, they didn’t just add AI to an existing workflow—they fundamentally reimagined how developers interact with code. Instead of manually searching through files, writing queries, or configuring tools, developers could simply describe what they needed in natural language and have an intelligent agent understand context, navigate codebases, and take action.

We’re building Alyx with the same vision for AI engineering.

Just as Claude Code and Cursor became essential tools for code development, we envision Alyx as the indispensable agent for building, debugging, and optimizing AI systems. The core insight is the same: the future of complex technical work isn’t about building more dashboards or exposing more configuration—it’s about building intelligent agents that understand your applications and can execute complex workflows through natural conversation on your behalf.

Our Approach: Agent-First Architecture

Intelligent Orchestration Over Static Queries

Alyx is built around a sophisticated orchestration system that manages multi-step workflows, coordinates tool calls, and maintains context across complex analyses. Rather than requiring users to know which tool to use or what information to provide in their queries, Alyx dynamically routes requests, selects appropriate tools, and chains operations together to answer user questions. The orchestrator architecture enables:

Automatic task decomposition - Complex questions are broken down into multi-step plans
Tool call coordination - The agent selects and sequences tools based on context
Conversation continuity - Context is maintained across iterations and tool calls
Error handling and recovery - The agent can adapt when operations fail or need retries
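To make these capabilities concrete, here is a minimal sketch of an orchestration loop. It is not Alyx's implementation: the `Orchestrator` class, the fixed two-step plan, and the `fetch_traces` and `summarize` tools are hypothetical names chosen for illustration. It shows a plan being executed step by step, context shared between tool calls, and a retry when a tool fails.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool signature: a tool reads the shared context and returns updates.
Tool = Callable[[dict], dict]


@dataclass
class Step:
    tool: str           # name of the tool to call for this step
    done: bool = False


@dataclass
class Orchestrator:
    tools: dict[str, Tool]
    max_retries: int = 2
    context: dict = field(default_factory=dict)  # carried across steps

    def plan(self, question: str) -> list[Step]:
        # Stand-in for model-driven task decomposition: a fixed plan for this sketch.
        return [Step("fetch_traces"), Step("summarize")]

    def run(self, question: str) -> dict:
        self.context["question"] = question
        for step in self.plan(question):
            for attempt in range(self.max_retries + 1):
                try:
                    self.context.update(self.tools[step.tool](self.context))
                    step.done = True
                    break
                except RuntimeError:
                    # Retry transient failures; give up after max_retries.
                    if attempt == self.max_retries:
                        raise
        return self.context


calls = {"n": 0}


def fetch_traces(ctx: dict) -> dict:
    # Fails on the first call to exercise the retry path.
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return {"traces": [{"latency_ms": 120}, {"latency_ms": 480}]}


def summarize(ctx: dict) -> dict:
    latencies = [t["latency_ms"] for t in ctx["traces"]]
    return {"summary": f"{len(latencies)} traces, max latency {max(latencies)} ms"}


agent = Orchestrator(tools={"fetch_traces": fetch_traces, "summarize": summarize})
result = agent.run("What is the slowest trace?")
print(result["summary"])
```

In a real agent the `plan` method would be driven by a model rather than hard-coded; the loop structure is the part this sketch is meant to illustrate.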

Native Understanding of AI System Data

Traditional analysis tools treat AI system data as generic observability traces. Alyx is purpose-built to understand the unique structure and semantics of LLM workflows and AI agents. The system has deep knowledge of:

Traces and spans with inputs, outputs, tool calls, prompts, latency, and error tracebacks
Evaluation frameworks and how to interpret evaluation results
Prompt structure and optimization strategies
Experiment workflows from dataset creation through evaluation
Annotation schemas for categorizing and labeling issues

This domain expertise means Alyx can answer questions with precision without requiring you to manually configure what your system does, which columns are important, or how your data is structured. Alyx automatically understands that a “latency bottleneck” in an LLM system requires different analysis than a latency issue in a traditional service, and it knows which traces, spans, and metrics matter most for your specific question.
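As a rough illustration of what structured span data can look like, the sketch below models a span with the fields listed above and groups latency by span kind. The `Span` class, its field names, and the `kind` values are assumptions made for this example, not the Arize AX schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical field names, loosely following the list above.
    span_id: str
    trace_id: str
    kind: str                      # e.g. "llm" or "tool" (illustrative values)
    input: str
    output: str
    latency_ms: float
    error: Optional[str] = None    # error traceback or message, if any
    tool_calls: list[str] = field(default_factory=list)


def latency_by_kind(spans: list[Span]) -> dict[str, float]:
    """Sum latency per span kind so LLM time and tool time can be compared."""
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        totals[span.kind] += span.latency_ms
    return dict(totals)


spans = [
    Span("s1", "t1", "llm", "plan the answer", "call search", 900.0, tool_calls=["search"]),
    Span("s2", "t1", "tool", "search(q)", "3 documents", 150.0),
    Span("s3", "t2", "llm", "write the answer", "", 1200.0, error="context length exceeded"),
]

print(latency_by_kind(spans))
print([s.span_id for s in spans if s.error is not None])
```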

Multi-Step Planning with Visibility

One of our core design principles is transparency into the agent’s reasoning and progress. Alyx uses an explicit todo management system that:

Plans complex tasks before execution
Tracks progress through multi-step analyses
Provides visibility into what the agent is doing and why
Enables iteration as users refine their questions

When you ask Alyx to “find what’s wrong with this model and suggest improvements,” it doesn’t start executing right away. It first creates a plan that breaks the request into steps.

This planning approach, inspired by how experienced engineers approach complex problems, ensures thorough analysis and gives users confidence in the agent’s methodology.
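A todo-style plan of this kind could be modeled as follows. This is a hypothetical sketch: the `Plan`, `Todo`, and `Status` names and the three example steps are invented for illustration and are not taken from Alyx.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"


@dataclass
class Todo:
    description: str
    status: Status = Status.PENDING


class Plan:
    def __init__(self, descriptions: list[str]) -> None:
        self.todos = [Todo(d) for d in descriptions]

    def start_next(self) -> Optional[Todo]:
        """Mark the first pending item as in progress and return it."""
        for todo in self.todos:
            if todo.status is Status.PENDING:
                todo.status = Status.IN_PROGRESS
                return todo
        return None

    def render(self) -> str:
        """Render the plan as a checklist so progress is visible to the user."""
        marks = {Status.PENDING: "[ ]", Status.IN_PROGRESS: "[~]", Status.DONE: "[x]"}
        return "\n".join(f"{marks[t.status]} {t.description}" for t in self.todos)


# Example steps are illustrative only.
plan = Plan([
    "Pull recent traces for the model",
    "Summarize error and latency patterns",
    "Suggest improvements",
])

first = plan.start_next()
first.status = Status.DONE   # first step finished
plan.start_next()            # second step now in progress
print(plan.render())
```

Rendering the plan after every step is what gives the user visibility: at any point they can see what has finished, what is running, and what remains.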

Analysis, Not Just Categorization

Counting and categorizing alone aren’t enough. Without context, labels and aggregate metrics don’t explain what patterns mean, why they matter, or what to do next. Alyx takes a fundamentally different approach: instead of just categorizing and aggregating, it analyzes your data to provide actual insights. When Alyx identifies patterns in your traces, it explains:

What the pattern means in the context of your system
Why it’s happening based on the specific data it’s analyzing
What actions to take — including actions Alyx can execute on your behalf
What to investigate next as you iterate on improvements

This insight-driven approach means you don’t just get statistics—you get understanding.

See it in practice

Here are three ways Alyx turns intent into outcomes across the AI engineering lifecycle.

Error analysis without the grind

Most error analysis workflows are manual: sift through traces, annotate issues, collapse them into labels, guess what matters, then wire up evaluations after the fact. Alyx collapses that into a single question. If you have traces and annotations in Arize AX, you can ask: “Review my reasoning annotations, identify the most critical issue and turn it into an eval.” Alyx synthesizes annotations into discrete labels, determines what’s actually critical (not just most frequent), generates evaluation templates, and can spin up a live evaluation task. No manual label taxonomy debates, no wiring things together by hand—just the answer and the evals to back it up.
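The difference between "most frequent" and "most critical" can be shown with a small sketch. The annotation labels, the 1 to 5 severity scale, and the scoring rule (total severity per label) are all assumptions for illustration; Alyx's actual criteria are not described here.

```python
from collections import defaultdict

# Hypothetical annotations as (label, severity) pairs, severity on a 1-5 scale.
annotations = [
    ("verbose answer", 1), ("verbose answer", 1),
    ("verbose answer", 1), ("verbose answer", 1),
    ("wrong tool selected", 5), ("wrong tool selected", 4),
    ("missing citation", 2),
]


def rank_issues(annotations):
    """Return (label, count, total_severity) tuples, most critical first."""
    counts = defaultdict(int)
    severity = defaultdict(int)
    for label, sev in annotations:
        counts[label] += 1
        severity[label] += sev
    ranked = sorted(counts, key=lambda label: severity[label], reverse=True)
    return [(label, counts[label], severity[label]) for label in ranked]


ranked = rank_issues(annotations)
most_frequent = max(ranked, key=lambda row: row[1])[0]
most_critical = ranked[0][0]

print("most frequent:", most_frequent)
print("most critical:", most_critical)
```

Here the most common label is a minor style issue, while the label with the highest total severity is the one worth turning into an evaluation first.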

Prompt engineering without staring at a blank page

Prompt experimentation often starts with a blank playground, no dataset, and no baseline. With Alyx you can delegate. Ask Alyx to generate a dataset for your use case; it creates it, populates it with realistic examples, and loads it into the playground so you can iterate immediately. For something more complex: “Create two prompt variants, attach an evaluation, and run an experiment.” Alyx plans the work, interacts with the Playground UI, requests approval when needed, runs the experiment end-to-end, analyzes the results, and can recommend concrete improvements. You direct the outcome; Alyx handles execution.

Trace debugging that actually debugs

Most debugging tools just show you more data. Alyx explains why things broke. You can ask: “Find spans where we returned a final eval template without reasoning.” Alyx identifies the spans. Then, in a trace, you ask: “Why didn’t this return reasoning?” Alyx traces the failure to a specific guideline, points to the exact decision path, and surfaces the root cause. No guesswork—you know exactly what to fix.

Why This Approach Matters

Complexity Without Friction

Modern AI systems are complex: multi-agent workflows, prompt chains, evaluation results, and operational concerns all interact in ways that are difficult to understand. Traditional tools expose this complexity directly, requiring users to understand query languages, tool configurations, and data structures. Alyx absorbs this complexity into an intelligent agent that understands your intent and handles the technical details. You can ask “what’s causing the latency spikes?” and Alyx knows to:

Query trace data
Calculate latency contributions
Identify bottleneck spans
Correlate with error patterns
Provide specific recommendations
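The steps above can be sketched on a toy trace. The span dictionaries, their field names, and the helper functions are hypothetical, and the self-time calculation assumes child spans run sequentially without overlap; this is an illustration of the analysis, not Alyx's method.

```python
from collections import defaultdict

# Hypothetical spans from one trace; child latency is included in the parent's.
spans = [
    {"id": "root", "parent": None, "name": "agent.run", "latency_ms": 2000, "error": False},
    {"id": "a", "parent": "root", "name": "retriever.search", "latency_ms": 300, "error": False},
    {"id": "b", "parent": "root", "name": "llm.generate", "latency_ms": 1500, "error": True},
]


def self_times(spans):
    """Latency spent in each span itself, excluding time spent in its children."""
    child_total = defaultdict(float)
    for span in spans:
        if span["parent"] is not None:
            child_total[span["parent"]] += span["latency_ms"]
    return {s["id"]: s["latency_ms"] - child_total[s["id"]] for s in spans}


def find_bottleneck(spans):
    """Pick the span with the largest self time and report its share and error flag."""
    times = self_times(spans)
    total = sum(times.values())
    by_id = {s["id"]: s for s in spans}
    worst = max(times, key=times.get)
    return {
        "span": by_id[worst]["name"],
        "share": times[worst] / total,
        "errored": by_id[worst]["error"],
    }


report = find_bottleneck(spans)
print(f"{report['span']} accounts for {report['share']:.0%} of latency")
if report["errored"]:
    print("The bottleneck span also recorded an error; inspect it first.")
```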

Democratizing AI Engineering Expertise

Building and operating AI systems requires specialized knowledge: understanding evaluation metrics, prompt engineering techniques, trace analysis, and system optimization. Alyx makes this expertise accessible through natural language, allowing more team members to contribute to AI system improvement without becoming experts in every aspect of the stack.

Iterative Improvement Through Conversation

The conversational interface enables an iterative workflow that’s impossible with static dashboards or one-shot queries. Users can:

Ask follow-up questions based on initial results
Refine analyses as they learn more
Explore different angles of investigation
Get explanations for technical concepts

This conversational loop makes complex system analysis feel like a collaboration rather than a solo investigation.

Building for the Future of AI Engineering

We believe the future of AI engineering tools looks like Alyx: intelligent agents that understand your domain, can execute complex workflows, and enable natural language interaction with sophisticated systems. This isn’t about adding AI features to existing tools—it’s about reimagining how humans and AI collaborate on technical work. The same revolution that Claude Code and Cursor brought to software development—moving from manual navigation and configuration to intelligent, conversational assistance—is coming to AI engineering. Alyx is our vision for what that looks like.

Alyx represents a new paradigm in AI engineering tools: not more dashboards or more metrics, but an intelligent partner that understands your AI systems and helps you make them better.
