Instrumenting Your LLM Application: Arize Phoenix and Vercel AI SDK
Instrumentation is an important tool for developers building with LLMs. It provides insight into application performance, behavior, and impact.
This blog will cover:
- Why instrumentation matters for LLM applications
- Benefits of implementing instrumentation
- A guide on integrating Arize Phoenix with Vercel AI SDK for observability in Next.js applications
Why Instrument Your LLM Application?
1. Performance Monitoring
LLM performance can vary based on numerous factors such as input complexity, model size, and runtime conditions. By implementing instrumentation, you can monitor key metrics like response times, latency, and token usage. This data helps identify bottlenecks and performance issues in your application. With these insights, you can optimize your application for better efficiency and user experience, potentially adjusting model parameters, refining prompts, or restructuring your application architecture.
2. Quality Assurance
LLMs are not infallible and can produce inconsistent or inappropriate outputs. Instrumentation helps detect anomalies in model responses, track the quality and relevance of generated content, and identify potential biases or errors in the model’s output. This level of oversight is important for maintaining the integrity and reliability of your LLM apps.
3. Resource Management
LLMs can be resource-intensive, often requiring significant computational power and memory. Instrumentation allows you to track resource usage, including CPU and GPU utilization, memory consumption, and associated costs. This data helps you manage expenses, allocate resources efficiently, and make informed decisions about scaling your infrastructure.
4. User Behavior Analysis
Understanding how users interact with your LLM application is key to its improvement. Instrumentation provides insights into user queries and preferences, data on which features are most utilized, and information on user satisfaction and engagement. These insights can guide feature development and user experience enhancements.
5. Compliance and Auditing
In many industries, the use of AI is subject to regulatory scrutiny. Instrumentation supports logging of all AI interactions for audit trails, tracking of data usage and privacy compliance, and generation of reports for regulatory bodies.
6. Continuous Improvement
This field is dynamic, with models and best practices evolving rapidly. Instrumentation facilitates A/B testing of different model versions or prompts, tracking of model performance over time, and data collection for fine-tuning and improving your models. This data-driven approach allows your application to continue to meet user needs.
While the benefits of instrumentation are clear, implementing it effectively can be challenging. This is where observability tools like Arize Phoenix come into play. These tools provide easy integration with existing frameworks and SDKs, pre-built dashboards and visualizations, alerting systems for anomalies or issues, and advanced analytics capabilities. Phoenix allows developers to focus on building great AI applications while having confidence in their ability to monitor and improve them over time.
The Arize Vercel Integration
Now that we understand the importance of instrumentation in LLM applications, let’s explore a practical implementation using Arize Phoenix and Vercel AI SDK. This integration allows developers to easily add observability to their AI-powered applications built with Next.js.
Sample code for a complete Next.js application instrumented with Arize Phoenix can be found in this repository.
1. Install Dependencies
To get started, you’ll need:
- Vercel AI SDK version 3.3 or higher
- Arize Phoenix observability packages
- Vercel OpenTelemetry package
- General OpenTelemetry packages
Here is the full list of dependencies present in the example:
"dependencies": {
"@ai-sdk/openai": "latest",
"@ai-sdk/react": "latest",
"@arizeai/openinference-semantic-conventions": "^0.10.0",
"@arizeai/openinference-vercel": "^1.0.0",
"@opentelemetry/api": "^1.9.0",
"@opentelemetry/api-logs": "0.52.1",
"@opentelemetry/exporter-trace-otlp-proto": "^0.52.1",
"@opentelemetry/instrumentation": "0.52.1",
"@opentelemetry/sdk-logs": "0.52.1",
"@opentelemetry/sdk-trace-base": "^1.25.1",
"@vercel/otel": "1.9.1",
"ai": "latest",
"next": "latest",
"openai": "4.52.6",
"react": "^18",
"react-dom": "^18"
},
2. Enable Instrumentation in Next.js
Instrumentation and telemetry are currently "experimental" within Next.js.
To use these tools, you must enable the instrumentation hook within your next.config.js file:
/** @type {import('next').NextConfig} */
const nextConfig = {
  output: "standalone",
  experimental: {
    instrumentationHook: true,
  },
};

module.exports = nextConfig;
3. Create an Instrumentation File
Set up an instrumentation.ts file that will be automatically picked up by Next.js:
import { registerOTel } from "@vercel/otel";
import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";
import {
  isOpenInferenceSpan,
  OpenInferenceSimpleSpanProcessor,
} from "@arizeai/openinference-vercel";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { SEMRESATTRS_PROJECT_NAME } from "@arizeai/openinference-semantic-conventions";

// For troubleshooting, set the log level to DiagLogLevel.DEBUG
// This is not required and should not be added in a production setting
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);

export function register() {
  registerOTel({
    serviceName: "phoenix-next-app",
    attributes: {
      // This is not required, but it will allow you to send traces to a specific project in Phoenix
      [SEMRESATTRS_PROJECT_NAME]: "phoenix-next-app",
    },
    spanProcessors: [
      new OpenInferenceSimpleSpanProcessor({
        exporter: new OTLPTraceExporter({
          headers: {
            api_key: process.env["PHOENIX_API_KEY"],
          },
          url: "http://localhost:6006/v1/traces",
        }),
        spanFilter: (span) => {
          // Only export spans that are OpenInference to remove non-generative spans
          // This should be removed if you want to export all spans
          return isOpenInferenceSpan(span);
        },
      }),
    ],
  });
}
Key points in this configuration:
- Add the OpenInference SimpleSpanProcessor to convert Vercel AI SDK spans into OpenInference-compliant spans.
- Use a span filter to export only generative AI-related spans.
- Configure the endpoint for exporting traces to Phoenix.
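If your Phoenix instance does not run on localhost (for example, once the app is deployed), you will likely want to avoid hard-coding the collector URL. Below is a minimal sketch of one way to do this; the PHOENIX_COLLECTOR_ENDPOINT variable name is an assumption used for illustration, not something required by the integration:

import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";

// Sketch: read the collector endpoint from the environment and fall back
// to a local Phoenix instance during development.
// PHOENIX_COLLECTOR_ENDPOINT is an assumed variable name for illustration.
const exporter = new OTLPTraceExporter({
  url:
    process.env["PHOENIX_COLLECTOR_ENDPOINT"] ?? "http://localhost:6006/v1/traces",
  headers: {
    api_key: process.env["PHOENIX_API_KEY"],
  },
});

This exporter can then be passed to the OpenInferenceSimpleSpanProcessor exactly as in the register() function above.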
4. Enable Telemetry for AI SDK Calls
In your AI chat route file, enable telemetry for each AI SDK call like this:
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

export async function POST(req: Request) {
  const { messages } = await req.json();
  const textStream = await streamText({
    model: openai("gpt-3.5-turbo"),
    maxTokens: 100,
    messages: messages,
    experimental_telemetry: {
      isEnabled: true,
      metadata: { route: "api/chat" },
    },
  });
  return textStream.toDataStreamResponse();
}
As the experimental_telemetry key suggests, telemetry is considered "experimental" in the AI SDK as well.
You can also add custom metadata to each call for better filtering and analysis in Phoenix.
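For example, you might tag each call with identifiers that are useful to filter on later. This is only a sketch; the userId and sessionId fields are hypothetical examples, not fields required by the AI SDK or Phoenix:

const textStream = await streamText({
  model: openai("gpt-3.5-turbo"),
  maxTokens: 100,
  messages,
  experimental_telemetry: {
    isEnabled: true,
    metadata: {
      // Hypothetical fields for illustration; metadata values are attached
      // to the span and can be filtered on in the Phoenix UI.
      route: "api/chat",
      userId: "user-123",
      sessionId: "session-456",
    },
  },
});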
Deployment and Monitoring
Once you’ve set up the instrumentation, you can deploy your application to Vercel. Make sure to set the necessary environment variables, including your OpenAI API key and Phoenix API key.
After deployment, you can monitor your application’s performance in the Phoenix UI. The traces will show details such as:
- Invocation parameters (e.g., max tokens)
- LLM output
- Token count
You can use Phoenix’s features to filter traces, annotate information, and perform quality assurance checks.
A video with a step-by-step walkthrough of running this integration can be found here. For more information and access to the open-source projects, visit the Arize AI GitHub repositories for Phoenix and OpenInference.