
LiteLLM Tracing

LiteLLM allows developers to call all LLM APIs using the OpenAI format. LiteLLM Proxy is a proxy server that calls 100+ LLMs in the OpenAI format. Both are supported by this auto-instrumentation.

Any calls made to the following functions will be automatically captured by this integration:

  • completion()

  • acompletion()

  • completion_with_retries()

  • embedding()

  • aembedding()

  • image_generation()

  • aimage_generation()

Launch Phoenix

Install

pip install openinference-instrumentation-litellm litellm

Setup

Use the register function to connect your application to Phoenix:

from phoenix.otel import register

# configure the Phoenix tracer
tracer_provider = register(
    project_name="my-llm-app",  # Default is 'default'
    auto_instrument=True,  # Auto-instrument your app based on installed OI dependencies
)
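
The register call returns a standard OpenTelemetry TracerProvider, so you can also create manual spans that will appear alongside the auto-instrumented LiteLLM calls. A minimal sketch (the span name and attribute below are illustrative):

# Create a tracer from the provider returned by register()
tracer = tracer_provider.get_tracer(__name__)

# LiteLLM calls made inside this span will be nested under it
with tracer.start_as_current_span("my-custom-step") as span:
    span.set_attribute("example.attribute", "value")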

Add any API keys needed by the models you are using with LiteLLM.

import os
os.environ["OPENAI_API_KEY"] = "PASTE_YOUR_API_KEY_HERE"

Run LiteLLM

You can now use LiteLLM as normal, and calls will be traced in Phoenix.

import litellm

completion_response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "What's the capital of China?", "role": "user"}],
)
print(completion_response)
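
The other instrumented entry points are traced in the same way. For example, the async and embedding variants (a brief sketch, assuming the same OpenAI key is set):

import asyncio
import litellm

async def main():
    # acompletion() is the async counterpart of completion()
    response = await litellm.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"content": "What's the capital of Japan?", "role": "user"}],
    )
    print(response)

asyncio.run(main())

# embedding() calls are captured as well
embedding_response = litellm.embedding(
    model="text-embedding-ada-002",
    input=["good morning from litellm"],
)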

Observe

Traces should now be visible in Phoenix!

Resources

  • OpenInference Instrumentation

Sign up for Phoenix:

  1. Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login

  2. Click Create Space, then follow the prompts to create and launch your space.

Install packages:

pip install arize-phoenix-otel

Set your Phoenix endpoint and API Key:

From your new Phoenix Space

  1. Create your API key from the Settings page

  2. Copy your Hostname from the Settings page

  3. In your code, set your endpoint and API key:

import os

os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX HOSTNAME"

# If you created your Phoenix Cloud instance before June 24th, 2025,
# you also need to set the API key as a header:
# os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"

Having trouble finding your endpoint? Check out Finding your Phoenix Endpoint

Launch your local Phoenix instance:

pip install arize-phoenix
phoenix serve

For details on customizing a local terminal deployment, see Terminal Setup.

Install packages:

pip install arize-phoenix-otel

Set your Phoenix endpoint:

import os

os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"

See Terminal for more details.

Pull latest Phoenix image from Docker Hub:

docker pull arizephoenix/phoenix:latest

Run your containerized instance:

docker run -p 6006:6006 arizephoenix/phoenix:latest

This will expose Phoenix on localhost:6006.
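
If you plan to send traces over OTLP gRPC rather than HTTP, you may also want to publish Phoenix's gRPC collector port (4317 by default; check the configuration for your Phoenix version):

docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest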

Install packages:

pip install arize-phoenix-otel

Set your Phoenix endpoint:

import os

os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"

For more info on using Phoenix with Docker, see Docker.

Install packages:

pip install arize-phoenix

Launch Phoenix:

import phoenix as px
px.launch_app()

By default, notebook instances do not have persistent storage, so your traces will disappear after the notebook is closed. See self-hosting or use one of the other deployment options to retain traces.

LiteLLM

LiteLLM is an open-source platform that provides a unified interface to manage and access over 100 LLMs from various providers.

Website: https://www.litellm.ai/

LiteLLM Evals

Configure and run LiteLLM for evals

You will need to install the extra dependency litellm>=1.0.3.

You can choose among the multiple models supported by LiteLLM. Make sure you have the right environment variables set prior to initializing the model. For additional information about the environment variables for specific model providers, see LiteLLM's provider-specific params documentation.

Below are the parameters of LiteLLMModel, followed by an example of how to initialize it for llama3 using Ollama.

# Parameter reference for phoenix.evals.LiteLLMModel
class LiteLLMModel(BaseEvalModel):
    model: str = "gpt-3.5-turbo"
    """The model name to use."""
    temperature: float = 0.0
    """What sampling temperature to use."""
    max_tokens: int = 256
    """The maximum number of tokens to generate in the completion."""
    top_p: float = 1
    """Total probability mass of tokens to consider at each step."""
    num_retries: int = 6
    """Maximum number of times to retry the model if a RateLimitError,
    OpenAIError, or ServiceUnavailableError occurs."""
    request_timeout: int = 60
    """Maximum number of seconds to wait when retrying."""
    model_kwargs: Dict[str, Any] = field(default_factory=dict)
    """Model-specific params."""
import os

from phoenix.evals import LiteLLMModel

# Point LiteLLM at a locally running Ollama server
os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"

model = LiteLLMModel(model="ollama/llama3")
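
Once initialized, the model can be passed to Phoenix eval helpers such as llm_classify. A minimal sketch (the dataframe, template, and rails below are illustrative):

import pandas as pd

from phoenix.evals import LiteLLMModel, llm_classify

# Toy dataset; the {input} placeholder in the template maps to this column
df = pd.DataFrame({"input": ["The capital of China is Beijing."]})

results = llm_classify(
    dataframe=df,
    model=LiteLLMModel(model="ollama/llama3"),
    template=(
        "Is the following statement factual or hallucinated? "
        "Answer with exactly one word: factual or hallucinated.\n\n{input}"
    ),
    rails=["factual", "hallucinated"],
)
print(results["label"])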