OTel Collector Deployment Patterns
Telemetry Routing via OpenTelemetry Collector

Overview and Motivation
This document outlines the proposed architecture for leveraging the open-source OpenTelemetry Collector as an intermediate telemetry processing and routing layer. This pattern facilitates the collection, processing, and distribution of telemetry data (logs, metrics, traces) from instrumented applications to various observability backends such as Arize.
In modern distributed systems, observability data must be efficiently collected and routed to different backend systems for analysis, alerting, and visualization. Instrumenting each application to export directly to every backend leads to tight coupling, increased maintenance burden, and limited flexibility. Routing through an OpenTelemetry Collector instead provides a decoupled, scalable, and extensible telemetry pipeline.
Architecture Components
1. Instrumented LLM Applications
LLM applications are instrumented either manually with the OpenTelemetry SDK or with an auto-instrumentor that uses OTel under the hood.
Applications export telemetry to a locally deployed or central OpenTelemetry Collector.
2. OpenTelemetry Collector
Acts as a gateway or agent.
Collects telemetry data from applications.
Applies optional processing and transformation.
Routes data to one or more configured backends.
3. Backends
Primary: Arize
Secondary (optional): Kafka, Prometheus, etc.
Deployment Models
Agent Mode (not to be confused with LLM agents)
Deployment: A collector instance runs alongside the application or on the same host as the application (e.g., as a sidecar or Kubernetes DaemonSet); a minimal configuration for this mode is sketched after the considerations below.
Advantages (from OTel docs):
Simple to get started
Clear 1:1 mapping between application and collector
Considerations (from OTel docs):
Scalability (human and load-wise)
Not as flexible as other approaches
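As a rough sketch, an agent-mode configuration could look like the following, assuming the application exports OTLP to the local collector and the agent forwards traces straight to Arize. The Arize endpoint, header names, and environment variable names are illustrative assumptions; confirm the exact values against Arize's documentation.

```yaml
# Sketch of an agent-mode collector config (assumptions: OTLP from the co-located
# app; the Arize endpoint and header names are placeholders, not confirmed values).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317   # only the co-located application can reach the agent
      http:
        endpoint: localhost:4318

processors:
  batch: {}                        # batch telemetry before export

exporters:
  otlp/arize:
    endpoint: otlp.arize.com:443   # placeholder; take the real endpoint from Arize's docs
    headers:
      space_id: ${env:ARIZE_SPACE_ID}   # hypothetical env var names, resolved on the host
      api_key: ${env:ARIZE_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/arize]
```

Note that in this pure agent deployment, backend credentials must be present on every host running an agent.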
Gateway Mode
Deployment: A centralized or regionalized collector service receives telemetry from multiple agents or directly from applications; a gateway configuration is sketched after the considerations below.
Advantages (from OTel docs):
Separation of concerns such as centrally managed credentials
Centralized policy management (e.g., filtering or sampling spans)
Considerations (from OTel docs):
Increased complexity: an additional service to maintain and another component that can fail
Added latency in case of cascaded collectors
Higher overall resource usage (costs)
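A gateway configuration differs mainly in that it listens on all interfaces for many senders, protects itself under load, applies centrally managed policy, and is the only place that holds backend credentials. The sketch below assumes the contrib distribution (for the probabilistic_sampler processor); the sampling percentage, memory limit, endpoint, and header names are illustrative assumptions.

```yaml
# Sketch of a gateway-mode collector config (assumptions: contrib distribution for
# probabilistic_sampler; Arize endpoint and header names are placeholders).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317     # reachable by agents and applications across the network
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024                # protect the gateway under load
  probabilistic_sampler:
    sampling_percentage: 25        # example of a centrally managed sampling policy
  batch: {}

exporters:
  otlp/arize:
    endpoint: otlp.arize.com:443   # placeholder; take the real endpoint from Arize's docs
    headers:
      space_id: ${env:ARIZE_SPACE_ID}   # credentials live only on the gateway
      api_key: ${env:ARIZE_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp/arize]
```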
Hybrid Model
Deployment: Combines Agent and Gateway modes; agents forward telemetry to a central gateway, which applies policy and exports to the backends (see the fragment after the list below).
Advantages:
Distributed data collection and centralized processing.
Scales well in large environments.
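In the hybrid model, the agent configuration sketched earlier stays the same except for its exporter, which points at the gateway's OTLP endpoint instead of the backend. The gateway hostname below is a made-up example.

```yaml
# Fragment: in hybrid mode, the agent forwards to the gateway rather than to Arize.
exporters:
  otlp/gateway:
    endpoint: otel-gateway.internal:4317   # hypothetical gateway address
    tls:
      insecure: true                       # assumes plaintext inside the cluster; enable TLS as needed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/gateway]
```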
Data Flow
The application emits telemetry data using one of the Arize auto-instrumentors or the OTel SDK.
Telemetry is sent to an OpenTelemetry Collector.
The Collector applies processors (e.g., filtering, batching, sampling).
The Collector exports the telemetry to configured backends like Arize.
Example Configuration File
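The following is a minimal sketch of a collector configuration implementing the data flow above: an OTLP receiver, an example filter plus batching, and an Arize exporter. The filter condition, batch settings, endpoint, and header names are illustrative assumptions, and the filter processor ships with the contrib distribution.

```yaml
# Sketch of an end-to-end collector configuration (assumptions: contrib distribution
# for the filter processor; Arize endpoint/header names and the filter rule are illustrative).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  filter/drop_noise:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'   # drop spans matching this condition
  batch:
    send_batch_size: 512
    timeout: 5s

exporters:
  otlp/arize:
    endpoint: otlp.arize.com:443   # placeholder; take the real endpoint from Arize's docs
    headers:
      space_id: ${env:ARIZE_SPACE_ID}   # hypothetical env var names
      api_key: ${env:ARIZE_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/drop_noise, batch]
      exporters: [otlp/arize]
```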
Example Architecture
Instrumented applications export OTLP to per-host agent collectors, which forward to a central gateway collector; the gateway applies processing and exports to Arize, with optional fan-out to secondary backends such as Kafka or Prometheus.
Benefits of This Design
✅ Decoupled Instrumentation
Applications do not need to manage backend-specific exporters or credentials, reducing code complexity.
✅ Multi-Destination Exporting
Collectors can fan out telemetry data to multiple destinations (e.g., Arize and Kafka), as sketched below.
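Fanning out is a matter of defining a second exporter and listing it in the pipeline. The fragment below is meant to be merged into a configuration like the example above; the Kafka broker and topic are hypothetical, and the kafka exporter ships with the contrib distribution.

```yaml
# Fragment: add a second exporter and reference it in the pipeline to fan out telemetry.
exporters:
  otlp/arize:
    endpoint: otlp.arize.com:443       # placeholder; take the real endpoint from Arize's docs
  kafka:
    brokers: ["kafka-broker-1:9092"]   # hypothetical broker address
    topic: otel-spans                  # hypothetical topic name

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/arize, kafka]   # the same data is delivered to both destinations
```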
✅ Consistent Observability Pipeline
Centralizes telemetry processing (e.g., sampling, filtering), ensuring consistent policies across all services.
✅ Scalability and Reliability
Collectors support load balancing and can be scaled horizontally. They also offer buffering and retries, increasing telemetry delivery reliability.
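Buffering and retry behavior is configured per exporter. The fragment below shows the relevant settings on an OTLP exporter; the specific values are illustrative, not recommendations.

```yaml
# Fragment: queueing and retry settings on an exporter improve delivery reliability.
exporters:
  otlp/arize:
    endpoint: otlp.arize.com:443   # placeholder
    sending_queue:
      enabled: true
      num_consumers: 4
      queue_size: 1000             # telemetry buffered in memory while the backend is unreachable
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s       # give up after 5 minutes of failed retries
```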
✅ Security and Compliance
Sensitive credentials (e.g., Arize API keys) are stored and managed centrally, not within application code.
✅ Extensibility
New exporters and processors can be added to the Collector without changing the application instrumentation.