The OpenTelemetry Collector is a service that receives telemetry, processes it, and exports it to one or more backends. It’s optional — most applications send spans directly from the SDK to Arize AX — but it unlocks centralized policy, multi-backend fan-out, dynamic routing, and tail sampling that the SDK can’t express.
For practical Collector configuration including authentication and dynamic project routing, see Advanced Patterns: OTEL Collector. This page covers the concept.
Anatomy of a Collector
Collectors are composed of pipelines. Each pipeline chains three component types:
| Component | Role |
|---|---|
| Receiver | Listens for incoming telemetry (OTLP over gRPC, OTLP over HTTP, Jaeger, Zipkin, …). |
| Processor | Transforms, filters, enriches, batches, or samples spans as they flow through. |
| Exporter | Sends the processed telemetry out — to Arize AX, to another backend, to multiple backends, to disk. |
All components and pipelines are defined in a single config.yaml. The Collector’s flexibility comes from how those components are composed.
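As a rough illustration of how those pieces fit together, the sketch below wires one receiver, one processor, and one exporter into a single traces pipeline. It is a minimal example under stated assumptions, not a recommended production config: the port on the Arize AX endpoint and the environment variable names are assumptions, and the header names follow the space_id/api_key form mentioned on this page.

```yaml
# config.yaml (minimal sketch)
receivers:
  otlp:                       # accept OTLP from applications
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:                      # group spans into larger export requests

exporters:
  otlp:
    endpoint: otlp.arize.com:443          # port is an assumption
    headers:
      space_id: ${env:ARIZE_SPACE_ID}     # env var names are assumptions
      api_key: ${env:ARIZE_API_KEY}

service:
  pipelines:
    traces:                   # one pipeline: receiver -> processor -> exporter
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Components declared at the top level do nothing until a pipeline under service references them, which is why the same receiver or exporter can be reused across several pipelines.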
For the canonical specification, see OpenTelemetry Collector documentation.
Deployment Models
Three common ways to deploy a Collector:
| Model | How it works | Best for |
|---|---|---|
| Agent | Collector runs alongside the application — as a sidecar container, a daemonset, or a local process. | Simple setups, clear 1:1 mapping between app and collector. |
| Gateway | A central collector receives from many applications. | Centralized policy, credential management, single point of egress. |
| Hybrid | Agent collectors forward to a centralized gateway. | Large environments — distributed collection plus centralized processing. |
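In the hybrid model, the agent's config can stay very small because policy lives on the gateway. The sketch below shows what an agent-side config might look like. The gateway hostname is hypothetical, and the plaintext TLS setting assumes traffic stays inside a trusted network.

```yaml
# Agent-side config.yaml (sketch): receive locally, forward to a gateway.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317        # only the co-located app connects

processors:
  batch:

exporters:
  otlp/gateway:
    endpoint: gateway-collector.internal:4317   # hypothetical gateway address
    tls:
      insecure: true                    # assumption: trusted internal network

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/gateway]
```

The gateway then runs its own config with the heavier processors and the Arize AX credentials, so agents never hold API keys.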
Common Use Cases
Reasons to put a Collector between your applications and Arize AX:
| Use case | What the Collector does |
|---|---|
| Filtering | Drop spans by name, attribute, or pattern using a filter processor — useful for cutting noise from health checks or known-uninteresting paths. |
| PII redaction or attribute modification | Use a transform processor to scrub sensitive fields, hash user IDs, or replace values before export. |
| Fan-out to multiple backends | Send the same spans to Arize AX and a long-term storage system, or to Arize AX and a metrics backend, without duplicating instrumentation in the app. |
| Dynamic project routing | Route traces to different Arize AX projects or spaces based on span attributes — see the dynamic routing example. |
| Tail sampling | Buffer complete traces and decide which to keep based on outcome (error spans, slow requests, specific attributes). |
| Credential centralization | Application code stays free of Arize AX API keys; the Collector handles authentication at a single chokepoint. |
| Tail-end batching | Reduce the number of network round-trips from your fleet to Arize AX by batching across applications at the Collector. |
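Several of these use cases can be combined in one pipeline. The following sketch assumes the Collector contrib distribution, which ships the filter, transform, and tail_sampling processors. The attribute names, thresholds, the archive endpoint, and the Arize AX port are illustrative assumptions rather than recommended values.

```yaml
processors:
  filter/health:
    error_mode: ignore
    traces:
      span:
        # Drop spans whose route is the health check.
        - 'attributes["http.route"] == "/healthz"'
  transform/redact:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Hash the user ID and remove the email before export.
          - set(attributes["user.id"], SHA256(attributes["user.id"])) where attributes["user.id"] != nil
          - delete_key(attributes, "user.email")
  tail_sampling:
    decision_wait: 10s          # buffer each trace this long before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 5000
  batch:

exporters:
  otlp/arize:
    endpoint: otlp.arize.com:443            # port is an assumption
    headers:
      space_id: ${env:ARIZE_SPACE_ID}
      api_key: ${env:ARIZE_API_KEY}
  otlp/archive:
    endpoint: archive.example.internal:4317 # hypothetical second backend
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]       # assumes an otlp receiver is defined elsewhere
      processors: [filter/health, transform/redact, tail_sampling, batch]
      exporters: [otlp/arize, otlp/archive] # fan-out to both backends
```

Processor order matters here: filtering and redaction run first so that dropped or scrubbed data never reaches the sampling buffer, and batch runs last.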
Common Pitfalls
A few Collector failure modes to know about:
- Forgetting Arize AX authentication: the Collector needs to add arize-space-id and arize-api-key (or space_id/api_key) headers to outbound requests. Use the headers_setter extension or set them in the exporter configuration. Without them, Arize AX rejects the spans.
- Modifying shared span objects across pipelines: when one pipeline mutates a span, every other pipeline that processes the same span sees the modification. Use a routing connector to duplicate spans cleanly before fan-out.
- No batch processor at the end of the pipeline: for production volumes, the last processor in a pipeline should be a batch processor. Without it, the Collector forwards each incoming payload as it arrives, often as many small export requests, which is inefficient.
- Wrong Collector endpoint: applications need to point at the Collector’s OTLP endpoint (gRPC on port 4317, HTTP on port 4318 by default), not at Arize AX directly. Mixing the two is a common source of “why are some spans missing?” debugging.
- Receiver missing include_metadata: true: when a centralized gateway uses inbound request metadata for routing (e.g., reading the target Arize AX space from a header), the receiver has to be told to make that metadata available. Without it, the routing extension has nothing to read.
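The authentication and include_metadata pitfalls can be seen together in a gateway config. The sketch below assumes the contrib headers_setter extension and assumes that applications send the target space in an inbound space_id header. That inbound header name, the environment variable, and the Arize AX port are assumptions for illustration.

```yaml
extensions:
  headers_setter:
    headers:
      - action: upsert
        key: space_id
        from_context: space_id          # copied from inbound request metadata
      - action: upsert
        key: api_key
        value: ${env:ARIZE_API_KEY}     # env var name is an assumption

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        include_metadata: true          # expose inbound headers to the pipeline

processors:
  batch:
    metadata_keys: [space_id]           # keep per-space metadata through batching

exporters:
  otlp:
    endpoint: otlp.arize.com:443        # port is an assumption
    auth:
      authenticator: headers_setter     # attach the headers on export

service:
  extensions: [headers_setter]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The metadata_keys setting is included because the batch processor otherwise combines spans from different requests, and the per-request metadata needed by from_context would no longer be available at export time.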
Where the Collector Fits
Both with and without a Collector, your application code looks the same — the Exporter points at an OTLP endpoint. The difference is just which OTLP endpoint:
Without Collector:
App → Exporter → otlp.arize.com
With Collector:
App → Exporter → Collector → (processors) → Arize AX (and/or other backends)
Start without a Collector. Add one when you need centralized policy, multi-backend fan-out, tail sampling, or routing logic the SDK can’t express.
You’ve completed the OpenTelemetry and OpenInference reference
Every concept in this section — signals, the four OTel components, the Arize AX helpers, OpenInference conventions, span kinds, instrumentation approaches, context, sampling, and the Collector — is now in your toolbox.
Time to instrument: