# Continuously Monitor

> Set up monitors and alerts so you know the moment your app's quality, latency, or costs cross a threshold

Your [dashboards](/ax/observe/dashboards) show you the current state, but you can't watch them all day. Models change, user patterns shift, and what worked last week can silently degrade. You need to know the moment something goes wrong.

Monitors watch your metrics continuously and alert you when something crosses a threshold. Set them up once, and they run in the background.

## Create a monitor

1. Go to **Monitors** in the sidebar and click **+ New Monitor**
2. Select **Tracing Project Monitor**

<Frame>
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/observe/monitor_list.png" alt="Monitors page showing monitor list with statuses and New Monitor dropdown" />
</Frame>

3. Select a **project** to monitor
4. Choose a metric source:
   * **Span attribute** -- any attribute from your traces (latency, status, token counts, eval labels, custom metadata)
   * **Custom metric** -- any AQL metric you've defined
5. Configure the evaluation window and threshold (see below)
6. Set up notifications
7. Click **Create**

<Frame>
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/observe/monitor_creation.png" alt="Monitor creation form showing metric source, chart with threshold, and data configuration" />
</Frame>

### What can you monitor?

| What to track                 | How to set it up                                             |
| ----------------------------- | ------------------------------------------------------------ |
| Hallucination rate exceeds 5% | Metric source: custom metric                                 |
| Latency spike                 | Span attribute: `latency_ms`, static threshold               |
| Token usage increase          | Span attribute: token count, automatic threshold             |
| Eval quality drops            | Span attribute: eval label (e.g., `eval.groundedness.label`) |

## Monitor settings

### Evaluation window

The time range of data the monitor evaluates each time it runs. For example, a 24-hour window means each check looks at the last 24 hours of data. Default is 24 hours for Tracing Project Monitors.

### Frequency

How often the monitor runs. The default is every hour, and each run evaluates the monitor's window of data. To run on a different cadence, including **5-, 10-, 15-, or 30-minute** intervals, see [Schedule runtime](/ax/machine-learning/machine-learning/how-to-ml/monitors/configure-monitors#schedule-runtime).

### Delay window

Getting false alerts because your data arrives in batches? A monitor that runs before a batch has landed evaluates incomplete data. Set a delay window to give your data time to arrive and stabilize before the monitor evaluates.
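
To see how the three settings above interact, here is a minimal Python sketch. It assumes the delay window shifts the evaluated range back from the run time; this page does not spell out the exact semantics, so treat `evaluation_range` and its parameters as hypothetical names for illustration only.

```python
from datetime import datetime, timedelta, timezone


def evaluation_range(run_at, window=timedelta(hours=24), delay=timedelta(0)):
    """Return the (start, end) of the data one monitor run would look at.

    Assumption: the delay window is subtracted from the run time, so the
    most recent (possibly incomplete) data is skipped.
    """
    end = run_at - delay
    start = end - window
    return start, end


# A run at 12:00 UTC with the default 24-hour window and a 2-hour delay.
run_at = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
start, end = evaluation_range(run_at, delay=timedelta(hours=2))
print(start.isoformat(), "->", end.isoformat())
# 2024-01-01T10:00:00+00:00 -> 2024-01-02T10:00:00+00:00
```

With an hourly frequency, the next run would evaluate the same-sized range shifted forward by one hour.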

### API-triggered monitors

Instead of running on a schedule, you can trigger a monitor manually via the GraphQL API. This is useful for evaluating after batch ingestion or a pipeline step:

```graphql
mutation {
    triggerMonitor(input: {monitorId: "MONITOR_ID"}) {
        success
        monitor { name id }
    }
}
```
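
If you want to call this mutation from a pipeline step, a small Python wrapper might look like the sketch below. The endpoint URL and the `x-api-key` header are assumptions not stated on this page, and `build_trigger_request` and `trigger_monitor` are hypothetical helper names; check the [GraphQL API](/ax/graphql-reference) reference for the actual endpoint and authentication.

```python
import json
import urllib.request

# Assumption: endpoint and auth header are not given on this page.
GRAPHQL_URL = "https://app.arize.com/graphql"


def build_trigger_request(monitor_id, api_key, url=GRAPHQL_URL):
    """Build (but do not send) the HTTP request for the triggerMonitor mutation."""
    # json.dumps quotes and escapes the ID so it is a valid string literal.
    query = (
        "mutation { triggerMonitor(input: {monitorId: %s}) "
        "{ success monitor { name id } } }" % json.dumps(monitor_id)
    )
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def trigger_monitor(monitor_id, api_key):
    """Send the mutation and return the triggerMonitor result."""
    request = build_trigger_request(monitor_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    if result.get("errors"):
        raise RuntimeError(result["errors"])
    return result["data"]["triggerMonitor"]
```

Call `trigger_monitor("MONITOR_ID", api_key)` as the last step of a batch ingestion job so the monitor evaluates only after the new data has landed.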

## Thresholds

Two ways to set when alerts fire:

* **Automatic** -- Arize sets a threshold based on your historical data. Adjust sensitivity (high = more alerts, low = fewer alerts).
* **Static** -- you set the exact number (e.g., "alert if accuracy drops below 85%")

You can set both upper and lower bounds for more precise alerting.
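
To build intuition for how sensitivity changes an automatic threshold, here is an illustrative Python sketch. This page does not describe how Arize derives automatic thresholds, so the sketch swaps in a simple band of mean plus or minus `k` standard deviations. The `SENSITIVITY_TO_K` mapping and the `automatic_bounds` function are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical mapping: higher sensitivity means a narrower band, so more alerts.
SENSITIVITY_TO_K = {"high": 2.0, "medium": 3.0, "low": 4.0}


def automatic_bounds(history, sensitivity="medium"):
    """Return (lower, upper) bounds derived from historical metric values.

    Illustration only: uses mean +/- k * sample standard deviation,
    which is not necessarily what Arize AX computes.
    """
    k = SENSITIVITY_TO_K[sensitivity]
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


latency_ms = [100, 110, 90, 105, 95]
for level in ("high", "low"):
    lower, upper = automatic_bounds(latency_ms, level)
    print(f"{level}: {lower:.1f} to {upper:.1f}")
```

A value of 120 ms falls outside the "high" sensitivity band but inside the "low" one, which is the trade-off the sensitivity setting controls.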

## Viewing a monitor

Once running, click into any monitor to see its alert history, threshold line, traffic, and evaluation window:

![Monitor detail view showing Hallucination rate alert history with threshold and evaluation window](https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/get-started-images/monitor_detail_view.png)

Your monitor is always in one of these states:

| Status               | Meaning                        |
| -------------------- | ------------------------------ |
| **Healthy** (green)  | Everything looks good          |
| **No Data** (yellow) | No recent data to evaluate     |
| **Triggered** (red)  | Threshold crossed, investigate |
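
The three states can be sketched as a small Python function. This is a simplified model, assuming a monitor with optional static lower and upper bounds; `MonitorStatus` and `classify` are hypothetical names, and whether a value exactly on the threshold triggers is also an assumption.

```python
from enum import Enum
from typing import Optional


class MonitorStatus(Enum):
    HEALTHY = "Healthy"      # green
    NO_DATA = "No Data"      # yellow
    TRIGGERED = "Triggered"  # red


def classify(
    value: Optional[float],
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> MonitorStatus:
    """Map a metric value for the evaluation window to a monitor status."""
    if value is None:
        # Nothing arrived in the evaluation window.
        return MonitorStatus.NO_DATA
    if lower is not None and value < lower:
        return MonitorStatus.TRIGGERED
    if upper is not None and value > upper:
        return MonitorStatus.TRIGGERED
    return MonitorStatus.HEALTHY


# "Alert if accuracy drops below 85%"
print(classify(0.82, lower=0.85).value)  # Triggered
print(classify(0.91, lower=0.85).value)  # Healthy
print(classify(None, lower=0.85).value)  # No Data
```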

## Notifications and integrations

A monitor is only useful if it reaches the right people. When a monitor triggers, get notified through:

* **Email** -- recipient doesn't need an Arize AX account
* **Slack** -- send to a team channel
* **PagerDuty** -- for on-call escalation
* **OpsGenie** -- for incident management
* **Webhooks** -- send to any HTTP endpoint

Configure integrations in **Alert Integrations** (left sidebar in Organization Settings) or from the **Config** tab within a project.

![Alert Integrations page showing available notification services](https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/get-started-images/alert_integration.png)

<Tabs>
  <Tab title="Slack">
    1. Go to **Alert Integrations** and click **Slack**
    2. Click **Connect to Slack**, select a channel, and click **Allow**
    3. Use **Test Integration** to verify the connection
    4. Assign to monitors via the **Config** tab or individual monitor settings

    Each Slack integration is tied to a specific channel. Add multiple for different teams.
  </Tab>

  <Tab title="PagerDuty">
    1. Go to **Alert Integrations** and click **PagerDuty**
    2. Use **Simple Installation**: click **Connect to PagerDuty**, log in, select services, click **Connect**
    3. Or **Manual Installation**: create an [API integration](https://developer.pagerduty.com/docs/ZG9jOjQ2NDA2-introduction) in PagerDuty, copy the key, enter it in Arize AX
    4. Assign to monitors and choose alert severity
  </Tab>

  <Tab title="OpsGenie">
    1. Go to **Alert Integrations** and click **OpsGenie**
    2. Create a [default API integration](https://support.atlassian.com/opsgenie/docs/create-a-default-api-integration/) in OpsGenie to get an API key
    3. Enter the API key in Arize AX (supports OpsGenie Team Keys only)
    4. Assign to monitors
  </Tab>

  <Tab title="Webhooks">
    Send alerts to any HTTP endpoint, such as your backend, a ticketing system, or a custom Slack bot.

    1. Go to **Alert Integrations** and click **Webhook**
    2. Configure: name, HTTPS URL, and optional authorization header
    3. Subscribe to events: `monitor.triggered`, `monitor.cleared`, `monitor.no_data`
    4. Assign to monitors

    The webhook payload contains: monitor name and ID, current status, threshold and calculated metric values, and event timestamp.
  </Tab>
</Tabs>
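
On the receiving side of a webhook, your endpoint needs to branch on the event type. The Python sketch below shows one way to do that. The three event names come from the Webhooks tab above, but every payload field name (`event`, `monitor_name`, `metric_value`, `threshold`) is a hypothetical placeholder; inspect a real payload before relying on them.

```python
import json


def summarize_alert(raw_body: bytes) -> str:
    """Turn a webhook request body into a one-line action for downstream tools.

    Field names are assumptions for illustration, not the documented schema.
    """
    payload = json.loads(raw_body)
    event = payload.get("event")
    name = payload.get("monitor_name", "unknown monitor")

    if event == "monitor.triggered":
        value = payload.get("metric_value")
        threshold = payload.get("threshold")
        return f"PAGE: {name} crossed its threshold (value={value}, threshold={threshold})"
    if event == "monitor.cleared":
        return f"RESOLVE: {name} is healthy again"
    if event == "monitor.no_data":
        return f"WARN: {name} has no recent data"
    return f"IGNORE: unrecognized event {event!r}"


sample = json.dumps(
    {
        "event": "monitor.triggered",
        "monitor_name": "Hallucination rate",
        "metric_value": 0.07,
        "threshold": 0.05,
    }
).encode("utf-8")
print(summarize_alert(sample))
```

Keeping the parsing in a pure function like this makes it easy to unit test before wiring it into whichever HTTP framework serves your endpoint.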

<Info>
  You can also set up integrations programmatically using the [GraphQL API](/ax/graphql-reference).
</Info>

You can set alert destinations at two levels:

* **Project level** -- all monitors for a project send to the same destination (set in **Config** tab)
* **Monitor level** -- individual monitors send to different destinations (set in monitor settings)

## Additional features

* **Mute monitors** -- temporarily silence alerts during maintenance or known issues
* **Downtime windows** -- schedule recurring periods where alerts are suppressed
* **Duplicate monitors** -- copy an existing monitor's configuration to create a similar one
* **Audit log** -- track every change made to a monitor (who changed it, when, and what was modified)

## Best practices

* Start with a simple latency or token count monitor, then add more as you learn what matters
* Don't monitor everything. Focus on metrics tied to business outcomes or SLAs
* Start with automatic thresholds and adjust sensitivity over time
* Set a delay window if your data arrives in batches
* Don't change too many monitor settings at once. Adjust gradually so you can tell which change affected your alert volume

***

## You've completed the Observe workflow

You now have full observability for your LLM application. You can explore traces and sessions, understand agent behavior, define the metrics that matter, visualize them on dashboards, and get alerted the moment something goes wrong.

When a monitor fires, the loop restarts: go back to your traces, investigate the issue, understand the pattern, and take action. That's the Observe workflow.
