This is Part 5 of the Arize AX Get Started series. You should have completed the Experiments guide first, but this guide can also stand alone if you already have traces and evaluations flowing into AX.
Step 1: Create a dashboard
Dashboards centralize the metrics you care about into a single view. Let’s create one for the SkyServe chatbot.

- Navigate to Dashboards in the left sidebar
- Click + New Dashboard
- Name it SkyServe Production Health

Add widgets
AX offers several widget types. Add the following to get a useful overview:

Timeseries — Request Volume

Track how many requests your chatbot handles over time. Sudden drops could mean an outage; spikes could cause latency issues.

- Widget type: Timeseries
- Metric: Span count
- Project: skyserve-chatbot
Timeseries — Average Latency

- Widget type: Timeseries
- Metric: Average latency
- Project: skyserve-chatbot
Distribution — Groundedness Labels

- Widget type: Distribution
- Metric: Evaluation label distribution for groundedness-check
- Project: skyserve-chatbot
Statistic — Error Count

- Widget type: Statistic
- Metric: Error count
- Project: skyserve-chatbot
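The widgets above correspond to simple aggregations over your project’s spans. A minimal sketch of what each one computes (the field names here are illustrative, not the exact AX span schema):

```python
from statistics import mean

# Hypothetical span records for the window being charted.
# "duration_ms" and "status" are illustrative field names.
spans = [
    {"duration_ms": 840, "status": "OK"},
    {"duration_ms": 1900, "status": "OK"},
    {"duration_ms": 3100, "status": "ERROR"},
]

span_count = len(spans)                                        # Timeseries: request volume
avg_latency_ms = mean(s["duration_ms"] for s in spans)         # Timeseries: average latency
error_count = sum(1 for s in spans if s["status"] == "ERROR")  # Statistic: error count

print(span_count, round(avg_latency_ms), error_count)
```

Keeping these four views on one dashboard means one glance tells you whether traffic, latency, errors, or quality moved first.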

Step 2: Set up managed monitors
Monitors run in the background and alert you when a metric crosses a threshold. Managed monitors let you enable common checks with one click.

- Navigate to Monitors in the left sidebar
- You’ll see the managed monitors section — click to enable:
- Latency Monitor — tracks latency (span duration) for your project
- Token Count Monitor — tracks token usage (prompt + completion) for your project

Then set thresholds for each check:

- Latency: Alert if average latency exceeds 2000 ms over a 1-hour window
- Error count: Alert if errors exceed 10 in a 1-hour window
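The threshold logic a monitor evaluates over its window boils down to a couple of comparisons. A sketch of the rules above (an illustration of the logic, not AX’s implementation):

```python
def should_alert_latency(latencies_ms, threshold_ms=2000):
    """Alert if average latency over the window exceeds the threshold."""
    return bool(latencies_ms) and sum(latencies_ms) / len(latencies_ms) > threshold_ms

def should_alert_errors(error_count, max_errors=10):
    """Alert if errors in the window exceed the allowed count."""
    return error_count > max_errors

# One hour of data: a latency spike, plus a tolerable number of errors.
window_latencies = [1800, 2500, 2400]          # ms; average ≈ 2233
assert should_alert_latency(window_latencies)  # average > 2000 ms, so this fires
assert not should_alert_errors(error_count=7)  # 7 ≤ 10, so this stays quiet
```

Tune the window and threshold to your traffic: a 1-hour average smooths out single slow requests, while a tighter window catches incidents faster at the cost of noisier alerts.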
Step 3: Create a quality monitor
Managed monitors catch infrastructure-level problems (latency, errors), but the most valuable monitor for an LLM app is one that tracks output quality. Let’s create a custom monitor that alerts you when your groundedness scores drop.

- Click + New Monitor
- Configure it:
- Metric source: Span attribute
- Attribute: eval.groundedness-check.label
- Metric: Count
- Set your alerting threshold in the Define the Alerting section
- Click Create
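In effect, this monitor watches the share of spans whose groundedness label passes and fires when that share drops. A sketch of the idea, with an assumed label name ("grounded") and an assumed 80% floor, neither of which comes from AX itself:

```python
def groundedness_alert(labels, min_pass_rate=0.8):
    """Alert when the share of 'grounded' labels drops below the floor.

    `labels` stands in for the eval.groundedness-check.label values
    in the monitor's window; the label name and threshold are illustrative.
    """
    if not labels:
        return False  # no data at all is a different problem to alert on
    pass_rate = labels.count("grounded") / len(labels)
    return pass_rate < min_pass_rate

recent = ["grounded", "grounded", "ungrounded", "grounded", "ungrounded"]
print(groundedness_alert(recent))  # pass rate is 3/5 = 0.6, below the 0.8 floor
```

Pick the floor from your baseline: if your evaluations normally pass 95% of the time, alerting below 80% catches real regressions without firing on noise.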

Step 4: Configure alert destinations
Monitors are only useful if they reach the right people. Configure where alerts get sent:

- Go to Settings → Alert Integrations
- Add your preferred destination:
- Slack — send alerts to a team channel
- PagerDuty — for on-call escalation
- Opsgenie — for incident management

Step 5: Respond to an alert (the full workflow)
Here’s what the full production workflow looks like when you get an alert:

- Alert fires: “Groundedness scores dropped below threshold in the past hour”
- Check the dashboard: Open your SkyServe Production Health dashboard. Is this a broad quality drop or just a spike? Is latency also affected?
- Drill into traces: Click through from the dashboard to the traces view. Filter by traces that failed the groundedness check in the last hour. What do they have in common?
- Identify the pattern: Maybe a new type of question is coming in that your prompt doesn’t handle well. Or maybe the retrieval step is returning wrong documents for certain queries.
- Fix it: Open a failing trace in the Prompt Playground, iterate on the prompt, and save the improved version to Prompt Hub
- Validate: Run an experiment against your test dataset to make sure the fix works without regressions
- Deploy: Your app pulls the updated prompt from Prompt Hub automatically
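The deploy step works because your app fetches its prompt at runtime instead of hard-coding it. A sketch of that pattern with a cached getter; the `fetch` callable stands in for whatever Prompt Hub client call your SDK provides, which is not shown here:

```python
import time

def make_prompt_getter(fetch, ttl_seconds=60):
    """Return a function that serves a prompt, re-fetching after `ttl_seconds`.

    `fetch` is a stand-in for the real Prompt Hub client call; injecting it
    keeps this sketch runnable without any network access.
    """
    cache = {"prompt": None, "fetched_at": 0.0}

    def get_prompt():
        now = time.monotonic()
        if cache["prompt"] is None or now - cache["fetched_at"] > ttl_seconds:
            cache["prompt"] = fetch()
            cache["fetched_at"] = now
        return cache["prompt"]

    return get_prompt

# Usage with a stand-in fetcher (the prompt text is illustrative):
get_prompt = make_prompt_getter(lambda: "You are a helpful airline support assistant.")
print(get_prompt())
```

The TTL controls how quickly a saved prompt change reaches production; a short TTL picks up fixes fast, while a longer one reduces fetch traffic.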
Congratulations!
You’ve set up a complete production safety net for your LLM application:

- Dashboards give you an at-a-glance view of system health
- Managed monitors catch infrastructure issues (latency, errors, token usage)
- Quality monitors alert you when your evaluation scores drop
- Alert destinations make sure the right people are notified
What you’ve accomplished in this series
Across these five guides, you’ve gone from a black-box chatbot to a fully observable, continuously evaluated, and proactively monitored application:

| Guide | What you built | Value unlocked |
|---|---|---|
| Tracing | Full request visibility | See exactly what’s happening inside your app |
| Evaluations | Automated quality scoring | Measure quality at scale, no manual review |
| Prompts | Data-driven prompt iteration | Fix problems using real data, with version control |
| Experiments | Controlled testing | Prove changes work before deploying |
| Dashboards & Monitors | Production safety net | Know the moment something goes wrong |