You’ve built, tested, and improved your chatbot. It’s deployed and serving real airline customers. But LLM apps can degrade in surprising ways: your model provider might change behavior in an update, a new type of customer question starts coming in that your prompt doesn’t handle well, or a spike in traffic causes latency to climb. You need to know the moment quality drops — not three days later when a VP forwards you a customer complaint. Dashboards give you an at-a-glance view of your app’s health. Monitors watch your metrics continuously and alert you when something crosses a threshold. Together, they’re your production safety net.
This is Part 5 of the Arize AX Get Started series. You should have completed the Experiments guide first, but this guide can also stand alone if you already have traces and evaluations flowing into AX.

Step 1: Create a dashboard

Dashboards centralize the metrics you care about into a single view. Let’s create one for the SkyServe chatbot.
  1. Navigate to Dashboards in the left sidebar
  2. Click + New Dashboard
  3. Name it SkyServe Production Health
Create Dashboard dialog with name and tracing project selected

Add widgets

AX offers several widget types. Add the following to get a useful overview:
Timeseries — Request Volume
Track how many requests your chatbot handles over time. Sudden drops could mean an outage; spikes could cause latency issues.
  • Widget type: Timeseries
  • Metric: Span count
  • Project: skyserve-chatbot
Timeseries — Average Latency
Monitor response times. LLM latency can vary significantly, and slow responses frustrate users.
  • Widget type: Timeseries
  • Metric: Average latency
  • Project: skyserve-chatbot
Distribution — Evaluation Scores
See the distribution of your groundedness evaluation scores. A healthy app should show most responses passing.
  • Widget type: Distribution
  • Metric: Evaluation label distribution for groundedness-check
  • Project: skyserve-chatbot
Statistic — Error Rate
A single number showing how many requests are erroring out.
  • Widget type: Statistic
  • Metric: Error count
  • Project: skyserve-chatbot
SkyServe Production Health dashboard with traces, token costs, and eval score widgets
Your dashboard now gives you a single place to check on your app’s health. Use the date selector at the top to zoom into specific time ranges, and click on any data point to drill down to the underlying traces.
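If you export your spans for offline analysis (the column names below are illustrative assumptions, not the actual export schema), you can sanity-check the same dashboard metrics in code. A minimal sketch with pandas:

```python
import pandas as pd

# Hypothetical span export; in practice you would pull this from your
# tracing project. Column names here are assumptions for illustration.
spans = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-06-01 10:05", "2024-06-01 10:20",
        "2024-06-01 11:10", "2024-06-01 11:40",
    ]),
    "latency_ms": [850, 1200, 3100, 900],
    "status": ["OK", "OK", "ERROR", "OK"],
})

# Timeseries — request volume per hour
volume = spans.resample("h", on="timestamp").size()

# Timeseries — average latency per hour
avg_latency = spans.resample("h", on="timestamp")["latency_ms"].mean()

# Statistic — fraction of requests that errored
error_rate = (spans["status"] == "ERROR").mean()

print(volume.tolist())   # [2, 2]
print(error_rate)        # 0.25
```

This mirrors what the widgets compute for you; the dashboard is simply doing these aggregations continuously over live traffic.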

Step 2: Set up managed monitors

Monitors run in the background and alert you when a metric crosses a threshold. Managed monitors let you enable common checks with one click.
  1. Navigate to Monitors in the left sidebar
  2. You’ll see the managed monitors section — click to enable:
    • Latency Monitor — tracks latency (span duration) for your project
    • Token Count Monitor — tracks token usage (prompt + completion) for your project
Monitors page showing Latency Monitor and Token Count Monitor options
For each monitor, set a threshold that makes sense for your app. For example:
  • Latency: Alert if average latency exceeds 2000ms over a 1-hour window
  • Error count: Alert if errors exceed 10 in a 1-hour window
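Conceptually, a monitor aggregates a metric over its window and fires when the aggregate crosses your threshold. A rough sketch of that logic (the metric names and the choice of average-vs-sum aggregation are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class MonitorRule:
    """Illustrative threshold rule, mirroring the examples above."""
    metric: str
    threshold: float

def should_alert(window_values: list[float], rule: MonitorRule) -> bool:
    """Fire when the windowed aggregate crosses the threshold.

    Latency uses the window average; error count uses the window sum.
    """
    if rule.metric == "avg_latency_ms":
        value = sum(window_values) / len(window_values)
    else:  # "error_count"
        value = sum(window_values)
    return value > rule.threshold

latency_rule = MonitorRule("avg_latency_ms", 2000)
error_rule = MonitorRule("error_count", 10)

print(should_alert([1800, 2600, 2400], latency_rule))  # avg ~2267ms -> True
print(should_alert([2, 3, 1], error_rule))             # 6 errors -> False
```

Picking the window size is a trade-off: a short window reacts faster but is noisier; a longer window smooths out one-off spikes.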

Step 3: Create a quality monitor

Managed monitors catch infrastructure-level problems (latency, errors), but the most valuable monitor for an LLM app is one that tracks output quality. Let’s create a custom monitor that alerts you when your groundedness scores drop.
  1. Click + New Monitor
  2. Configure it:
    • Metric source: Span attribute
    • Attribute: eval.groundedness-check.label
    • Metric: Count
  3. Set your alerting threshold in the Define the Alerting section
  4. Click Create
Eval monitor creation form with groundedness-check label metric and alerting threshold
This is the monitor that ties everything together. Your evaluations run continuously on production data (from the Evaluations guide), and this monitor watches those evaluation results. If the chatbot starts hallucinating more than usual — for any reason — you’ll know.

Step 4: Configure alert destinations

Monitors are only useful if they reach the right people. Configure where alerts get sent:
  1. Go to Settings → Alert Integrations
  2. Add your preferred destination:
    • Slack — send alerts to a team channel
    • PagerDuty — for on-call escalation
    • Opsgenie — for incident management
Alert Integrations showing PagerDuty, Opsgenie, and Slack options
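If you ever need to route an alert outside the built-in integrations, Slack's incoming webhooks accept a simple JSON POST. A minimal stdlib sketch (the message format and monitor values are illustrative; you would substitute your own webhook URL):

```python
import json
import urllib.request

def build_alert_payload(monitor: str, value: float, threshold: float) -> dict:
    """Format a Slack incoming-webhook message for a fired monitor."""
    return {
        "text": (
            f":rotating_light: Monitor '{monitor}' fired: "
            f"value {value} crossed threshold {threshold}"
        )
    }

def send_to_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

payload = build_alert_payload("groundedness-check", 0.38, 0.25)
print(payload["text"])
# send_to_slack(your_webhook_url, payload)  # requires a real webhook URL
```

For the built-in integrations above, none of this is necessary; this is only for custom destinations.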

Step 5: Respond to an alert (the full workflow)

Here’s what the full production workflow looks like when you get an alert:
  1. Alert fires: “Groundedness scores dropped below threshold in the past hour”
  2. Check the dashboard: Open your SkyServe Production Health dashboard. Is this a broad quality drop or just a spike? Is latency also affected?
  3. Drill into traces: Click through from the dashboard to the traces view. Filter by traces that failed the groundedness check in the last hour. What do they have in common?
  4. Identify the pattern: Maybe a new type of question is coming in that your prompt doesn’t handle well. Or maybe the retrieval step is returning wrong documents for certain queries.
  5. Fix it: Open a failing trace in the Prompt Playground, iterate on the prompt, save to Prompt Hub
  6. Validate: Run an experiment against your test dataset to make sure the fix works without regressions
  7. Deploy: Your app pulls the updated prompt from Prompt Hub automatically
This is the full development loop you’ve learned across this series, now triggered by a production alert instead of manual review.
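Step 3 of the workflow, finding what the failing traces have in common, is easy to prototype offline. A sketch with pandas over a hypothetical trace export (all column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical trace export with eval labels attached.
traces = pd.DataFrame({
    "trace_id": ["t1", "t2", "t3", "t4"],
    "timestamp": pd.to_datetime([
        "2024-06-01 11:50", "2024-06-01 12:10",
        "2024-06-01 12:30", "2024-06-01 12:55",
    ]),
    "groundedness_label": ["grounded", "hallucinated",
                           "hallucinated", "grounded"],
    "question_topic": ["baggage fees", "seat upgrade",
                       "seat upgrade", "refund status"],
})

now = pd.Timestamp("2024-06-01 13:00")

# Traces that failed the groundedness check in the last hour
recent_fails = traces[
    (traces["timestamp"] >= now - pd.Timedelta(hours=1))
    & (traces["groundedness_label"] == "hallucinated")
]

# Most common topic among the failures: your candidate root cause
print(recent_fails["question_topic"].value_counts().idxmax())  # seat upgrade
```

In the AX traces view you'd do this with filters rather than code, but the logic is the same: narrow to the failing window, then look for the shared attribute.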

Congratulations!

You’ve set up a complete production safety net for your LLM application:
  • Dashboards give you an at-a-glance view of system health
  • Managed monitors catch infrastructure issues (latency, errors, token usage)
  • Quality monitors alert you when your evaluation scores drop
  • Alert destinations make sure the right people are notified
And when an alert fires, you already know the workflow: trace → evaluate → improve → experiment → deploy. You’ve built a complete, repeatable system for building and operating LLM applications.

What you’ve accomplished in this series

Across these five guides, you’ve gone from a black-box chatbot to a fully observable, continuously evaluated, and proactively monitored application:
Guide | What you built | Value unlocked
Tracing | Full request visibility | See exactly what’s happening inside your app
Evaluations | Automated quality scoring | Measure quality at scale, no manual review
Prompts | Data-driven prompt iteration | Fix problems using real data, with version control
Experiments | Controlled testing | Prove changes work before deploying
Dashboards & Monitors | Production safety net | Know the moment something goes wrong
Welcome to Arize AX!

Keep going

  • Tracing deep dive
  • Advanced evaluations
  • CI/CD with Experiments