

What this enables: Run Arize AX experiment evaluations automatically as part of your Azure Pipelines — on every PR, on a schedule, or on-demand. Catch regressions in accuracy, latency, and cost before they hit production.

Key Concepts

  • Pipeline: An automated workflow defined in YAML, stored in your repo (typically azure-pipelines.yml at the root).
  • Stages → Jobs → Steps: A three-level hierarchy. A stage groups related work, a job runs on a single agent, and steps are the individual commands. Most simple pipelines use a single implicit stage with one job. A minimal skeleton of the hierarchy follows this list.
  • Variable Groups: Reusable, project-scoped collections of variables and secrets defined in Pipelines → Library. Linked to a pipeline via variables: - group:.
  • Service Connections: Azure DevOps integrations that authenticate to external systems — Git providers, Docker registries, Azure resources, secret managers.
  • Triggers: How pipelines get kicked off. CI pushes (trigger:) and PR validation (pr:) are separate top-level blocks; cron schedules live under schedules:.
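To make the hierarchy concrete, here is a minimal skeleton (illustrative only; the stage and job names are placeholders):
stages:
  - stage: Evaluate            # a stage groups related jobs
    jobs:
      - job: run_experiment    # a job runs on a single agent
        pool:
          vmImage: ubuntu-latest
        steps:                 # steps are the individual commands
          - script: echo "run your experiment here"
            displayName: Placeholder step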

Prerequisites & Assumptions

This guide assumes:
  • An Azure DevOps organization and project with Pipelines enabled. Your platform team has set up project access.
  • Your repository is connected. Either it lives in Azure Repos, or you’ve created a GitHub / Bitbucket / GitLab service connection so Azure Pipelines can clone it and post status checks back.
  • Microsoft-hosted agents are available (the default). Python 3.12 is preinstalled on the ubuntu-latest image. Self-hosted agent pools also work as long as Python 3.12+ is on the agent.
  • A Variable Group exists. Navigate to Pipelines → Library → + Variable group and create one named arize-experiments with ARIZE_API_KEY, ARIZE_SPACE_ID, ARIZE_DATASET_ID, and OPENAI_API_KEY. Click the lock icon next to each value to mark it as a secret. The pipeline YAML below references this group by name.
🔑 Secrets behave differently than in GitHub Actions. Azure Pipelines does not automatically inject secret variables into the step’s environment. You must explicitly map them via an env: block on each step that needs them, or they won’t be visible to your script. The example below shows the pattern.
Coming from Jenkins or GitHub Actions? Three things to know up front: (1) Azure DevOps uses a three-level hierarchy (stages → jobs → steps) rather than two-level, though small pipelines can omit stages. (2) Secret variables require explicit env: mapping per step (see callout above). (3) ubuntu-latest now resolves to Ubuntu 24.04 (the cutover happened in March 2025). Pin to ubuntu-24.04 explicitly if you want stability across future image rollovers. The Python script that runs your experiment is identical — no changes needed.

Setting Up Your First Experiment Pipeline

Create an azure-pipelines.yml

Place an azure-pipelines.yml at the root of your repository (or anywhere — you’ll point to it when creating the pipeline in the Azure DevOps UI). Then in Azure DevOps go to Pipelines → New pipeline, select your repo, and choose Existing Azure Pipelines YAML file.
trigger:
  branches:
    include:
      - main

pool:
  vmImage: ubuntu-latest

variables:
  - group: arize-experiments

steps:
  - task: UsePythonVersion@0
    displayName: Set Python version
    inputs:
      versionSpec: '3.12'

  - script: |
      pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
    displayName: Install dependencies

  - script: |
      python ./copilot/experiments/ai_search_test.py
    displayName: Run experiment
    env:
      ARIZE_API_KEY: $(ARIZE_API_KEY)
      ARIZE_SPACE_ID: $(ARIZE_SPACE_ID)
      ARIZE_DATASET_ID: $(ARIZE_DATASET_ID)
      OPENAI_API_KEY: $(OPENAI_API_KEY)

  - publish: experiment_results.json
    displayName: Publish results
    artifact: experiment-results
    condition: always()

Breakdown

  • trigger — CI trigger block. Fires when commits are pushed to main. PR-only pipelines drop this and use pr: instead.
  • pool.vmImage: ubuntu-latest — Runs on a Microsoft-hosted Linux agent. Currently maps to Ubuntu 24.04. Pin to ubuntu-24.04 explicitly if you want to avoid being moved by future Microsoft rollovers.
  • variables: - group: — Pulls in the arize-experiments Variable Group. Secret values from the group are masked in logs automatically; non-secrets become normal pipeline variables.
  • task: UsePythonVersion@0 — Selects the Python version. 3.12 is already on the image, but pinning here makes the choice explicit and survives future image changes.
  • script: — Shorthand for Bash@3 on Linux agents. Equivalent to sh in Jenkins or run: in GitHub Actions.
  • env: on the run step — The required mapping from Variable Group secrets to environment variables. Without this block your script can’t see ARIZE_API_KEY even though the Variable Group is loaded.
  • publish: — Stores experiment_results.json as a pipeline artifact. condition: always() keeps the artifact even when the script exits nonzero (useful when the experiment “fails” on a regression you want to inspect).
Self-hosted agent? Drop the vmImage line and use pool: name: <your-pool-name>. Make sure Python 3.12+ is on the agent or that UsePythonVersion@0 can install it (the task supports the Python tool installer on agents that allow downloads).
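For reference, the top of a self-hosted variant might look like this (the pool name is a placeholder for whatever your organization calls it):
pool:
  name: my-selfhosted-pool   # placeholder; use your agent pool name

steps:
  - task: UsePythonVersion@0
    displayName: Set Python version
    inputs:
      versionSpec: '3.12'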

Trigger Options

Azure DevOps splits triggers across three top-level blocks: trigger: for CI pushes, pr: for PR validation, and schedules: for cron. Path filtering is pipeline-level on the push and PR triggers (schedules filter by branch only), the same posture as Harness and cleaner than Jenkins' stage-level changeset.
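The three blocks can coexist in a single YAML file; a minimal sketch combining them:
trigger:                    # CI pushes
  branches:
    include:
      - main

pr:                         # PR validation (GitHub / Bitbucket Cloud; see the note below)
  branches:
    include:
      - main

schedules:                  # cron
  - cron: "0 0 * * *"
    displayName: Nightly eval
    branches:
      include:
        - main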

1. Webhook (Pull Request)

The most common setup. Azure Pipelines runs the YAML on every PR open / update against a target branch, posts the status back as a check, and blocks merging if the target branch requires it (branch protection on GitHub, a Build Validation branch policy on Azure Repos).
pr:
  branches:
    include:
      - main
      - release/*
  paths:
    include:
      - copilot/search/**
      - copilot/experiments/**
Path filters are pipeline-level. If nothing in copilot/search/** or copilot/experiments/** changed, the pipeline doesn’t start at all — no skipped stages, no no-op builds. This matches Harness payloadConditions and is stricter than Jenkins, which evaluates changeset after the build has already started.
PR triggers depend on where the repo lives. The pr: block in YAML is honored when the repo is hosted in GitHub or Bitbucket Cloud. For Azure Repos Git it is ignored; configure PR validation through a Build Validation branch policy on the target branch instead. Microsoft documents this gotcha here.

2. Webhook (CI / Push)

Fires on every push to a matching branch. Combine with paths: to scope tightly.
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - copilot/search/**

3. Scheduled (Cron)

schedules:
  - cron: "0 0 * * *"
    displayName: Nightly experiment eval
    branches:
      include:
        - main
    always: true
always: true matters. Without it, a scheduled run only fires when there have been new commits since the last scheduled run. For nightly evals against a fixed dataset you almost always want it to run regardless.
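For a pipeline that should run only on the schedule (no CI or PR runs), disable the other triggers explicitly:
trigger: none
pr: none

schedules:
  - cron: "0 0 * * *"
    displayName: Nightly experiment eval
    branches:
      include:
        - main
    always: true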

4. Pipeline Chaining

Trigger this pipeline after another one finishes — useful when experiments should only run on a green build.
resources:
  pipelines:
    - pipeline: build
      source: my-app-build
      trigger:
        branches:
          include:
            - main

pool:
  vmImage: ubuntu-latest

steps:
  - script: python ./copilot/experiments/ai_search_test.py
    displayName: Run experiment
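Inside the triggered run, metadata about the completing pipeline is available through resources.pipeline.<alias>.* predefined variables; a sketch using the build alias from the example above:
  - script: |
      echo "Triggered by $(resources.pipeline.build.pipelineName) run $(resources.pipeline.build.runID)"
      echo "Source commit: $(resources.pipeline.build.sourceCommit)"
    displayName: Show triggering run info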

5. Manual or Parameterized Runs

Omit trigger: and pr: (or set trigger: none) to make the pipeline manual-only. Add parameters: to expose inputs in the Run pipeline dialog and the REST API.
trigger: none

parameters:
  - name: dataset_id
    displayName: Arize dataset
    type: string
    default: default-dataset
  - name: model
    displayName: Model to test
    type: string
    default: gpt-4o
    values:
      - gpt-4o
      - gpt-4o-mini
      - claude-sonnet-4-5-20250929

pool:
  vmImage: ubuntu-latest

steps:
  - script: |
      python ./copilot/experiments/ai_search_test.py \
        --dataset ${{ parameters.dataset_id }} \
        --model ${{ parameters.model }}
    displayName: Run experiment
You can kick off a parameterized run from the UI (Run pipeline button) or the REST API for programmatic invocation.

More Mature Patterns

Once the basics are working, these patterns become relevant as your experiment workflows grow.

Parallel Evaluation Runs

Run experiments against multiple models or datasets simultaneously using strategy.matrix:. Each leg gets its own job and Microsoft-hosted agent.
jobs:
  - job: evaluate
    pool:
      vmImage: ubuntu-latest
    strategy:
      matrix:
        gpt_4o:
          MODEL: gpt-4o
        gpt_4o_mini:
          MODEL: gpt-4o-mini
        claude_sonnet:
          MODEL: claude-sonnet-4-5-20250929
      maxParallel: 3
    steps:
      - task: UsePythonVersion@0
        inputs:
          versionSpec: '3.12'
      - script: |
          pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
          python ./experiments/eval.py --model $(MODEL)
        env:
          ARIZE_API_KEY: $(ARIZE_API_KEY)
          OPENAI_API_KEY: $(OPENAI_API_KEY)
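The matrix snippet assumes the arize-experiments Variable Group is linked at the pipeline level, as in the first example, so each matrix leg can map its secrets into env:. If it isn't, add it above the jobs:
variables:
  - group: arize-experiments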

Pipeline Templates

If multiple repos need the same experiment setup (install deps, configure credentials, run the script), extract it into a YAML template. A steps template like the one below is included with template: under steps:; a full pipeline template would instead be consumed via extends:. Templates can live alongside the pipeline or in a dedicated repository surfaced through resources.repositories (an example of the cross-repo form follows the snippets below).
# templates/arize-experiment.yml
parameters:
  - name: script
    type: string

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.12'
  - script: |
      pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
    displayName: Install dependencies
  - script: |
      python ${{ parameters.script }}
    displayName: Run experiment
    env:
      ARIZE_API_KEY: $(ARIZE_API_KEY)
      ARIZE_SPACE_ID: $(ARIZE_SPACE_ID)
      ARIZE_DATASET_ID: $(ARIZE_DATASET_ID)
      OPENAI_API_KEY: $(OPENAI_API_KEY)
# azure-pipelines.yml in a consuming repo
trigger:
  - main

pool:
  vmImage: ubuntu-latest

variables:
  - group: arize-experiments

steps:
  - template: templates/arize-experiment.yml
    parameters:
      script: ./copilot/experiments/ai_search_test.py
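If the template lives in a dedicated repository instead, surface that repo through resources.repositories and reference the template with an @ suffix. A sketch, assuming an Azure Repos Git repo named pipeline-templates in the same project (both names are placeholders):
resources:
  repositories:
    - repository: templates                  # local alias used in the @ reference
      type: git                              # Azure Repos Git
      name: MyProject/pipeline-templates     # placeholder <project>/<repo>

steps:
  - template: templates/arize-experiment.yml@templates
    parameters:
      script: ./copilot/experiments/ai_search_test.py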

Variable Groups Linked to Azure Key Vault

For Azure-native orgs, link the Variable Group to an Azure Key Vault so secrets are managed centrally and rotated outside Azure DevOps. In Pipelines → Library → Variable group, toggle Link secrets from an Azure key vault and pick your subscription and vault. Only secret names are stored in the group; values are pulled from Key Vault at runtime. See Microsoft’s guide for the full setup. For ad-hoc fetches (a single secret, no Variable Group), use the AzureKeyVault@2 task directly:
- task: AzureKeyVault@2
  inputs:
    azureSubscription: my-azure-rm-connection
    KeyVaultName: my-vault
    SecretsFilter: 'ARIZE_API_KEY,OPENAI_API_KEY'
    RunAsPreJob: true
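Secrets fetched by AzureKeyVault@2 become secret pipeline variables, so the same env: mapping rule applies to any later step that needs them; for example:
- script: python ./copilot/experiments/ai_search_test.py
  displayName: Run experiment
  env:
    ARIZE_API_KEY: $(ARIZE_API_KEY)     # fetched by AzureKeyVault@2 above
    OPENAI_API_KEY: $(OPENAI_API_KEY)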
Workload Identity (OIDC) service connections are the modern way to authenticate the AzureKeyVault@2 task to Azure — no client secrets to rotate. Worth setting up if your org runs anything else on Azure. See Microsoft’s workload identity guide.

Environments and Approvals for Promotion Gates

Azure Pipelines Environments let you require manual approval before a stage runs. This pairs naturally with experiments-as-gates: run the experiment in one stage, gate prompt or model promotion in the next.
stages:
  - stage: Evaluate
    jobs:
      - job: run_experiment
        pool:
          vmImage: ubuntu-latest
        variables:
          - group: arize-experiments
        steps:
          - script: python ./copilot/experiments/ai_search_test.py
            env:
              ARIZE_API_KEY: $(ARIZE_API_KEY)

  - stage: Promote
    dependsOn: Evaluate
    condition: succeeded()
    jobs:
      - deployment: promote_prompt
        environment: production-prompts   # configure approvers in the Environment UI
        strategy:
          runOnce:
            deploy:
              steps:
                - script: ./scripts/promote_prompt.sh
Approvers are configured in the Environment (Pipelines → Environments → production-prompts → Approvals and checks), not in YAML — this keeps the approver list outside the repo and editable by platform owners without a code change.

Notifications

Azure DevOps has built-in Service Hooks for Slack and Microsoft Teams. Configure them at the project level under Project settings → Service hooks for a specific event (e.g., “Run state changed → Failed”) and pipeline. No YAML changes needed for the hook itself. For inline messaging from a step (richer payloads, custom routing) post directly to a webhook with curl:
- script: |
    curl -X POST -H 'Content-Type: application/json' \
      -d "{\"text\":\"Experiment failed: $(System.TeamFoundationCollectionUri)$(System.TeamProject)/_build/results?buildId=$(Build.BuildId)\"}" \
      "$(SLACK_WEBHOOK_URL)"
  condition: failed()
  displayName: Notify Slack on failure

PR Status Checks and Comments

When the pipeline is triggered by a PR, Azure DevOps automatically posts a status check back to the Git provider — same UX as GitHub Actions checks. For Azure Repos this is built-in; for GitHub repos it requires the GitHub service connection to have the right scopes. To post the experiment summary as an actual PR comment, use the Azure DevOps CLI for Azure Repos, or the GitHubComment@0 task for GitHub:
- task: GitHubComment@0
  condition: and(succeeded(), eq(variables['Build.Reason'], 'PullRequest'))
  inputs:
    gitHubConnection: my-github-connection
    repositoryName: $(Build.Repository.Name)
    comment: |
      ## Experiment Results
      Mean evaluator score: $(EVAL_MEAN_SCORE)
      [View run]($(System.TeamFoundationCollectionUri)$(System.TeamProject)/_build/results?buildId=$(Build.BuildId))
For Azure Repos, swap in the Azure DevOps CLI. Note that az repos pr update writes the summary into the PR description rather than posting a thread comment (a true comment requires the Pull Request Threads REST API), and the build service identity behind $(System.AccessToken) needs Contribute to pull requests permission on the repo:
- script: |
    az repos pr update --id $(System.PullRequest.PullRequestId) \
      --description "Experiment passed: $(EVAL_MEAN_SCORE)"
  env:
    AZURE_DEVOPS_EXT_PAT: $(System.AccessToken)
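Both snippets reference $(EVAL_MEAN_SCORE), which nothing in this guide sets; one way to populate it is to emit a setvariable logging command from the experiment step. A sketch, where the --print-score flag is hypothetical and stands in for however your script reports its score:
- script: |
    SCORE=$(python ./copilot/experiments/ai_search_test.py --print-score)
    echo "##vso[task.setvariable variable=EVAL_MEAN_SCORE]$SCORE"
  displayName: Run experiment and capture score
  env:
    ARIZE_API_KEY: $(ARIZE_API_KEY)
    OPENAI_API_KEY: $(OPENAI_API_KEY)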