
# Azure DevOps CI/CD Basics

> Add automated LLM experiment evaluations to your Azure DevOps Pipelines using Arize AX.

> **What this enables:** Run Arize AX experiment evaluations automatically as part of your Azure Pipelines — on every PR, on a schedule, or on-demand. Catch regressions in accuracy, latency, and cost before they hit production.

## Key Concepts

* **Pipeline**: An automated workflow defined in YAML, stored in your repo (typically `azure-pipelines.yml` at the root).
* **Stages → Jobs → Steps**: A three-level hierarchy. A `stage` groups related work, a `job` runs on a single agent, and `steps` are the individual commands. Most simple pipelines use a single implicit stage with one job.
* **Variable Groups**: Reusable, project-scoped collections of variables and secrets defined in **Pipelines → Library**. Linked to a pipeline via `variables: - group:`.
* **Service Connections**: Azure DevOps integrations that authenticate to external systems — Git providers, Docker registries, Azure resources, secret managers.
* **Triggers**: How pipelines get kicked off. CI pushes (`trigger:`) and PR validation (`pr:`) are separate top-level blocks; cron schedules live under `schedules:`.
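Written out explicitly, the three-level hierarchy looks like the sketch below. The stage and job names and the step contents are placeholders. A single-job pipeline can drop `stages:` and `jobs:` and list `steps:` at the top level, as the first pipeline in this guide does.

```yaml theme={null}
# Illustrative skeleton only: names and step contents are placeholders.
stages:
  - stage: Evaluate              # a stage groups related jobs
    jobs:
      - job: run_experiment      # a job runs on a single agent
        pool:
          vmImage: ubuntu-latest
        steps:                   # steps run in order on that agent
          - script: echo "step 1"
          - script: echo "step 2"
```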

## Prerequisites & Assumptions

This guide assumes:

* **An Azure DevOps organization and project** with Pipelines enabled. Your platform team has set up project access.
* **Your repository is connected.** Either it lives in Azure Repos, or you've created a [GitHub / Bitbucket / GitLab service connection](https://learn.microsoft.com/azure/devops/pipelines/library/service-endpoints) so Azure Pipelines can clone it and post status checks back.
* **Microsoft-hosted agents are available** (the default). Python 3.12 is preinstalled on the `ubuntu-latest` image. Self-hosted agent pools also work as long as Python 3.12+ is on the agent.
* **A Variable Group exists.** Navigate to **Pipelines → Library → + Variable group** and create one named `arize-experiments` with `ARIZE_API_KEY`, `ARIZE_SPACE_ID`, `ARIZE_DATASET_ID`, and `OPENAI_API_KEY`. Click the lock icon next to each value to mark it as a secret. The pipeline YAML below references this group by name. The first time a pipeline uses the group, Azure DevOps may pause the run until someone with access permits the pipeline to use it.

> **🔑 Secret variables are not exposed as environment variables.** Azure Pipelines automatically maps regular (non-secret) variables into each step's environment, but it does *not* do this for secret variables. You must map each secret explicitly in an `env:` block on every step that needs it, or your script won't see it. The example below shows the pattern.

> **Coming from Jenkins or GitHub Actions?** Three things to know up front: (1) Azure DevOps uses a three-level hierarchy (`stages → jobs → steps`) rather than two-level, though small pipelines can omit `stages`. (2) Secret variables require explicit `env:` mapping per step (see callout above). (3) `ubuntu-latest` now resolves to **Ubuntu 24.04** (the cutover happened in March 2025). Pin to `ubuntu-24.04` explicitly if you want stability across future image rollovers. The Python script that runs your experiment is identical — no changes needed.
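One quick way to see the secret-mapping rule in action is a throwaway pipeline like the sketch below. It assumes the `arize-experiments` group described above, with `ARIZE_API_KEY` marked as a secret. Only the second step sees the key.

```yaml theme={null}
variables:
  - group: arize-experiments   # assumes ARIZE_API_KEY is marked secret in this group

steps:
  # No env: mapping, so the secret is absent from this step's environment.
  - script: |
      if [ -z "$ARIZE_API_KEY" ]; then echo "ARIZE_API_KEY is not set"; fi
    displayName: Secret not mapped

  # With env: mapping, the script sees the value (masked as *** in logs).
  - script: |
      if [ -n "$ARIZE_API_KEY" ]; then echo "ARIZE_API_KEY is set"; fi
    displayName: Secret mapped
    env:
      ARIZE_API_KEY: $(ARIZE_API_KEY)
```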

## Setting Up Your First Experiment Pipeline

### Create an `azure-pipelines.yml`

Place an `azure-pipelines.yml` at the root of your repository (or anywhere — you'll point to it when creating the pipeline in the Azure DevOps UI). Then in Azure DevOps go to **Pipelines → New pipeline**, select your repo, and choose **Existing Azure Pipelines YAML file**.

```yaml theme={null}
trigger:
  branches:
    include:
      - main

pool:
  vmImage: ubuntu-latest

variables:
  - group: arize-experiments

steps:
  - task: UsePythonVersion@0
    displayName: Set Python version
    inputs:
      versionSpec: '3.12'

  - script: |
      pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
    displayName: Install dependencies

  - script: |
      python ./copilot/experiments/ai_search_test.py
    displayName: Run experiment
    env:
      ARIZE_API_KEY: $(ARIZE_API_KEY)
      ARIZE_SPACE_ID: $(ARIZE_SPACE_ID)
      ARIZE_DATASET_ID: $(ARIZE_DATASET_ID)
      OPENAI_API_KEY: $(OPENAI_API_KEY)

  - publish: experiment_results.json
    displayName: Publish results
    artifact: experiment-results
    condition: always()
```

#### Breakdown

* `trigger` — CI trigger block. Fires when commits are pushed to `main`. PR-only pipelines set `trigger: none` and rely on `pr:` instead. Omitting `trigger:` entirely does not disable CI; it enables CI on every branch.
* `pool.vmImage: ubuntu-latest` — Runs on a Microsoft-hosted Linux agent. Currently maps to Ubuntu 24.04. Pin to `ubuntu-24.04` explicitly if you want to avoid being moved by future Microsoft rollovers.
* `variables: - group:` — Pulls in the `arize-experiments` Variable Group. Secret values from the group are masked in logs automatically; non-secrets become normal pipeline variables.
* `task: UsePythonVersion@0` — Selects the Python version. 3.12 is already on the image, but pinning here makes the choice explicit and survives future image changes.
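* `script:` — Shorthand for the Command Line task (`CmdLine@2`), which runs the script with Bash on Linux agents. (`bash:` is the shorthand for `Bash@3`.) Equivalent to `sh` in Jenkins or `run:` in GitHub Actions.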
* `env:` on the run step — The required mapping from Variable Group secrets to environment variables. Without this block your script can't see `ARIZE_API_KEY` even though the Variable Group is loaded.
* `publish:` — Stores `experiment_results.json` as a pipeline artifact. `condition: always()` keeps the artifact even when the script exits nonzero (useful when the experiment "fails" on a regression you want to inspect).
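For the run to go red on a regression, the experiment script, or a step added after the **Run experiment** step, has to exit nonzero. The sketch below is a minimal version of that check. The `mean_score` field in `experiment_results.json` and the 0.8 threshold are hypothetical, so adapt them to whatever your script actually writes.

```yaml theme={null}
- script: |
    # Hypothetical schema: assumes {"mean_score": <float>} in experiment_results.json.
    python - <<'EOF'
    import json, sys
    score = json.load(open("experiment_results.json"))["mean_score"]
    threshold = 0.8  # hypothetical regression threshold
    print(f"mean_score={score:.3f} threshold={threshold}")
    sys.exit(0 if score >= threshold else 1)  # nonzero exit fails the step
    EOF
  displayName: Fail on regression
```

With `condition: always()` on the publish step, the results artifact is still uploaded when this gate fails.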

> **Self-hosted agent?** Drop the `vmImage` line and use `pool: name: <your-pool-name>`. Make sure Python 3.12+ is on the agent or that `UsePythonVersion@0` can install it (the task supports the [Python tool installer](https://learn.microsoft.com/azure/devops/pipelines/tasks/reference/use-python-version-v0) on agents that allow downloads).
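Spelled out in full, a self-hosted pool reference might look like the following sketch. The pool name is hypothetical, and the `demands` entry, which routes the job only to Linux agents in that pool, is optional.

```yaml theme={null}
pool:
  name: my-self-hosted-pool      # hypothetical agent pool name
  demands:
    - Agent.OS -equals Linux     # only pick Linux agents from the pool

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.12'
  - script: python --version
    displayName: Confirm Python version
```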

## Trigger Options

Azure DevOps splits triggers across three top-level blocks: `trigger:` for CI pushes, `pr:` for PR validation, and `schedules:` for cron. Path filters on `trigger:` and `pr:` apply at the pipeline level (scheduled triggers filter by branch only). This is the same posture as Harness and cleaner than Jenkins' stage-level `changeset`.

### 1. Webhook (Pull Request)

The most common setup. Azure Pipelines runs the pipeline whenever a PR targeting a matching branch is opened or updated, posts the result as a status check, and blocks merging if a branch policy requires that check.

```yaml theme={null}
pr:
  branches:
    include:
      - main
      - release/*
  paths:
    include:
      - copilot/search/**
      - copilot/experiments/**
```

> **Path filters are pipeline-level.** If nothing in `copilot/search/**` or `copilot/experiments/**` changed, the pipeline doesn't start at all — no skipped stages, no no-op builds. This matches Harness `payloadConditions` and is stricter than Jenkins, which evaluates `changeset` after the build has already started.

> **PR triggers for Azure Repos.** The `pr:` block is honored for GitHub and Bitbucket Cloud repositories. For Azure Repos Git, YAML PR triggers are ignored. Instead, add a **Build validation** branch policy on the target branch. Microsoft documents the GitHub behavior [here](https://learn.microsoft.com/azure/devops/pipelines/repos/github#pr-triggers).

### 2. Webhook (CI / Push)

Fires on every push to a matching branch. Combine with `paths:` to scope tightly.

```yaml theme={null}
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - copilot/search/**
```

### 3. Scheduled (Cron)

```yaml theme={null}
schedules:
  - cron: "0 0 * * *"
    displayName: Nightly experiment eval
    branches:
      include:
        - main
    always: true
```

> **`always: true` matters.** Without it, a scheduled run only fires when there have been new commits since the last scheduled run. For nightly evals against a fixed dataset you almost always want it to run regardless.
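For evals that should run on a fixed cadence and nowhere else, a schedule-only variant is a small change. `trigger: none` and `pr: none` switch off the push and PR triggers without affecting `schedules:`. Cron expressions are evaluated in UTC. The weekday 06:00 schedule below is only illustrative.

```yaml theme={null}
trigger: none   # no CI runs on push
pr: none        # no PR validation runs

schedules:
  - cron: "0 6 * * 1-5"          # 06:00 UTC, Monday through Friday
    displayName: Weekday experiment eval
    branches:
      include:
        - main
    always: true
```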

### 4. Pipeline Chaining

Trigger this pipeline after another one finishes — useful when experiments should only run on a green build.

```yaml theme={null}
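# Disable the default push and PR triggers so only the upstream pipeline starts this one.
trigger: none
pr: none

resources:
  pipelines:
    - pipeline: build          # alias used to refer to the upstream run
      source: my-app-build     # name of the upstream pipeline in Azure DevOps
      trigger:
        branches:
          include:
            - main

pool:
  vmImage: ubuntu-latest

variables:
  - group: arize-experiments

steps:
  - script: |
      pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
      python ./copilot/experiments/ai_search_test.py
    displayName: Run experiment
    env:
      ARIZE_API_KEY: $(ARIZE_API_KEY)
      ARIZE_SPACE_ID: $(ARIZE_SPACE_ID)
      ARIZE_DATASET_ID: $(ARIZE_DATASET_ID)
      OPENAI_API_KEY: $(OPENAI_API_KEY)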
```
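If the experiment needs the upstream build's outputs, a `download` step can fetch that run's artifacts through the pipeline resource alias, and the `resources.pipeline.<alias>.*` variables identify the triggering run. The sketch below assumes the upstream `my-app-build` pipeline publishes at least one artifact.

```yaml theme={null}
steps:
  # Downloads artifacts from the upstream run into $(Pipeline.Workspace)/build.
  - download: build
  - script: |
      echo "Upstream run $(resources.pipeline.build.runID) on $(resources.pipeline.build.sourceBranch)"
      ls "$(Pipeline.Workspace)/build"
    displayName: Inspect upstream run
```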

### 5. Manual or Parameterized Runs

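Set `trigger: none` and `pr: none` to make the pipeline manual-only. Omitting them has the opposite effect: Azure Pipelines then enables CI triggers (and, for GitHub and Bitbucket Cloud repos, PR triggers) on every branch. Add `parameters:` to expose inputs in the **Run pipeline** dialog and the REST API.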

```yaml theme={null}
trigger: none   # no CI runs on push
pr: none        # no PR validation runs

parameters:
  - name: dataset_id
    displayName: Arize dataset
    type: string
    default: default-dataset
  - name: model
    displayName: Model to test
    type: string
    default: gpt-4o
    values:
      - gpt-4o
      - gpt-4o-mini
      - claude-sonnet-4-5-20250929

pool:
  vmImage: ubuntu-latest

variables:
  - group: arize-experiments

steps:
  - script: |
      python ./copilot/experiments/ai_search_test.py \
        --dataset ${{ parameters.dataset_id }} \
        --model ${{ parameters.model }}
    displayName: Run experiment
    env:
      ARIZE_API_KEY: $(ARIZE_API_KEY)
      ARIZE_SPACE_ID: $(ARIZE_SPACE_ID)
      OPENAI_API_KEY: $(OPENAI_API_KEY)
```

You can kick off a parameterized run from the UI (**Run pipeline** button) or the [REST API](https://learn.microsoft.com/rest/api/azure/devops/pipelines/runs/run-pipeline) for programmatic invocation.
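`${{ parameters.* }}` expressions are substituted into the script text when the pipeline is compiled. As a result, a free-text string parameter such as `dataset_id` becomes raw shell, and spaces or metacharacters in it can split arguments or inject commands. A slightly safer variant, assuming the same two parameters, passes the values through `env:` and quotes them in the script:

```yaml theme={null}
steps:
  - script: |
      # Quoted env vars keep user-supplied values as single, literal arguments.
      python ./copilot/experiments/ai_search_test.py \
        --dataset "$DATASET_ID" \
        --model "$MODEL"
    displayName: Run experiment
    env:
      DATASET_ID: ${{ parameters.dataset_id }}   # resolved at compile time
      MODEL: ${{ parameters.model }}
```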

***

## More Mature Patterns

Once the basics are working, these patterns become relevant as your experiment workflows grow.

### Parallel Evaluation Runs

Run experiments against multiple models or datasets simultaneously using `strategy.matrix:`. Each leg gets its own job and Microsoft-hosted agent.

```yaml theme={null}
variables:
  - group: arize-experiments   # secrets are mapped into the step via env: below

jobs:
  - job: evaluate
    pool:
      vmImage: ubuntu-latest
    strategy:
      matrix:
        gpt_4o:
          MODEL: gpt-4o
        gpt_4o_mini:
          MODEL: gpt-4o-mini
        claude_sonnet:
          MODEL: claude-sonnet-4-5-20250929
      maxParallel: 3
    steps:
      - task: UsePythonVersion@0
        inputs:
          versionSpec: '3.12'
      - script: |
          pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
          python ./experiments/eval.py --model $(MODEL)
        displayName: Run experiment
        env:
          ARIZE_API_KEY: $(ARIZE_API_KEY)
          ARIZE_SPACE_ID: $(ARIZE_SPACE_ID)
          OPENAI_API_KEY: $(OPENAI_API_KEY)
```
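Every leg runs as its own job, so a fixed artifact name such as `experiment-results` would collide when the second leg tries to publish it. The sketch below shows a per-leg publish step to append to the matrix job's `steps:`. It assumes the eval script writes `experiment_results.json`, as in the first example.

```yaml theme={null}
- publish: experiment_results.json
  displayName: Publish results
  artifact: experiment-results-$(MODEL)   # unique per leg, e.g. experiment-results-gpt-4o
  condition: always()
```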

### Pipeline Templates

If multiple repos need the same experiment setup (install dependencies, map credentials, run the script), extract it into a [YAML template](https://learn.microsoft.com/azure/devops/pipelines/process/templates) and include it with a `template:` step. (Whole-pipeline templates that a pipeline must inherit from use `extends:` instead.) Templates can live alongside the pipeline or in a dedicated repository surfaced through `resources.repositories`.

```yaml theme={null}
# templates/arize-experiment.yml
parameters:
  - name: script
    type: string

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.12'
  - script: |
      pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
    displayName: Install dependencies
  - script: |
      python ${{ parameters.script }}
    displayName: Run experiment
    env:
      ARIZE_API_KEY: $(ARIZE_API_KEY)
      ARIZE_SPACE_ID: $(ARIZE_SPACE_ID)
      ARIZE_DATASET_ID: $(ARIZE_DATASET_ID)
      OPENAI_API_KEY: $(OPENAI_API_KEY)
```

```yaml theme={null}
# azure-pipelines.yml in a consuming repo
trigger:
  - main

pool:
  vmImage: ubuntu-latest

variables:
  - group: arize-experiments

steps:
  - template: templates/arize-experiment.yml
    parameters:
      script: ./copilot/experiments/ai_search_test.py
```
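When the template lives in a separate repository, declare that repository under `resources.repositories` and reference the template with an `@alias` suffix. The sketch below assumes a hypothetical Azure Repos repository `PlatformTeam/pipeline-templates`. A GitHub-hosted template repo would instead use `type: github` plus an `endpoint:` naming a GitHub service connection.

```yaml theme={null}
resources:
  repositories:
    - repository: templates                   # alias referenced after "@" below
      type: git                               # Azure Repos Git
      name: PlatformTeam/pipeline-templates   # hypothetical <project>/<repo>
      ref: refs/heads/main

trigger:
  - main

pool:
  vmImage: ubuntu-latest

variables:
  - group: arize-experiments

steps:
  - template: templates/arize-experiment.yml@templates
    parameters:
      script: ./copilot/experiments/ai_search_test.py
```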

### Variable Groups Linked to Azure Key Vault

For Azure-native orgs, link the Variable Group to an Azure Key Vault so secrets are managed centrally and rotated outside Azure DevOps.

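In **Pipelines → Library → Variable group**, toggle **Link secrets from an Azure key vault** and pick your subscription and vault. Only secret *names* are stored in the group; values are pulled from Key Vault at runtime. Key Vault secret names may contain only letters, digits, and dashes. Store the keys as, for example, `ARIZE-API-KEY` and map them to the underscore names in each step's `env:` block (`ARIZE_API_KEY: $(ARIZE-API-KEY)`). See [Microsoft's guide](https://learn.microsoft.com/azure/devops/pipelines/library/link-variable-groups-to-key-vaults) for the full setup.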

For ad-hoc fetches (a few secrets, no Variable Group), use the `AzureKeyVault@2` task directly:

```yaml theme={null}
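- task: AzureKeyVault@2
  inputs:
    azureSubscription: my-azure-rm-connection
    KeyVaultName: my-vault
    SecretsFilter: 'ARIZE-API-KEY,OPENAI-API-KEY'   # Key Vault names allow only letters, digits, and dashes
    RunAsPreJob: true

- script: python ./copilot/experiments/ai_search_test.py
  env:
    ARIZE_API_KEY: $(ARIZE-API-KEY)     # each fetched secret becomes a variable named after it
    OPENAI_API_KEY: $(OPENAI-API-KEY)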
```

> **Workload Identity (OIDC) service connections** are the modern way to authenticate the `AzureKeyVault@2` task to Azure — no client secrets to rotate. Worth setting up if your org runs anything else on Azure. See [Microsoft's workload identity guide](https://learn.microsoft.com/azure/devops/pipelines/release/configure-workload-identity).

### Environments and Approvals for Promotion Gates

Azure Pipelines [Environments](https://learn.microsoft.com/azure/devops/pipelines/process/environments) let you require manual approval before a stage runs. This pairs naturally with experiments-as-gates: run the experiment in one stage, gate prompt or model promotion in the next.

```yaml theme={null}
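pool:
  vmImage: ubuntu-latest   # default pool for every job below

stages:
  - stage: Evaluate
    jobs:
      - job: run_experiment
        variables:
          - group: arize-experiments
        steps:
          - script: |
              pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
              python ./copilot/experiments/ai_search_test.py
            displayName: Run experiment
            env:
              ARIZE_API_KEY: $(ARIZE_API_KEY)
              ARIZE_SPACE_ID: $(ARIZE_SPACE_ID)
              ARIZE_DATASET_ID: $(ARIZE_DATASET_ID)
              OPENAI_API_KEY: $(OPENAI_API_KEY)

  - stage: Promote
    dependsOn: Evaluate
    condition: succeeded()
    jobs:
      - deployment: promote_prompt
        environment: production-prompts   # configure approvers in the Environment UI
        strategy:
          runOnce:
            deploy:
              steps:
                - checkout: self   # deployment jobs don't check out the repo by default
                - script: bash ./scripts/promote_prompt.sh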
```

Approvers are configured in the Environment (Pipelines → Environments → `production-prompts` → Approvals and checks), not in YAML — this keeps the approver list outside the repo and editable by platform owners without a code change.
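As written, Promote runs after any successful Evaluate, including PR validation runs. If promotion should only ever happen from `main`, you can tighten the stage condition. A sketch of the changed stage, with the deployment job unchanged:

```yaml theme={null}
stages:
  # ...Evaluate stage unchanged...
  - stage: Promote
    dependsOn: Evaluate
    # Run only when Evaluate passed AND this run is for the main branch.
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      # ...deployment job unchanged...
```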

### Notifications

Azure DevOps has built-in [Service Hooks](https://learn.microsoft.com/azure/devops/service-hooks/services/teams) for Slack and Microsoft Teams. Configure them at the project level under **Project settings → Service hooks** for a specific event (e.g., "Run state changed → Failed") and pipeline. No YAML changes needed for the hook itself.

For inline messaging from a step (richer payloads, custom routing), post directly to a webhook with `curl`:

```yaml theme={null}
- script: |
    # SLACK_WEBHOOK_URL is assumed to be a secret in a linked Variable Group.
    curl -X POST -H 'Content-Type: application/json' \
      -d "{\"text\":\"Experiment failed: $(System.TeamFoundationCollectionUri)$(System.TeamProject)/_build/results?buildId=$(Build.BuildId)\"}" \
      "$SLACK_WEBHOOK_URL"
  condition: failed()
  displayName: Notify Slack on failure
  env:
    SLACK_WEBHOOK_URL: $(SLACK_WEBHOOK_URL)
```

### PR Status Checks and Comments

When the pipeline is triggered by a PR, Azure DevOps automatically posts a status check back to the Git provider — same UX as GitHub Actions checks. For Azure Repos this is built-in; for GitHub repos it requires the GitHub service connection to have the right scopes.

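To post the experiment summary on the PR, use the [`GitHubComment@0`](https://learn.microsoft.com/azure/devops/pipelines/tasks/reference/git-hub-comment-v0) task for GitHub repos. For Azure Repos, the [Azure DevOps CLI](https://learn.microsoft.com/cli/azure/repos/pr) can write the summary into the PR description instead. Note that `az repos pr update` replaces the description rather than adding a comment; a true comment thread requires the Pull Request Threads REST API. The GitHub version: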

```yaml theme={null}
- task: GitHubComment@0
  condition: and(succeeded(), eq(variables['Build.Reason'], 'PullRequest'))
  inputs:
    gitHubConnection: my-github-connection
    repositoryName: $(Build.Repository.Name)
    comment: |
      ## Experiment Results
      Mean evaluator score: $(EVAL_MEAN_SCORE)
      [View run]($(System.TeamFoundationCollectionUri)$(System.TeamProject)/_build/results?buildId=$(Build.BuildId))
```

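For Azure Repos, swap in the CLI call. The pipeline's build service identity may need **Contribute to pull requests** permission on the repository: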

```yaml theme={null}
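- script: |
    # Replaces the existing PR description; it does not add a comment.
    az repos pr update --id $(System.PullRequest.PullRequestId) \
      --org $(System.CollectionUri) \
      --description "Experiment passed: $(EVAL_MEAN_SCORE)"
  condition: and(succeeded(), eq(variables['Build.Reason'], 'PullRequest'))
  env:
    AZURE_DEVOPS_EXT_PAT: $(System.AccessToken)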
```
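Both comment snippets read `$(EVAL_MEAN_SCORE)`, so an earlier step in the same job has to set it. One way to do that is the `task.setvariable` logging command, sketched here under a hypothetical `experiment_results.json` schema with a top-level `mean_score` field:

```yaml theme={null}
- script: |
    # Hypothetical schema: assumes experiment_results.json has a top-level "mean_score".
    SCORE=$(python -c "import json; print(json.load(open('experiment_results.json'))['mean_score'])")
    # Logging command: exposes $(EVAL_MEAN_SCORE) to later steps in this job.
    echo "##vso[task.setvariable variable=EVAL_MEAN_SCORE]$SCORE"
  displayName: Export mean evaluator score
```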
