# Jenkins CI/CD Basics

> Add automated LLM experiment evaluations to your existing Jenkins infrastructure using Arize AX.

> **What this enables:** Run Arize AX experiment evaluations automatically as part of your Jenkins pipelines — on every PR, on a schedule, or on-demand. Catch regressions in accuracy, latency, and cost before they hit production.

# Key Concepts

* **Pipeline**: An automated workflow defined in a `Jenkinsfile` (Groovy-based, not YAML).
* **Stages**: Named groups of work that run sequentially by default (e.g., Setup, Test, Report); stages can also run in parallel.
* **Steps**: Individual commands within a stage.
* **Agent**: Where the pipeline runs — a Jenkins node, a Docker container, or a Kubernetes pod.
* **Triggers**: How pipelines get kicked off — webhooks, cron schedules, or upstream jobs.

# Prerequisites & Assumptions

This guide assumes:

* **Jenkins is running on a recent LTS release** with Java 17+. See [Java support policy](https://www.jenkins.io/doc/administration/requirements/java/) for details.
* **A Jenkins agent** capable of running Docker containers (needed for the Python image approach below), or Python 3.12+ installed directly on the agent.
* **Your Jenkins instance can reach your Git provider** (GitHub, GitLab, Bitbucket) via webhook or polling.
* **Required plugins are installed:**
  * [Pipeline](https://plugins.jenkins.io/workflow-aggregator/) (usually included by default)
  * [Git](https://plugins.jenkins.io/git/)
  * [Credentials Binding](https://plugins.jenkins.io/credentials-binding/)
  * [Docker Pipeline](https://plugins.jenkins.io/docker-workflow/) (if using Docker agents)

> **🔑 Secrets setup:** Before your pipeline can run, store your secrets in **Jenkins → Manage Jenkins → Credentials**. Add `OPENAI_API_KEY`, `ARIZE_API_KEY`, `SPACE_ID`, and `DATASET_ID` as "Secret text" credentials, using those names as the credential IDs. The `Jenkinsfile` below looks them up by ID.

# Setting Up Your First Experiment Pipeline

## Create a `Jenkinsfile`

Place a `Jenkinsfile` in the root of your repository. Jenkins uses a Groovy-based DSL (not YAML).

```groovy
pipeline {
    agent {
        docker {
            image 'python:3.12'
        }
    }

    environment {
        OPENAI_API_KEY  = credentials('OPENAI_API_KEY')
        ARIZE_API_KEY   = credentials('ARIZE_API_KEY')
        SPACE_ID        = credentials('SPACE_ID')
        DATASET_ID      = credentials('DATASET_ID')
    }

    stages {
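        stage('Install Dependencies') {
            steps {
                // Install into a virtualenv in the workspace: the container runs as the
                // Jenkins user, which usually cannot write to the system site-packages.
                sh '''
                    python -m venv .venv
                    .venv/bin/pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
                '''
            }
        }

        stage('Run Experiment') {
            steps {
                sh '.venv/bin/python ./copilot/experiments/ai_search_test.py'
            }
        }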
    }

    post {
        always {
            archiveArtifacts artifacts: 'experiment_results.json', allowEmptyArchive: true
        }
    }
}
```

### Breakdown

* `pipeline { }` — Top-level block; everything lives inside this.
* `agent { docker { image '...' } }` — Runs the entire pipeline inside a Docker container. Jenkins pulls the image for you.
* `environment { }` — Injects secrets from the Jenkins credential store. The `credentials()` helper masks values in logs automatically.
* `stages` / `stage` — Sequential groups of work. Each stage appears as a separate column in the Pipeline Stage View.
* `steps` — Commands to execute. `sh` runs shell commands.
* `post { always { } }` — Runs after all stages complete (pass or fail). `archiveArtifacts` saves files to the Jenkins build page for download.

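If a secret is only needed by one step, the `withCredentials` step from the Credentials Binding plugin can scope it more narrowly than a pipeline-wide `environment` block. A minimal sketch, assuming the same credential IDs as above:

```groovy
stage('Run Experiment') {
    steps {
        // Bind each "Secret text" credential to an environment variable
        // that exists only inside this block.
        withCredentials([
            string(credentialsId: 'ARIZE_API_KEY', variable: 'ARIZE_API_KEY'),
            string(credentialsId: 'OPENAI_API_KEY', variable: 'OPENAI_API_KEY')
        ]) {
            // Single quotes keep Groovy from interpolating the secrets;
            // the shell reads them from its environment instead.
            sh 'python ./copilot/experiments/ai_search_test.py'
        }
    }
}
```
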
> **No Docker?** Replace the `agent` block with `agent any` and make sure Python 3.12+ is on your Jenkins node. You may also want to add a `sh 'python3 --version'` step to verify.

# Trigger Options

Unlike repo-hosted CI systems where triggers are defined entirely in the pipeline file, Jenkins lets you define *when* a pipeline runs in three places: the `triggers` directive in the `Jenkinsfile`, the job configuration in the Jenkins UI, or webhooks from your Git provider.

## 1. Webhook (Push / Pull Request)

The most common setup. Your Git provider sends a webhook to Jenkins when code changes.

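**Setup:** Configure a webhook in your Git provider pointing to `https://<your-jenkins>/github-webhook/` (for GitHub) or the equivalent endpoint for your provider. Then, in Jenkins, create a **Multibranch Pipeline** job pointing to your repo. For GitHub, the job's branch source comes from the [GitHub Branch Source](https://plugins.jenkins.io/github-branch-source/) plugin, which is not in the prerequisite list above.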

```groovy
// No triggers block needed — Multibranch Pipeline jobs
// automatically build on push when webhooks are configured.
pipeline {
    agent any
    stages {
        stage('Test') {
            steps {
                sh 'echo "Triggered by push or PR"'
            }
        }
    }
}
```

> **Multibranch Pipeline** is the recommended job type for most teams. It automatically discovers branches and PRs in your repo and runs the `Jenkinsfile` found in each. No manual job creation per branch.

## 2. Scheduled (Cron)

```groovy
pipeline {
    agent any
    triggers {
        // Jenkins cron: MINUTE HOUR DOM MONTH DOW
        cron('0 0 * * *')  // Every day at midnight
    }
    stages {
        stage('Nightly Eval') {
            steps {
                sh 'python ./copilot/experiments/ai_search_test.py'
            }
        }
    }
}
```

> **Cron syntax note:** Jenkins cron uses `H` (hash) for load distribution. `H 0 * * *` means "sometime in the midnight hour" — Jenkins picks a stable minute per job to avoid all jobs firing at :00. Use exact times only when it actually matters.

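As a sketch of the `H` syntax in practice, the `triggers` block below schedules a nightly run on weekdays at a hashed minute. The 2 AM hour and the weekday range are illustrative choices, not recommendations:

```groovy
triggers {
    // "H" lets Jenkins pick a stable minute for this job within the 02:00 hour;
    // "1-5" restricts runs to Monday through Friday.
    cron('H 2 * * 1-5')
}
```
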
## 3. Polling SCM (Fallback When Webhooks Aren't Possible)

Jenkins periodically checks your repo for changes. Use this when your Jenkins instance isn't reachable from your Git provider (e.g., behind a firewall).

```groovy
pipeline {
    agent any
    triggers {
        pollSCM('H/5 * * * *')  // Check every 5 minutes
    }
    stages {
        stage('Test') {
            steps {
                sh 'echo "Detected new changes"'
            }
        }
    }
}
```

## 4. Upstream Job (Pipeline Chaining)

Trigger one pipeline after another completes — useful for running evals only after a build passes.

```groovy
pipeline {
    agent any
    triggers {
        upstream(upstreamProjects: 'my-build-job', threshold: hudson.model.Result.SUCCESS)
    }
    stages {
        stage('Post-Build Eval') {
            steps {
                sh 'python ./copilot/experiments/ai_search_test.py'
            }
        }
    }
}
```

## 5. Manual Only (No Automatic Trigger)

Omit the `triggers` block entirely. The pipeline runs only when someone clicks **Build Now** in the Jenkins UI or calls the Jenkins API.

```groovy
pipeline {
    agent any
    // No triggers block — manual execution only
    stages {
        stage('On-Demand Eval') {
            steps {
                sh 'python ./copilot/experiments/ai_search_test.py'
            }
        }
    }
}
```

## 6. Parameterized Builds

Allow users to pass inputs when triggering a build — useful for running experiments against different datasets or models.

```groovy
pipeline {
    agent any
    parameters {
        string(name: 'DATASET_ID', defaultValue: 'default-dataset', description: 'Arize dataset to evaluate against')
        choice(name: 'MODEL', choices: ['gpt-4o', 'gpt-4o-mini', 'claude-sonnet-4-5-20250929'], description: 'Model to test')
    }
    stages {
        stage('Run Experiment') {
            steps {
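                // Single quotes: the shell reads the build parameters from environment
                // variables, so user input is never interpolated into the command string.
                sh 'python ./copilot/experiments/ai_search_test.py --dataset "$DATASET_ID" --model "$MODEL"'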
            }
        }
    }
}
```

# Scoping Pipelines to Specific File Changes

If you only want experiments to run when relevant code changes (prompt templates, retrieval logic, eval scripts), you can scope stages using a `changeset` condition. With Multibranch Pipeline jobs, every push triggers a build — this lets you skip the experiment stage when irrelevant files change:

```groovy
pipeline {
    agent any
    stages {
        stage('Run Experiment') {
            when {
                changeset 'copilot/search/**'
            }
            steps {
                sh 'python ./copilot/experiments/ai_search_test.py'
            }
        }
    }
}
```

This still triggers the pipeline, but the stage is skipped if no files in `copilot/search/` changed. The build will show as successful (just with a skipped stage).

> **⚠️ Important distinction:** This is stage-level filtering, not pipeline-level. The pipeline still starts, checks out code, and evaluates the condition. For high-frequency repos, this can mean a lot of no-op builds. If that's a concern, look into the [Generic Webhook Trigger](https://plugins.jenkins.io/generic-webhook-trigger/) plugin, which can inspect the webhook payload before starting a build.

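A single stage can match several paths, and it can still run on scheduled or manual builds, where the changeset is typically empty. A sketch combining `changeset` with `triggeredBy` inside `anyOf`; the `copilot/prompts/**` path is a hypothetical second directory used only for illustration:

```groovy
stage('Run Experiment') {
    when {
        anyOf {
            changeset 'copilot/search/**'
            changeset 'copilot/prompts/**'      // hypothetical path
            triggeredBy 'TimerTrigger'          // cron-scheduled builds
            triggeredBy cause: 'UserIdCause'    // builds started manually by a user
        }
    }
    steps {
        sh 'python ./copilot/experiments/ai_search_test.py'
    }
}
```
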
***

# More Mature Patterns

Once you have the basics working, here are patterns that become relevant as your experiment workflows grow.

## Parallel Evaluation Runs

Run experiments against multiple models or datasets simultaneously:

```groovy
stage('Evaluate Models') {
    parallel {
        stage('GPT-4o') {
            steps {
                sh 'python ./experiments/eval.py --model gpt-4o'
            }
        }
        stage('Claude Sonnet') {
            steps {
                sh 'python ./experiments/eval.py --model claude-sonnet-4-5-20250929'
            }
        }
    }
}
```
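
When the parallel branches differ only by one value, a declarative `matrix` block can generate them from a list instead. A sketch assuming the same `eval.py` script and `--model` flag as above, and a Pipeline plugin version recent enough to support `matrix`:

```groovy
stage('Evaluate Models') {
    matrix {
        axes {
            axis {
                name 'MODEL'
                values 'gpt-4o', 'claude-sonnet-4-5-20250929'
            }
        }
        stages {
            stage('Evaluate') {
                steps {
                    // Each axis value is exposed to the steps as an environment variable
                    sh 'python ./experiments/eval.py --model "$MODEL"'
                }
            }
        }
    }
}
```

Adding a model then means adding one entry to `values` rather than copying a stage.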

## Shared Libraries

If multiple repos need the same experiment setup (install deps, configure credentials, post results to Arize AX), extract it into a [Jenkins Shared Library](https://www.jenkins.io/doc/book/pipeline/shared-libraries/):

```groovy
// In your Jenkinsfile — after setting up the shared library in Jenkins config
@Library('arize-experiment-lib') _

pipeline {
    agent { docker { image 'python:3.12' } }
    stages {
        stage('Run') {
            steps {
                arizeExperiment(script: './copilot/experiments/ai_search_test.py')
            }
        }
    }
}
```
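
A custom step such as `arizeExperiment` is defined by a file under `vars/` in the shared library repository. The implementation below is a hypothetical sketch of what that file might contain, not an official Arize library:

```groovy
// vars/arizeExperiment.groovy (hypothetical shared library step)
def call(Map config = [:]) {
    if (!config.script) {
        error 'arizeExperiment: the "script" argument is required'
    }
    // Install the same dependencies as the basic pipeline into a workspace virtualenv
    sh '''
        python -m venv .venv
        .venv/bin/pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
    '''
    // Pass the script path through the environment so it is not
    // interpolated into the shell command.
    withEnv(["EXPERIMENT_SCRIPT=${config.script}"]) {
        sh '.venv/bin/python "$EXPERIMENT_SCRIPT"'
    }
}
```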

## Post Results as PR Comments

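Use pipeline steps to post experiment results directly on the PR, so reviewers can see the impact of code changes without leaving their Git provider. The example below assumes GitHub with the [Pipeline: GitHub](https://plugins.jenkins.io/pipeline-github/) plugin, which provides `pullRequest`, and the [Pipeline Utility Steps](https://plugins.jenkins.io/pipeline-utility-steps/) plugin, which provides `readJSON`:

```groovy
post {
    success {
        script {
            // CHANGE_ID is only set on pull request builds in Multibranch Pipeline jobs
            if (env.CHANGE_ID) {
                def results = readJSON file: 'experiment_results.json'
                // Post summary to PR (requires Git provider plugin + credentials)
                pullRequest.comment("## Experiment Results\n- Accuracy: ${results.accuracy}\n- Latency p50: ${results.latency_p50}ms")
            }
        }
    }
}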
```
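
The same results file can also fail the build when a metric regresses. A sketch, assuming `experiment_results.json` contains a numeric `accuracy` field as in the example above; the `0.8` threshold is a placeholder, and `readJSON` requires the Pipeline Utility Steps plugin:

```groovy
stage('Check Thresholds') {
    steps {
        script {
            def results = readJSON file: 'experiment_results.json'
            // Placeholder threshold: choose a value that fits your own baseline
            def minAccuracy = 0.8
            if (results.accuracy < minAccuracy) {
                error "Accuracy ${results.accuracy} is below the required ${minAccuracy}"
            }
        }
    }
}
```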

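## Notifications

Send a Slack message when an experiment run passes or fails. The `slackSend` step is provided by the [Slack Notification](https://plugins.jenkins.io/slack/) plugin, which must be installed and configured for your workspace:

```groovy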
post {
    failure {
        slackSend channel: '#ml-experiments', message: "❌ Experiment failed: ${env.BUILD_URL}"
    }
    success {
        slackSend channel: '#ml-experiments', message: "✅ Experiment passed: ${env.BUILD_URL}"
    }
}
```
