What this enables: Run Arize AX experiment evaluations automatically as part of your Jenkins pipelines — on every PR, on a schedule, or on-demand. Catch regressions in accuracy, latency, and cost before they hit production.
Key Concepts
- Pipeline: An automated workflow defined in a Jenkinsfile (Groovy-based, not YAML).
- Stages: Named groups of work that run sequentially (e.g., Setup, Test, Report).
- Steps: Individual commands within a stage.
- Agent: Where the pipeline runs — a Jenkins node, a Docker container, or a Kubernetes pod.
- Triggers: How pipelines get kicked off — webhooks, cron schedules, or upstream jobs.
Prerequisites & Assumptions
This guide assumes:
- Jenkins is running on a recent LTS release with Java 17+. See the Java support policy for details.
- A Jenkins agent capable of running Docker containers (needed for the Python image approach below), or Python 3.12+ installed directly on the agent.
- Your Jenkins instance can reach your Git provider (GitHub, GitLab, Bitbucket) via webhook or polling.
- Required plugins are installed:
- Pipeline (usually included by default)
- Git
- Credentials Binding
- Docker Pipeline (if using Docker agents)
🔑 Secrets setup: Before your pipeline can run, store your API keys in Jenkins → Manage Jenkins → Credentials. Add OPENAI_API_KEY, ARIZE_API_KEY, SPACE_ID, and DATASET_ID as “Secret text” credentials. The Jenkinsfile below references these by their credential IDs.
Setting Up Your First Experiment Pipeline
Create a Jenkinsfile
Place a Jenkinsfile in the root of your repository. Jenkins uses a Groovy-based DSL (not YAML).
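A sketch of what that Jenkinsfile might look like, matching the breakdown below. The Python image tag, the script name (run_experiment.py), and the results path are assumptions to adapt to your repo; the credential IDs match the ones created in the secrets setup above.

```groovy
pipeline {
    // Run every stage inside a Python container; Jenkins pulls the image for you
    agent { docker { image 'python:3.12-slim' } }

    environment {
        // credentials() pulls from the Jenkins credential store and masks values in logs
        OPENAI_API_KEY = credentials('OPENAI_API_KEY')
        ARIZE_API_KEY  = credentials('ARIZE_API_KEY')
        SPACE_ID       = credentials('SPACE_ID')
        DATASET_ID     = credentials('DATASET_ID')
    }

    stages {
        stage('Setup') {
            steps {
                sh 'pip install -r requirements.txt'
            }
        }
        stage('Run Experiment') {
            steps {
                // Script name is illustrative; point this at your eval entrypoint
                sh 'python run_experiment.py'
            }
        }
    }

    post {
        always {
            // Save results to the build page for download (path is an assumption)
            archiveArtifacts artifacts: 'results/**', allowEmptyArchive: true
        }
    }
}
```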
Breakdown
- pipeline { } — Top-level block; everything lives inside this.
- agent { docker { image '...' } } — Runs the entire pipeline inside a Docker container. Jenkins pulls the image for you.
- environment { } — Injects secrets from the Jenkins credential store. The credentials() helper masks values in logs automatically.
- stages / stage — Sequential groups of work. Each stage appears as a separate column in the Pipeline Stage View.
- steps — Commands to execute. sh runs shell commands.
- post { always { } } — Runs after all stages complete (pass or fail). archiveArtifacts saves files to the Jenkins build page for download.
No Docker? Replace the agent block with agent any and make sure Python 3.12+ is on your Jenkins node. You may also want to add a sh 'python3 --version' step to verify.
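For the no-Docker route, a minimal sketch of that variant (assuming Python 3.12+ is preinstalled on the node):

```groovy
pipeline {
    agent any  // assumes Python 3.12+ is already installed on this node

    stages {
        stage('Verify Environment') {
            steps {
                // Fail fast if the expected interpreter is missing
                sh 'python3 --version'
            }
        }
    }
}
```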
Trigger Options
Unlike repo-hosted CI systems where triggers are defined entirely in the pipeline file, Jenkins separates what runs (Jenkinsfile) from when it runs (job configuration). Triggers can be set in the Jenkinsfile itself using the triggers directive, configured in the Jenkins UI, or driven by webhooks from your Git provider.
1. Webhook (Push / Pull Request)
The most common setup. Your Git provider sends a webhook to Jenkins when code changes. Setup: Configure a webhook in your Git provider pointing to https://<your-jenkins>/github-webhook/ (for GitHub) or the equivalent endpoint. Then in Jenkins, create a Multibranch Pipeline job pointing to your repo.
Multibranch Pipeline is the recommended job type for most teams. It automatically discovers branches and PRs in your repo and runs the Jenkinsfile found in each. No manual job creation per branch.
2. Scheduled (Cron)
Cron syntax note: Jenkins cron uses H (hash) for load distribution. H 0 * * * means “sometime in the midnight hour” — Jenkins picks a stable minute per job to avoid all jobs firing at :00. Use exact times only when it actually matters.
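A scheduled pipeline using the H-based cron described above might look like this sketch (the stage is a placeholder):

```groovy
pipeline {
    agent any

    triggers {
        // Nightly run, sometime in the midnight hour; H spreads jobs across minutes
        cron('H 0 * * *')
    }

    stages {
        stage('Run Experiment') {
            steps {
                sh 'python run_experiment.py'  // script name is illustrative
            }
        }
    }
}
```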
3. Polling SCM (Fallback When Webhooks Aren’t Possible)
Jenkins periodically checks your repo for changes. Use this when your Jenkins instance isn’t reachable from your Git provider (e.g., behind a firewall).
4. Upstream Job (Pipeline Chaining)
Trigger one pipeline after another completes — useful for running evals only after a build passes.
5. Manual Only (No Automatic Trigger)
Omit the triggers block entirely. The pipeline runs only when someone clicks Build Now in the Jenkins UI or calls the Jenkins API.
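For reference, the polling and upstream variants above map to triggers directives like these (the schedule and upstream job name are illustrative):

```groovy
pipeline {
    agent any

    triggers {
        // Option 3: poll the repo every 15 minutes (fallback when webhooks can't reach Jenkins)
        pollSCM('H/15 * * * *')
        // Option 4: run after the named upstream job finishes successfully
        upstream(upstreamProjects: 'build-app', threshold: hudson.model.Result.SUCCESS)
    }

    stages {
        stage('Run Experiment') {
            steps {
                sh 'python run_experiment.py'  // script name is illustrative
            }
        }
    }
}
```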
6. Parameterized Builds
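As a sketch, a parameterized pipeline might look like the following; the parameter names, defaults, and model choices are illustrative, not prescribed by Arize:

```groovy
pipeline {
    agent any

    parameters {
        // Names and defaults here are placeholders; adapt to your experiment setup
        string(name: 'DATASET_ID', defaultValue: 'default-dataset', description: 'Arize dataset to evaluate')
        choice(name: 'MODEL', choices: ['gpt-4o-mini', 'gpt-4o'], description: 'Model to test')
    }

    stages {
        stage('Run Experiment') {
            steps {
                // params.* exposes the values the user picked at build time
                sh "python run_experiment.py --dataset ${params.DATASET_ID} --model ${params.MODEL}"
            }
        }
    }
}
```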
Allow users to pass inputs when triggering a build — useful for running experiments against different datasets or models.
Scoping Pipelines to Specific File Changes
If you only want experiments to run when relevant code changes (prompt templates, retrieval logic, eval scripts), you can scope stages using a changeset condition. With Multibranch Pipeline jobs, every push triggers a build — this lets you skip the experiment stage when irrelevant files change:
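A sketch of such a stage; the path globs are illustrative and should match wherever your prompts and eval code actually live:

```groovy
stage('Run Experiment') {
    when {
        // Run only if this build's changeset touched one of these paths
        anyOf {
            changeset 'prompts/**'
            changeset 'evals/**'
        }
    }
    steps {
        sh 'python run_experiment.py'  // script name is illustrative
    }
}
```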
If only unrelated files changed (e.g., under copilot/search/), the experiment stage is skipped. The build will show as successful (just with a skipped stage).
⚠️ Important distinction: This is stage-level filtering, not pipeline-level. The pipeline still starts, checks out code, and evaluates the condition. For high-frequency repos, this can mean a lot of no-op builds. If that’s a concern, look into the Generic Webhook Trigger plugin, which can inspect the webhook payload before starting a build.