What this enables: Run Arize AX experiment evaluations automatically as part of your Harness pipelines — on every PR, on a schedule, or on-demand. Catch regressions in accuracy, latency, and cost before they hit production.

Key Concepts

  • Pipeline: An automated workflow defined in YAML, stored in your repo (.harness/ folder) or inline in the Harness platform.
  • Stages: Named groups of work. A CI pipeline uses a CI (Build) stage type.
  • Steps: Individual tasks within a stage. The Run step executes shell commands or scripts.
  • Connectors: Harness integrations that connect to external systems — Git repos, Docker registries, secret managers.
  • Triggers: How pipelines get kicked off — webhooks, cron schedules, or manual execution.

Prerequisites & Assumptions

This guide assumes:
  • A Harness account with a CI project configured. Your platform team has set up the org, project, and user access.
  • A codebase connector pointing to your Git repo (GitHub, GitLab, Bitbucket). This is how Harness clones your code.
  • Build infrastructure is configured: Harness Cloud (managed runners), a self-managed VM, or a Kubernetes cluster. This guide uses Harness Cloud, where Python is pre-installed on the runners.
  • Secrets are stored in Harness. Navigate to Project Settings → Secrets → New Secret → Secret Text and add ARIZE_API_KEY, ARIZE_SPACE_ID, and ARIZE_DATASET_ID. The pipeline YAML below references these by their secret IDs.
Coming from Jenkins? The main structural differences: pipeline config is YAML (not Groovy), triggers are defined in YAML or the Harness UI (not split across Jenkinsfile + job config), and secrets are referenced via <+secrets.getValue("ID")> (not credentials()). The Python script that runs your experiment is identical — no changes needed.

Setting Up Your First Experiment Pipeline

Create a Pipeline YAML

You can create pipelines in the Harness Visual Editor or YAML Editor. Here’s the YAML for a CI pipeline that runs an Arize AX experiment:
pipeline:
  name: LLM Experiment Evaluation
  identifier: llm_experiment_eval
  projectIdentifier: your_project
  orgIdentifier: your_org
  properties:
    ci:
      codebase:
        connectorRef: your_github_connector
        repoName: your-repo
        build: <+input>
  stages:
    - stage:
        name: Evaluate
        identifier: evaluate
        type: CI
        spec:
          cloneCodebase: true
          platform:
            os: Linux
            arch: Amd64
          runtime:
            type: Cloud
            spec: {}
          execution:
            steps:
              - step:
                  type: Run
                  name: Install Dependencies
                  identifier: install_deps
                  spec:
                    shell: Sh
                    command: pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
              - step:
                  type: Run
                  name: Run Experiment
                  identifier: run_experiment
                  spec:
                    shell: Sh
                    command: python ./copilot/experiments/ai_search_test.py
                    envVariables:
                      ARIZE_API_KEY: <+secrets.getValue("ARIZE_API_KEY")>
                      ARIZE_SPACE_ID: <+secrets.getValue("ARIZE_SPACE_ID")>
                      ARIZE_DATASET_ID: <+secrets.getValue("ARIZE_DATASET_ID")>

Breakdown

  • pipeline — Top-level block containing the full pipeline definition.
  • properties.ci.codebase — Tells Harness which repo to clone. connectorRef points to your Git connector. <+input> means the branch/PR is resolved at runtime.
  • stage.type: CI — Declares this as a CI (Build) stage.
  • runtime.type: Cloud — Runs on Harness Cloud managed infrastructure. Python is pre-installed.
  • steps — Sequential tasks. type: Run executes shell commands.
  • envVariables — Injects secrets as environment variables. <+secrets.getValue("...")> pulls from the Harness secret store and masks values in logs.
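The script the Run step invokes needs no Harness-specific code: it reads the injected environment variables and calls the Arize SDK. A minimal sketch of such a script is below; the Arize client and method names are indicative assumptions, so verify them against the SDK version you install. Failing fast on missing variables keeps secret misconfiguration from surfacing as a confusing mid-run error.

```python
import os

REQUIRED = ("ARIZE_API_KEY", "ARIZE_SPACE_ID", "ARIZE_DATASET_ID")

def load_config(env=os.environ):
    """Fail fast with a clear message if a secret was not injected."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise SystemExit(f"Missing env vars: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED}

def main():
    cfg = load_config()
    # NOTE: client/method names below are assumptions for illustration --
    # check the Arize SDK docs for your installed version.
    from arize.experimental.datasets import ArizeDatasetsClient

    client = ArizeDatasetsClient(api_key=cfg["ARIZE_API_KEY"])
    client.run_experiment(
        space_id=cfg["ARIZE_SPACE_ID"],
        dataset_id=cfg["ARIZE_DATASET_ID"],
        task=lambda row: ...,  # your app logic under test goes here
        experiment_name="ci-eval",
    )

if __name__ == "__main__":
    main()
```

Because the secrets arrive as plain environment variables, the same script runs unchanged on a laptop, in Jenkins, or in Harness.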
Using a Docker image instead? Replace the platform and runtime blocks with a connectorRef and image on the step. You’ll need a Docker registry connector configured in Harness.
- step:
    type: Run
    name: Run Experiment
    identifier: run_experiment
    spec:
      connectorRef: your_docker_connector
      image: python:3.12
      shell: Sh
      command: python ./copilot/experiments/ai_search_test.py
      envVariables:
        ARIZE_API_KEY: <+secrets.getValue("ARIZE_API_KEY")>

Trigger Options

Triggers in Harness can be configured in YAML or the Harness UI. They support webhooks, cron schedules, and manual execution.

1. Webhook (Pull Request)

The most common setup. Harness listens for webhook events from your Git provider and runs the pipeline when a PR is opened or updated. Setup: In your Harness project, go to Triggers → New Trigger → Webhook. Select your Git connector, choose the event type (Pull Request), and optionally add file path conditions. Triggers can also be defined in YAML:
trigger:
  name: PR to prompts
  identifier: pr_to_prompts
  enabled: true
  orgIdentifier: your_org
  projectIdentifier: your_project
  pipelineIdentifier: llm_experiment_eval
  source:
    type: Webhook
    spec:
      type: Github
      spec:
        type: PullRequest
        spec:
          connectorRef: your_github_connector
          autoAbortPreviousExecutions: true
          payloadConditions:
            - key: changedFiles
              operator: Contains
              value: copilot/
          actions:
            - Open
            - Synchronize
Path filtering is built-in. The changedFiles payload condition scopes the trigger to only fire when files in copilot/ are modified. This is pipeline-level filtering — unlike Jenkins’ stage-level changeset, the pipeline doesn’t start at all if the condition isn’t met.

2. Scheduled (Cron)

trigger:
  name: Nightly Eval
  identifier: nightly_eval
  enabled: true
  orgIdentifier: your_org
  projectIdentifier: your_project
  pipelineIdentifier: llm_experiment_eval
  source:
    type: Scheduled
    spec:
      type: Cron
      spec:
        expression: 0 0 * * *

3. Manual Only

No trigger needed. Run the pipeline from the Harness UI by selecting your pipeline and clicking Run. You can also trigger via the Harness API.
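For API-driven runs, Harness exposes a pipeline-execute endpoint authenticated with an API key header. A sketch of building that request is below; the endpoint path and query parameter names are assumptions based on Harness's Pipeline Execute API, so confirm them against the current API reference before use.

```python
import urllib.parse
import urllib.request

def build_execute_request(account, org, project, pipeline, api_key,
                          base="https://app.harness.io"):
    """Build a POST request for Harness's pipeline-execute endpoint.

    Path and parameter names are indicative; verify against the
    Harness API reference for your account.
    """
    query = urllib.parse.urlencode({
        "accountIdentifier": account,
        "orgIdentifier": org,
        "projectIdentifier": project,
    })
    url = f"{base}/pipeline/api/pipeline/execute/{pipeline}?{query}"
    return urllib.request.Request(
        url, method="POST", headers={"x-api-key": api_key}
    )

# Fire the pipeline (requires a real account ID and API key):
# req = build_execute_request("your_account", "your_org", "your_project",
#                             "llm_experiment_eval", "your-api-key")
# urllib.request.urlopen(req)
```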

4. Pipeline Chaining

Use the Pipeline stage type to trigger one pipeline after another:
- stage:
    name: Run Evals After Build
    identifier: run_evals
    type: Pipeline
    spec:
      org: your_org
      pipeline: llm_experiment_eval
      project: your_project

More Mature Patterns

Once you have the basics working, here are patterns that become relevant as your experiment workflows grow.

Parallel Evaluation Runs

Use strategy.parallelism or strategy.matrix to run experiments against multiple models simultaneously:
- step:
    type: Run
    name: Evaluate Model
    identifier: eval_model
    spec:
      shell: Sh
      command: python ./experiments/eval.py --model <+matrix.model>
    strategy:
      matrix:
        model:
          - gpt-4o
          - gpt-4o-mini
          - claude-sonnet-4-5-20250929
        maxConcurrency: 3
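Each matrix instance runs the same script with a different --model value, so the script only needs a standard CLI flag. A minimal sketch of the argument handling (the eval logic itself is app-specific and omitted):

```python
import argparse

def parse_args(argv=None):
    """CLI for the matrix-driven eval script; --model is supplied
    by the pipeline via <+matrix.model>."""
    parser = argparse.ArgumentParser(
        description="Run an experiment evaluation against one model")
    parser.add_argument("--model", required=True,
                        help="Model identifier from the pipeline matrix")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(f"Evaluating model: {args.model}")
    # ... load dataset and run the experiment for args.model ...
```

Keeping the model a parameter rather than a hardcoded value is what lets the same step definition fan out across the matrix.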

Pipeline Templates

If multiple repos need the same experiment setup, create a Harness Pipeline Template. Templates are versioned, reusable, and can be scoped to the project, org, or account level.
template:
  name: Arize Experiment Template
  identifier: arize_experiment_template
  type: Step
  spec:
    type: Run
    spec:
      shell: Sh
      command: |
        pip install -q arize arize-phoenix nest_asyncio packaging openai "gql[all]"
        python <+step.parameters.script>
      envVariables:
        ARIZE_API_KEY: <+secrets.getValue("ARIZE_API_KEY")>
        ARIZE_SPACE_ID: <+secrets.getValue("ARIZE_SPACE_ID")>
        ARIZE_DATASET_ID: <+secrets.getValue("ARIZE_DATASET_ID")>
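A pipeline then consumes the template by reference instead of repeating the step body. A sketch, assuming the template above was saved with a version label of v1 (templateRef and versionLabel are the standard Harness fields; the version label here is an assumption):

```yaml
- step:
    name: Run Arize Experiment
    identifier: run_arize_experiment
    template:
      templateRef: arize_experiment_template
      versionLabel: v1
```

When the template changes, every pipeline pinned to that version label picks up the update without editing each repo's YAML.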

Notifications

Harness has built-in notification rules at the pipeline level. Add a notificationRules block to your pipeline YAML:
notificationRules:
  - name: Experiment Failed
    enabled: true
    pipelineEvents:
      - type: StageFailed
        forStages:
          - evaluate
    notificationMethod:
      type: Slack
      spec:
        webhookUrl: https://hooks.slack.com/services/your/webhook/url

Status Checks on PRs

Harness can post build status back to your Git provider automatically. When using webhook triggers on PRs, the pipeline status (pass/fail) appears as a status check on the PR — similar to how GitHub Actions checks work. Configure this in your codebase connector settings by enabling Send status to Git provider.