Overview

Use this page when Kubernetes is already running and is managed by Rancher. Rancher can manage clusters across cloud, private cloud, and bare-metal infrastructure; this guide starts from an existing Rancher-managed Kubernetes cluster and does not cover provisioning Rancher or the underlying hosts. Your platform team owns the cluster-level choices: storage classes, ingress, DNS, registry access, and object storage. The Arize AX install flow is still the same: point kubectl at the target cluster, create values.yaml next to arize.sh, then run ./arize.sh install from the extracted distribution directory.

Before you start

You need:
  • A working kubeconfig for the target cluster.
  • Block-storage-backed storage classes for Arize AX persistent volumes. Do not use NFS-backed storage classes.
  • Object storage for Gazette and ArizeDB data. Use Ceph or another S3-compatible service managed by your platform team.
  • A storage class name for standard persistent volumes and SSD-style persistent volumes. They can be the same storage class if the cluster does not split storage tiers.
  • The application URL where you plan to expose Arize AX, set as appBaseUrl. DNS does not need to resolve at install time; you can wire up DNS and ingress afterwards. Pick the URL you intend to keep, though: OAuth callbacks, application redirects, and the configmap rendered by the operator all reference appBaseUrl, so changing it later means re-rendering the install configuration.
  • The release version from On-Premise Releases, plus distribution access, organization name, and sizing profile from Arize AI.
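For the storage class requirements above, it can help to list what the cluster offers before picking names. The following is a minimal sketch, not part of the Arize AX distribution: the function name list_storage_classes is illustrative, and the output only shows which provisioner backs each class. Whether a given provisioner is block storage or NFS is something to confirm with your platform team.

```shell
# Hypothetical helper (not shipped with Arize AX).
# Prints each storage class with the provisioner that backs it, so that
# NFS-backed classes can be spotted before names go into values.yaml.
list_storage_classes() {
  kubectl get storageclass \
    -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner'
}

# Usage: list_storage_classes
```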
Use dedicated namespaces for Arize AX. The examples use arize and arize-operator; if you choose different names, keep them dedicated to this Arize AX install. Do not install Arize AX into a namespace shared with other applications. Cleanup and reinstall commands can delete namespace resources and are not safe for shared namespaces.
Confirm which kubeconfig cluster entry the installer should use:
kubectl config get-clusters
kubectl config current-context
Set clusterName to the cluster entry from kubectl config get-clusters. Do not assume it is always the same as the current context name.
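If you want to see which cluster entry the current context points at, one option is to read it from the minified kubeconfig. This is a sketch under the assumption that the current context is the cluster you intend to install into; the function name current_cluster_entry is illustrative.

```shell
# Hypothetical helper: print the kubeconfig cluster entry referenced by the
# current context. --minify restricts the view to the current context, so
# contexts[0] is the active one.
current_cluster_entry() {
  kubectl config view --minify -o jsonpath='{.contexts[0].context.cluster}'
}

# Usage: current_cluster_entry
```

Compare the printed name with the list from kubectl config get-clusters before copying it into clusterName.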

Decide whether this is an upgrade or a reset

If Arize AX is already installed, running ./arize.sh install again is the normal path for a refresh, a redeploy of operator-managed manifests, or an upgrade with your current values.yaml.
Use the Fresh reinstall cleanup procedure only when you intentionally want to discard the existing install and reinstall from an empty target.
If a previous install failed partway through, check the operator, jobs, and pods before deciding whether to continue with ./arize.sh install or reset the target. Only use the reset path when you intend to discard the existing Arize AX install: it deletes in-cluster Arize AX resources and data, which you can only recover if you have a backup/restore plan.
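The check of operator, jobs, and pods described above could look like the following sketch. It assumes the example namespaces from this page (arize and arize-operator) and that the operator runs as pods in the operator namespace; the function name inspect_arize_install is illustrative and not part of the distribution.

```shell
# Hypothetical helper: show operator pods, then application jobs and pods.
# Defaults to the example namespaces on this page; pass your own namespace
# names as arguments if you chose different ones.
inspect_arize_install() {
  app_ns="${1:-arize}"
  operator_ns="${2:-arize-operator}"
  kubectl get pods -n "$operator_ns"
  kubectl get jobs,pods -n "$app_ns"
}

# Usage: inspect_arize_install
#        inspect_arize_install my-arize my-arize-operator
```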

Choose the storage mode

Use cloud: ceph for Rancher installs with platform-provided S3-compatible object storage. The storage service does not need to be Ceph specifically; ceph is the generic S3-compatible path in the Arize AX distribution. Pre-provision both buckets, for Gazette and ArizeDB data, in your platform’s storage and grant the install credentials read/write access. Pick bucket names that are unique to your install.

Small cluster values

Small or lab Rancher-managed clusters often need extra values beyond the minimal example. These are not universal production defaults, but they are common when the cluster has a shared node pool or less memory than the standard sizing profiles expect. Discuss these with Arize AI before using them in production:
# Use a shared node pool instead of a dedicated ArizeDB pool.
historicalNodePoolEnabled: false

# Use node emptyDir for component scratch space instead of large default PVCs.
ephemeralMode: emptyDir

# Disable autosizing on small clusters: with autosizing on, the ArizeDB
# historical container can request 24 GiB of memory, which will not fit on
# a typical homelab node, so the pod stays Pending.
autoSizeMemory: false
autoSizeReplicas: false
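To judge whether the autosizing settings above matter for your cluster, you can compare the memory each node can allocate with the request size mentioned in the comment. A minimal sketch, with an illustrative function name that is not part of the distribution:

```shell
# Hypothetical helper: print allocatable memory per node, for comparison
# with what ArizeDB historical could request when autosizing is on.
list_node_memory() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,ALLOCATABLE_MEMORY:.status.allocatable.memory'
}

# Usage: list_node_memory
```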
Some small clusters also need ArizeDB historical JVM/resource tuning, toleration changes, or baseOverlay patches. Treat those as environment-specific overrides, not copy/paste defaults. If your nodes are tainted, make the Arize AX tolerations match the taints your platform uses. The toleration values are strings, so keep the list quoted:
podTolerationAll: "[{key: 'workload', operator: 'Equal', value: 'arize', effect: 'NoSchedule'}]"
podTolerationBase: "[{key: 'workload', operator: 'Equal', value: 'arize', effect: 'NoSchedule'}]"
podTolerationHist: "[{key: 'workload', operator: 'Equal', value: 'arize', effect: 'NoSchedule'}]"
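Before writing the toleration strings, it helps to see the taints that are actually set on the nodes. The sketch below lists them; the function name list_node_taints is illustrative, and the workload/arize key and value in the example above are placeholders for whatever your platform uses.

```shell
# Hypothetical helper: print the taints on each node so the key, value, and
# effect in the toleration strings can be matched against them.
list_node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'
}

# Usage: list_node_taints
```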
baseOverlay is a multiline YAML patch that the operator applies to Arize AX application manifests. Use it for targeted Arize AI-reviewed changes, such as changing a replica count or a container resource request. Paste it under baseOverlay: | exactly as provided:
baseOverlay: |
  ---
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: druid-historical
    namespace: arize
  spec:
    replicas: 1
Avoid changing volumeClaimTemplates in baseOverlay for a StatefulSet that already exists. Kubernetes does not allow an existing StatefulSet’s volumeClaimTemplates to be mutated, so editing one, for example to resize a PVC, can cause the operator to fail with a Forbidden error and stop reconciling. Setting volumeClaimTemplates before the first install is fine. To resize PVCs on a StatefulSet that already exists, work with Arize AI on the exact steps to follow.
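One way to tell whether a StatefulSet already exists before putting volumeClaimTemplates into baseOverlay is a simple existence check. This is a sketch with an illustrative function name; it treats any kubectl failure as "not found", so a connectivity or permission error also returns non-zero and should be ruled out separately.

```shell
# Hypothetical helper: exit 0 if the named StatefulSet can be fetched.
# The namespace defaults to the example namespace used on this page.
# Caveat: any kubectl error (not only "not found") yields a non-zero exit.
statefulset_exists() {
  name="$1"
  ns="${2:-arize}"
  kubectl get statefulset "$name" -n "$ns" >/dev/null 2>&1
}

# Usage:
#   if statefulset_exists druid-historical arize; then
#     echo "StatefulSet exists: leave volumeClaimTemplates unchanged"
#   fi
```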

Next steps