Rancher cluster and storage

Overview

Use this page when Kubernetes is already running and managed by Rancher. Rancher can manage clusters across cloud, private cloud, and bare-metal infrastructure; this guide starts from an existing Rancher-managed cluster and does not cover provisioning Rancher or the underlying hosts. Your platform team owns the cluster-level choices: storage classes, ingress, DNS, registry access, and object storage. The Arize AX install flow itself is unchanged: point kubectl at the target cluster, create values.yaml next to arize.sh, then run ./arize.sh install from the extracted distribution directory.

Before you start

You need:

A working kubeconfig for the target cluster.
Block-storage-backed storage classes for Arize AX persistent volumes. Do not use NFS-backed storage classes.
Object storage for Gazette and ArizeDB data. Use Ceph or another S3-compatible service managed by your platform team.
A storage class name for standard persistent volumes and one for SSD-style persistent volumes. Both can name the same storage class if the cluster does not split storage tiers.
The application URL you plan to expose Arize AX at, for appBaseUrl. DNS does not need to resolve at install time; you can wire DNS and ingress afterwards. Pick the URL you intend to keep, though: OAuth callbacks, application redirects, and the configmap rendered by the operator all reference appBaseUrl, so changing it later means re-rendering the install configuration.
The release version from On-Premise Releases, plus distribution access, organization name, and sizing profile from Arize AI.

Use dedicated namespaces for Arize AX. The examples use arize and arize-operator; if you choose different names, keep them dedicated to this Arize AX install. Do not install Arize AX into a namespace shared with other applications. Cleanup and reinstall commands can delete namespace resources and are not safe for shared namespaces.

Confirm which kubeconfig cluster entry the installer should use:

kubectl config get-clusters
kubectl config current-context

Set clusterName to the cluster entry from kubectl config get-clusters. Do not assume it is always the same as the current context name.
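
As a rough illustration, the two values gathered so far might appear in values.yaml as follows. This is a sketch, assuming both are top-level keys like the other values on this page. Both values are hypothetical placeholders, not defaults: rancher-lab stands in for an entry printed by kubectl config get-clusters, and https://arize.example.com stands in for the URL you plan to keep.

```yaml
# values.yaml (fragment). Both values below are hypothetical placeholders.

# Cluster entry exactly as printed by `kubectl config get-clusters`;
# it may differ from the current context name.
clusterName: rancher-lab

# URL you intend to keep. DNS does not need to resolve at install time.
appBaseUrl: https://arize.example.com
```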

Decide whether this is an upgrade or a reset

If Arize AX is already installed, running ./arize.sh install again is the normal path for a refresh, a redeploy of operator-managed manifests, or an upgrade with your current values.yaml.

Use Fresh reinstall cleanup only when you intentionally want to discard the existing install and reinstall from an empty target.

If a previous install failed partway through, check the operator, jobs, and pods before deciding whether to continue with ./arize.sh install or reset the target. Only use the reset path when you intend to discard the existing Arize AX install: it deletes in-cluster Arize AX resources and data, so anything you need to keep must be covered by a backup/restore plan first.

Choose the storage mode

Use cloud: ceph for Rancher installs with platform-provided S3-compatible object storage. The storage service does not need to be Ceph specifically; ceph is the generic S3-compatible path in the Arize AX distribution. Pre-provision both buckets (for Gazette and for ArizeDB data) in your platform’s storage and grant the install credentials read/write access to them. Pick bucket names that are unique to your install.
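
A minimal sketch of the storage mode selection in values.yaml, assuming cloud is a top-level key like the other values on this page. Only the key named above is shown; the remaining object storage settings are left out rather than guessed.

```yaml
# values.yaml (fragment)

# Generic S3-compatible object storage path; the backing service
# does not have to be Ceph.
cloud: ceph

# Endpoint, bucket-name, and credential keys are intentionally omitted:
# this page does not name them. Confirm the exact keys with Arize AI or
# the distribution you received before filling them in.
```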

Small cluster values

Small or lab Rancher-managed clusters often need extra values beyond the minimal example. These are not universal production defaults, but they are common when the cluster has a shared node pool or less memory than the standard sizing profiles expect. Discuss these with Arize AI before using them in production:

# Use a shared node pool instead of a dedicated ArizeDB pool.
historicalNodePoolEnabled: false

# Use node emptyDir for component scratch space instead of large default PVCs.
ephemeralMode: emptyDir

# Disable autosizing on small clusters: with autosizing on, the ArizeDB
# historical container can request 24 GiB of memory, which will not fit on
# a typical homelab node and the pod will stay Pending.
autoSizeMemory: false
autoSizeReplicas: false

Some small clusters also need ArizeDB historical JVM/resource tuning, toleration changes, or baseOverlay patches. Treat those as environment-specific overrides, not copy/paste defaults. If your nodes are tainted, make the Arize AX tolerations match the taints your platform uses. The toleration values are strings, so keep the list quoted:

podTolerationAll: "[{key: 'workload', operator: 'Equal', value: 'arize', effect: 'NoSchedule'}]"
podTolerationBase: "[{key: 'workload', operator: 'Equal', value: 'arize', effect: 'NoSchedule'}]"
podTolerationHist: "[{key: 'workload', operator: 'Equal', value: 'arize', effect: 'NoSchedule'}]"
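
For reference, a toleration with operator Equal tolerates a taint only when the key, value, and effect all match. The tolerations above would therefore match a node taint like the one in this sketch. The node name worker-1 is hypothetical, and workload=arize is only the example carried over from the values above; use whatever taints your platform actually applies.

```yaml
# Excerpt of a Node object, as shown by `kubectl get node <name> -o yaml`.
apiVersion: v1
kind: Node
metadata:
  name: worker-1        # hypothetical node name
spec:
  taints:
    - key: workload     # must equal the toleration's key
      value: arize      # must equal the toleration's value (operator: Equal)
      effect: NoSchedule
```

If the taint on your nodes uses a different key, value, or effect, change all three toleration strings to match it.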

baseOverlay is a multiline YAML patch that the operator applies to Arize AX application manifests. Use it for targeted Arize AI-reviewed changes, such as changing a replica count or a container resource request. Paste it under baseOverlay: | exactly as provided:

baseOverlay: |
  ---
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: druid-historical
    namespace: arize
  spec:
    replicas: 1

Avoid changing volumeClaimTemplates in baseOverlay for a StatefulSet that already exists. Kubernetes does not allow an existing StatefulSet’s volumeClaimTemplates to be mutated, so editing one, for example to resize a PVC, can make the update fail with a Forbidden error and cause the operator to stop reconciling. Setting volumeClaimTemplates before the first install is fine. To resize PVCs on a StatefulSet that already exists, work with Arize AI on the exact steps to follow.
