Canary evaluation is the process of routing a small amount of traffic or a limited set of users through a new AI system version and evaluating the result before full rollout. It is useful when offline evals are not enough to predict production behavior.
Developers should instrument canaries with trace comparison, online evals, latency and cost metrics, safety checks, and rollback criteria. A canary without evaluation is just a risky rollout with a smaller blast radius.