Continuous evaluation means running evals as an always-on part of development and production. Offline evals run against curated datasets before release. Online evals score production traces or sessions as traffic flows through the system.
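As a rough illustration of the two modes, the sketch below gates a release on a curated dataset and scores a random sample of production traces. Every name in it (`Example`, `exact_match`, `offline_eval`, `online_eval`, the `threshold` and `sample_rate` parameters) is hypothetical and not taken from any particular eval framework. The exact-match scorer is only a stand-in for whatever metric a team actually uses.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Example:
    """One curated test case (hypothetical structure)."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Stand-in scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def offline_eval(
    model: Callable[[str], str],
    dataset: list[Example],
    threshold: float,
) -> tuple[float, bool]:
    """Score a candidate on a curated dataset before release.

    Returns the mean score and whether it clears the release threshold.
    """
    if not dataset:
        raise ValueError("dataset must not be empty")
    scores = [exact_match(model(ex.prompt), ex.expected) for ex in dataset]
    mean = sum(scores) / len(scores)
    return mean, mean >= threshold


def online_eval(
    traces: Iterable[dict],
    scorer: Callable[[dict], float],
    sample_rate: float,
    rng: random.Random,
) -> list[float]:
    """Score a random sample of production traces as they arrive.

    Sampling keeps scoring cost bounded; sample_rate is an assumed knob.
    """
    return [scorer(trace) for trace in traces if rng.random() < sample_rate]
```

In this sketch the offline function would run in CI against a fixed dataset, while the online function would sit behind the serving path and feed its scores to a dashboard or alerting job.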
The value is early detection. Continuous evaluation can surface prompt drift, model regressions, retrieval failures, safety issues, and shifts in user behavior while they are still small, before they accumulate into hidden product debt.
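One simple way to turn a stream of eval scores into an early warning is a rolling-window check against a known baseline. The sketch below is a minimal version under stated assumptions: the class name `RegressionMonitor`, the `window` size, and the `max_drop` tolerance are all hypothetical. A production setup would likely add statistical tests and per-segment breakdowns.

```python
from collections import deque


class RegressionMonitor:
    """Flags when the recent mean score falls too far below a baseline."""

    def __init__(self, baseline: float, window: int = 50, max_drop: float = 0.05):
        self.baseline = baseline      # score measured on a known-good release
        self.max_drop = max_drop      # assumed tolerance before alerting
        self.scores = deque(maxlen=window)  # keeps only the latest scores

    def record(self, score: float) -> bool:
        """Add one score; return True if the full window shows a regression."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge
        mean = sum(self.scores) / len(self.scores)
        return (self.baseline - mean) > self.max_drop
```

Feeding each online eval score into `record` and alerting on the first `True` is one way to notice a regression within a window's worth of traffic instead of weeks later.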