Evaluation as infrastructure means treating evaluation as a core system dependency, on par with logging, observability, testing, and CI/CD. It is not a periodic review or a spreadsheet of examples. It is a repeatable layer that development workflows, production systems, agents, and human reviewers can all rely on.
As infrastructure, evaluation needs APIs, datasets, versioning, runners, storage, permissions, monitors, and actions. The value lies not in the score itself but in the workflow the score triggers, such as blocking a release when a score regresses.
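To make the component list concrete, here is a minimal sketch of how a versioned dataset, a runner, stored results, and a score-triggered action might fit together. Every name in it (`Dataset`, `EvalResult`, `run_eval`, `gate`, `toy_system`) is hypothetical and chosen for illustration, exact-match accuracy is an assumed metric, and the in-memory lists stand in for real storage and alerting.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Dataset:
    """A named set of (input, expected) pairs with a content-derived version."""
    name: str
    examples: Tuple[Tuple[str, str], ...]

    @property
    def version(self) -> str:
        # Hash the contents so any change to the examples yields a new version.
        payload = json.dumps(self.examples).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class EvalResult:
    dataset: str
    dataset_version: str
    score: float  # fraction of examples matched exactly


def run_eval(system: Callable[[str], str], dataset: Dataset) -> EvalResult:
    """Runner: apply the system to each example and compute exact-match accuracy."""
    if not dataset.examples:
        raise ValueError("dataset has no examples")
    correct = sum(1 for x, expected in dataset.examples if system(x) == expected)
    return EvalResult(dataset.name, dataset.version, correct / len(dataset.examples))


def gate(result: EvalResult, threshold: float,
         on_fail: Callable[[EvalResult], None]) -> bool:
    """Monitor plus action: invoke on_fail when the score is below the threshold."""
    passed = result.score >= threshold
    if not passed:
        on_fail(result)
    return passed


# In-memory stand-ins for result storage and an alerting channel (assumptions).
history: List[EvalResult] = []
alerts: List[str] = []

dataset = Dataset(
    "capitals",
    (("France", "Paris"), ("Japan", "Tokyo"), ("Peru", "Lima"), ("Kenya", "Nairobi")),
)


def toy_system(country: str) -> str:
    # A deliberately incomplete system under test: it does not know Kenya.
    answers = {"France": "Paris", "Japan": "Tokyo", "Peru": "Lima"}
    return answers.get(country, "unknown")


result = run_eval(toy_system, dataset)
history.append(result)
passed = gate(
    result,
    threshold=0.9,
    on_fail=lambda r: alerts.append(
        f"block release: {r.dataset}@{r.dataset_version} scored {r.score:.2f}"
    ),
)
```

In this sketch the score alone changes nothing. What matters is that `gate` turns it into an action, and that the recorded dataset version lets anyone reproduce the run that triggered it.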