"Evals as APIs" means exposing evaluation results and workflows through programmable interfaces, not only through dashboards or reports. Agents, CI systems, deployment tools, notebooks, and internal platforms should all be able to run evals, fetch scores, inspect explanations, and compare experiments.
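As a rough illustration of those four operations (run, fetch scores, inspect explanations, compare), here is a minimal in-memory sketch. Everything in it is hypothetical: `EvalService`, `EvalRun`, and the exact-match scorer are stand-ins chosen for brevity, not the interface of any real eval platform.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRun:
    """One stored run: a score and a short explanation per example."""
    experiment: str
    scores: list[float]
    explanations: list[str]

    @property
    def mean_score(self) -> float:
        return mean(self.scores)


class EvalService:
    """Hypothetical in-memory stand-in for an eval API (assumption, not a real library)."""

    def __init__(self) -> None:
        self._runs: dict[str, EvalRun] = {}

    def run_eval(self, experiment: str, outputs: list[str], expected: list[str]) -> EvalRun:
        # Assumed scorer: exact string match after trimming whitespace.
        scores, explanations = [], []
        for out, exp in zip(outputs, expected):
            ok = out.strip() == exp.strip()
            scores.append(1.0 if ok else 0.0)
            explanations.append("exact match" if ok else f"expected {exp!r}, got {out!r}")
        run = EvalRun(experiment, scores, explanations)
        self._runs[experiment] = run
        return run

    def get_run(self, experiment: str) -> EvalRun:
        # Fetch stored scores and explanations for a named experiment.
        return self._runs[experiment]

    def compare(self, baseline: str, candidate: str) -> float:
        # Positive values mean the candidate scored higher on average.
        return self._runs[candidate].mean_score - self._runs[baseline].mean_score


service = EvalService()
expected = ["4", "Paris", "blue"]
service.run_eval("baseline", ["4", "London", "blue"], expected)
service.run_eval("candidate", ["4", "Paris", "blue"], expected)
delta = service.compare("baseline", "candidate")
```

A notebook, a CI job, or an agent could call the same methods. In a real system they would likely sit behind HTTP or RPC endpoints, but the shape of the calls would be similar.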
In this view, evaluation is infrastructure: a developer should be able to call an eval from a pull request, a release gate, a monitor, or an agent workflow. The API should return structured results that can drive actions: pass, fail, alert, annotate, rerun, rollback, or request review.
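One way to picture "structured results that drive actions" is a small result record plus a policy function that maps it to an action. The sketch below is an assumption throughout: `EvalResult`, `Action`, `decide_action`, and the default thresholds are illustrative names and values, and only a subset of the actions listed above is modeled (annotate and rollback are omitted for brevity).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PASS = "pass"
    FAIL = "fail"
    ALERT = "alert"
    RERUN = "rerun"
    REQUEST_REVIEW = "request_review"


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical structured payload an eval API might return."""
    eval_name: str
    score: float        # assumed to lie in [0, 1]
    threshold: float    # minimum acceptable score
    sample_count: int
    explanation: str


def decide_action(
    result: EvalResult,
    *,
    min_samples: int = 20,
    review_margin: float = 0.02,
    in_production: bool = False,
) -> Action:
    """Map a structured result to an action (illustrative policy, not a standard)."""
    if result.sample_count < min_samples:
        return Action.RERUN            # too little data to trust the score
    if abs(result.score - result.threshold) < review_margin:
        return Action.REQUEST_REVIEW   # borderline: ask a human
    if result.score >= result.threshold:
        return Action.PASS
    # Below threshold: block a pre-release gate, or alert on a live monitor.
    return Action.ALERT if in_production else Action.FAIL


result = EvalResult("answer_accuracy", score=0.91, threshold=0.85,
                    sample_count=200, explanation="182 of 200 answers correct")
action = decide_action(result)
```

Because the result is plain data, the same payload can serve a release gate (`in_production=False`) and a production monitor (`in_production=True`) with only the policy arguments changing.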