A scoring function is the logic that turns an input, output, context, trace, or trajectory into a score or label. It can be deterministic code, an LLM judge, an embedding similarity calculation, a rules engine, or a human annotation workflow.
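As a minimal sketch of the deterministic-code case, the example below treats a scorer as a plain callable that maps an input and an output to a score and a label. The names `ScoreResult`, `Scorer`, and `exact_match` are hypothetical and chosen only for illustration; an LLM judge or embedding-based scorer could expose the same shape.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ScoreResult:
    """Hypothetical container for what a scorer returns."""
    value: float                 # numeric score, here 0.0 or 1.0
    label: Optional[str] = None  # optional categorical label


# Assumption: a scorer is any callable taking (input, output) and returning a ScoreResult.
Scorer = Callable[[str, str], ScoreResult]


def exact_match(expected: str) -> Scorer:
    """Build a deterministic scorer that compares the output to a reference answer."""
    def score(input_text: str, output_text: str) -> ScoreResult:
        # Case-insensitive comparison that ignores surrounding whitespace.
        matched = output_text.strip().lower() == expected.strip().lower()
        return ScoreResult(
            value=1.0 if matched else 0.0,
            label="pass" if matched else "fail",
        )
    return score


scorer = exact_match("Paris")
result = scorer("What is the capital of France?", " paris ")
```

Keeping the interface this narrow means a code-based scorer can later be swapped for a judge-based one without changing the code that calls it.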
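Scoring functions should be versioned and evaluated like code. If the scorer changes, scores recorded before the change may not be comparable with scores recorded after it, because a shift in the metric could come from the scorer rather than from the system under test. For CI/CD and EvalOps, scorer versioning is as important as prompt or model versioning.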
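One way to make scorer versions explicit, sketched here under assumptions, is to derive a version string from the scorer's name and configuration and store it with every score. The helper `compute_scorer_version`, the `ScoreRecord` type, and the `comparable` check are hypothetical names, not part of any particular evaluation framework. Note that hashing the configuration does not capture changes to the scorer's code itself, which would need a separate signal such as a commit hash.

```python
import hashlib
import json
from dataclasses import dataclass


def compute_scorer_version(name: str, config: dict) -> str:
    """Derive a short, stable version id from the scorer name and its config."""
    # sort_keys makes the hash independent of dict key order.
    payload = json.dumps({"name": name, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class ScoreRecord:
    """Hypothetical stored score, tagged with the scorer version that produced it."""
    example_id: str
    value: float
    scorer_version: str


def comparable(a: ScoreRecord, b: ScoreRecord) -> bool:
    """Treat two scores as comparable only if the same scorer version produced them."""
    return a.scorer_version == b.scorer_version


v1 = compute_scorer_version("exact_match", {"case_sensitive": False})
v2 = compute_scorer_version("exact_match", {"case_sensitive": True})

old = ScoreRecord("ex-1", 1.0, v1)
new = ScoreRecord("ex-1", 0.0, v2)

if not comparable(old, new):
    print("Scorer changed between runs; re-score the baseline before comparing.")
```

A CI job could use a check like `comparable` to refuse a regression comparison across scorer versions, or to trigger re-scoring of the historical baseline with the current scorer.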