Tool calling, also called function calling, is the mechanism that lets a model or agent invoke external functions, APIs, databases, retrievers, code execution environments, or other systems. The model decides which tool to call and with what arguments.
Evaluation should separate tool selection from tool invocation. Choosing the right tool is one problem. Passing valid, safe, intent-aligned arguments is another. Both should be visible in traces and scored independently when they matter.