Checkpointing is the practice of saving an agent's state at defined points so the workflow can resume, audit, retry, or roll back after interruption. A checkpoint might store plan state, completed tool calls, retrieved documents, intermediate outputs, and policy approvals.
Checkpointing is a reliability primitive for long-running agents. It helps prevent duplicate side effects, makes debugging easier, and creates natural points for human review or evaluation.