Durable execution is the ability of an agent workflow to survive long waits, retries, failures, and restarts without losing state. It is especially important for workflows that wait on external systems, human approval, or scheduled jobs, or that run as multi-step tasks.
For agents, durable execution usually requires persisted state, idempotent tool calls, checkpoints, retry policies, and trace continuity. Without these, a crash or timeout can leave the system in an unknown state, for example with no record of whether a tool call already took effect.
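
As a rough illustration, the ingredients listed above can be combined in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: `CheckpointStore`, `run_workflow`, the JSON file layout, and the `(results, idempotency_key)` step signature are all hypothetical names chosen for this example. It assumes step results are JSON-serializable and that each tool uses the idempotency key to deduplicate repeated calls on its side.

```python
import json
import os
import tempfile
import time
import uuid


class CheckpointStore:
    """Hypothetical store: persists workflow state as a JSON file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # A fresh workflow gets a new trace id; a resumed one keeps its old id.
        if not os.path.exists(self.path):
            return {"trace_id": str(uuid.uuid4()), "completed": {}}
        with open(self.path) as f:
            return json.load(f)

    def save(self, state):
        # Write to a temp file and rename it, so a crash mid-write
        # cannot leave a truncated checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)


def run_workflow(store, steps, max_attempts=3, base_delay=0.1):
    """Run (name, fn) steps in order, skipping steps already checkpointed."""
    state = store.load()
    store.save(state)  # persist the trace id before any step runs
    for name, fn in steps:
        if name in state["completed"]:
            continue  # finished in an earlier run; do not repeat it
        # The same key is passed on every attempt, so a tool that honors it
        # can detect and ignore a duplicate call.
        key = f"{state['trace_id']}:{name}"
        for attempt in range(1, max_attempts + 1):
            try:
                result = fn(state["completed"], key)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # retry budget exhausted; earlier checkpoints remain
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        state["completed"][name] = result
        store.save(state)  # checkpoint after every completed step
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "workflow.json")
    steps = [
        ("fetch", lambda results, key: {"value": 41}),
        ("send", lambda results, key: results["fetch"]["value"] + 1),
    ]
    print(run_workflow(CheckpointStore(path), steps)["completed"])
```

Note that the checkpoint alone does not make a step safe to repeat. If the process dies after a tool call succeeds but before `save` runs, the step executes again on restart, which is why the sketch passes a stable idempotency key to every call. A production system would more likely keep this state in a database or a workflow engine than in a local file.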