How to evaluate AI agents, avoid reward hacking, and build better specs

Published July 2, 2026