Should I Use the Same LLM for My Eval as My Agent? Testing Self-Evaluation Bias

Published October 8, 2025