Should I Use the Same LLM for My Eval as My Agent? Testing Self-Evaluation Bias

Published October 8, 2025