Evaluating LLMs: Needle in a Haystack

San Francisco, United States

LLM evaluation is a discipline where confusion reigns and foundation model builders are effectively grading their own homework. Building on the viral threads on X/Twitter, Greg Kamradt, Robert Nishihara, and Jason Lopatecki discuss highlights from Arize AI's ongoing research on how major foundation models, from OpenAI's GPT-4 to Mistral and Anthropic's Claude, are stacking up...