Troubleshooting Large Language Models – an Interview with Amber Roberts
Virtual, United States
Community Paper Reading: Phi-2 – Small Language Models (SLMs) vs. LLMs
Virtual, United States
RAG Time! Evaluate RAG with LLM Evals and Benchmarking
Virtual, United States
Community Paper Reading: RAG vs Fine-tuning
Virtual, United States
Evaluating LLMs: Needle in a Haystack
San Francisco, United States
LLM evaluation is a discipline where confusion reigns and foundation model builders are effectively grading their own homework. Building on the viral threads on X/Twitter, Greg Kamradt, Robert Nishihara, and Jason Lopatecki discuss highlights from Arize AI's ongoing research on how major foundation models – from OpenAI's GPT-4 to Mistral and Anthropic's Claude – are stacking up...
Path to Production: LLM System Evaluations and Observability
San Francisco, United States
Community Paper Reading: Exploring Sora & Evaluating Large Video Generation Models
Virtual, United States
DevOps for GenAI Hackathon SF by MongoDB
San Francisco, United States
Community Paper Reading: Reinforcement Learning in the Era of LLMs
Virtual, United States
Community Paper Reading: Claude-3
Virtual, United States