On Demand
Virtual
Session 1 (10/3): Benchmarking and Analyzing Retrieval Approaches
Session 2 (10/10): Statistical Analysis of Summarization LLM Evaluations
Session 3 (10/16): Statistical Analysis of Hallucination LLM Evaluations
Step into the world of LLM evaluations with a three-part series dedicated to achieving production excellence. We’ll unpack advanced evaluation techniques and best practices, developed through rigorous testing across retrieval, summarization, and hallucination, to help ensure production readiness. A must-attend for AI and ML engineers and data scientists. This series will cover:
- Binary LLM performance evaluation and its benefits
- Golden datasets and how to use them
- Statistical analysis of the performance of GPT-4, GPT-3.5, and more
- Best practices for LLM evals