AI Research Papers
Dive into the latest technical papers with the Arize Community.
Sign up to join us for bi-weekly AI research paper readings.
Article

The Definitive Guide to LLM Evaluation
A structured approach to building, implementing, and optimizing evaluation strategies for LLM applications.
Read
Video course

DeepLearning.AI course: Evaluating AI Agents
Learn how to systematically assess and improve your AI agent’s performance in Evaluating AI Agents, a DeepLearning.AI course.
Watch
Podcast series


Deep Papers is a podcast series, running since 2023, that features deep dives into today’s most important AI papers and research.
Listen
Featured
Research posts

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies

John Gilhuly
Video
29:59

Agent-as-a-Judge: Evaluate Agents with Agents

John Gilhuly
Video
27:30

Introduction to OpenAI’s Realtime API

Sally-Ann DeLucia

Aparna Dhinakaran
Video
29:54

Model Context Protocol

Sally-Ann DeLucia
Video
15:31

How DeepSeek is Pushing the Boundaries of AI Development

Sally-Ann DeLucia
Video
29:00

Multiagent Finetuning: A Conversation with Researcher Yilun Du

Sally-Ann DeLucia
Video
29:56

Swarm: OpenAI’s Experimental Approach to Multi-Agent Systems

Xander Song

John Gilhuly
Video
46:38

Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning

Dat Ngo
Video
29:56

Composable Interventions for Language Models

Sally-Ann DeLucia
Video
45:34

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Sally-Ann DeLucia
Video
41:04

Extending the Context Window of LLaMA Models Paper Reading

Jason Lopatecki
Video
47:29

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

Dat Ngo

Sally-Ann DeLucia
Video
35:35

RAFT: Adapting Language Model to Domain Specific RAG

Sally-Ann DeLucia
Video
44:22

LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic

Dat Ngo
Video
45:55

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

Sally-Ann DeLucia

Amber Roberts
Video
49:14

Breaking Down EvalGen: Who Validates the Validators?

Aparna Dhinakaran

Sally-Ann DeLucia
Video
44:31

Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

Sally-Ann DeLucia

Aman Khan
Video
43:15

Demystifying Amazon’s Chronos: Learning the Language of Time Series

Sally-Ann DeLucia

Amber Roberts
Video
44:34

Anthropic Claude 3

Sally-Ann DeLucia

Aman Khan
Video
42:57

Reinforcement Learning in the Era of LLMs

Claire Longo
Video
44:51

Sora: OpenAI’s Text-to-Video Generation Model

Dat Ngo
Video
45:16

Phi-2 Model

Aman Khan

Sally-Ann DeLucia
Video
44:35

Mistral AI (Mixtral-8x7B): Performance, Benchmarks

Dat Ngo

Aparna Dhinakaran

Aman Khan
Video
47:56

How to Prompt LLMs for Text-to-SQL

Amber Roberts
Video
45:01

The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets

Sally-Ann DeLucia
Video
40:34

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Jason Lopatecki
Video
43:40

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

Claire Longo

Amber Roberts
Video
42:53

Explaining Grokking Through Circuit Efficiency

Sally-Ann DeLucia

Jason Lopatecki
Video
39:19

Large Content And Behavior Models to Understand, Simulate, and Optimize Content and Behavior

Sally-Ann DeLucia

Amber Roberts
Video
42:04

Skeleton of Thought: LLMs Can Do Parallel Decoding Paper Reading

Aparna Dhinakaran
Video
44:25

Extending the Context Window of LLaMA Models Paper Reading

Jason Lopatecki
Video
43:32

Llama 2: Open Foundation and Fine-Tuned Chat Models Paper Reading

Aparna Dhinakaran
Video
31:18

Lost in the Middle: How Language Models Use Long Contexts Paper Reading

Sally-Ann DeLucia

Amber Roberts
Video
43:18

Orca: Progressive Learning from Complex Explanation Traces of GPT-4 Paper Reading

Jason Lopatecki

Richard Young
Video
45:10

One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

Dat Ngo

Sally-Ann DeLucia
Video
37:53

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels
Adam May

Aman Khan

Jason Lopatecki
Video
39:28

Voyager: An Open-Ended Embodied Agent with LLMs Paper Reading and Discussion

Aparna Dhinakaran

Jason Lopatecki
Video
46:51

Retrieval-Augmented Generation – Paper Reading and Discussion

Aman Khan
Video
44:47

LoRA: Low-Rank Adaptation of Large Language Models Paper Reading and Discussion

Aparna Dhinakaran
Video
40:18

Drag Your GAN: Interactive Point-Based Manipulation on the Generative Image Manifold

Aparna Dhinakaran

Jason Lopatecki
Video
37:51

LIMA: Less Is More for Alignment – Paper Reading and Discussion

Jason Lopatecki

Aparna Dhinakaran
Video
36:48

Hungry Hungry Hippos (H3) and Language Modeling with State Space Models

Aparna Dhinakaran

Jason Lopatecki
Video
41:54

Toolformer: Training LLMs To Use Tools

Aparna Dhinakaran

Jason Lopatecki
Video
34:07

OpenAI on Reinforcement Learning With Human Feedback (RLHF)

Jason Lopatecki
Video
47:40