AI Research Papers
Dive into the latest technical papers with the Arize Community.
Sign up to join us for bi-weekly AI research paper readings.

The Definitive Guide to LLM Evaluation
A structured approach to building, implementing, and optimizing evaluation strategies for LLM applications.
Read
DeepLearning.AI course: Evaluating AI Agents
Learn how to systematically assess and improve your AI agent’s performance in Evaluating AI Agents, a DeepLearning.AI short course.
Watch

Deep Papers is a podcast series, launched in 2023, featuring deep dives into today’s most important AI papers and research.
Listen

Trending Papers
Scouring X, Reddit, Hacker News, and elsewhere to surface relevant generative AI research papers.
AI Research Papers
Stay up to date with the latest breakthroughs in AI research.

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies

Agent-as-a-Judge: Evaluate Agents with Agents

Introduction to OpenAI’s Realtime API

Model Context Protocol (MCP) from Anthropic

How DeepSeek is Pushing the Boundaries of AI Development

Multiagent Finetuning: A Conversation with Researcher Yilun Du

Swarm: OpenAI’s Experimental Approach to Multi-Agent Systems

Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning

Composable Interventions for Language Models

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

RAFT: Adapting Language Model to Domain Specific RAG

LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

Breaking Down EvalGen: Who Validates the Validators?

Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

Demystifying Amazon’s Chronos: Learning the Language of Time Series

Anthropic Claude 3

Reinforcement Learning in the Era of LLMs

Sora: OpenAI’s Text-to-Video Generation Model

Phi-2 Model

Mistral AI (Mixtral-8x7B): Performance, Benchmarks

How to Prompt LLMs for Text-to-SQL

The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

Explaining Grokking Through Circuit Efficiency

Large Content and Behavior Models to Understand, Simulate, and Optimize Content and Behavior

Skeleton-of-Thought: LLMs Can Do Parallel Decoding Paper Reading

Extending the Context Window of LLaMA Models Paper Reading

Llama 2: Open Foundation and Fine-Tuned Chat Models Paper Reading

Lost in the Middle: How Language Models Use Long Contexts Paper Reading

Orca: Progressive Learning from Complex Explanation Traces of GPT-4 Paper Reading

One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels

Voyager: An Open-Ended Embodied Agent with LLMs Paper Reading and Discussion

Retrieval-Augmented Generation – Paper Reading and Discussion

LoRA: Low-Rank Adaptation of Large Language Models Paper Reading and Discussion

Drag Your GAN: Interactive Point-Based Manipulation on the Generative Image Manifold

LIMA: Less Is More for Alignment – Paper Reading and Discussion

Hungry Hungry Hippos (H3) and Language Modeling with State Space Models

Toolformer: Training LLMs To Use Tools
