Community Paper Reading

  Live | Every Other Wednesday

  10:15am PT | 45 minutes

Join us every other Wednesday as we discuss the latest technical papers on topics including large language models (LLMs), generative models, ChatGPT, and more. This recurring event is an opportunity to analyze cutting-edge research together and exchange insights on its broader implications.

On-Demand | The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets

We’re excited to be joined by Samuel Marks, Postdoctoral Research Associate at Northeastern University, to discuss his paper, “The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets”. Samuel and his team curated high-quality datasets of true/false statements and used them to study the structure of LLM representations of truth in detail. They present evidence that language models linearly represent the truth or falsehood of factual statements. They also introduce a novel technique, mass-mean probing, which generalizes better and is more causally implicated in model outputs than other probing techniques.

Link to paper: https://arxiv.org/abs/2310.06824

Recording: https://youtu.be/7XNqsFA0Znw
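The core idea behind mass-mean probing, as described above, is to take the difference between the mean activation on true statements and the mean activation on false statements as the probe direction. Below is a minimal sketch in Python, assuming hidden-state activations have already been extracted as plain lists of floats. The midpoint bias and the toy data are our own illustrative choices, not the paper's.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_mass_mean_probe(true_acts, false_acts):
    """Probe direction = mean(true activations) - mean(false activations).

    Assumption: the bias (midpoint between the two projected class means)
    is a choice made for this sketch, not necessarily the paper's.
    """
    mu_true = mean_vector(true_acts)
    mu_false = mean_vector(false_acts)
    theta = [t - f for t, f in zip(mu_true, mu_false)]
    midpoint = (dot(theta, mu_true) + dot(theta, mu_false)) / 2
    return theta, midpoint


def prob_true(theta, midpoint, activation):
    """Sigmoid of the shifted projection onto the truth direction."""
    return 1.0 / (1.0 + math.exp(-(dot(theta, activation) - midpoint)))
```

Because the direction comes from class means rather than from minimizing a classification loss, fitting this probe needs no gradient descent.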

On-Demand | Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

This week, we’re discussing “Decomposing Language Models Into Understandable Components”, which addresses the challenge of understanding the inner workings of neural networks and draws parallels with the difficulty of understanding the human brain. It explores the concept of “features” (patterns of neuron activations), which offer a more interpretable unit for dissecting neural networks. By decomposing a layer of neurons into thousands of features, this approach uncovers model properties that are not evident when examining individual neurons. The authors show that these features are more interpretable and consistent than individual neurons, offering the potential to steer model behavior and improve AI safety.

Link to paper: https://transformer-circuits.pub/2023/monosemantic-features/index.html

Recording: https://www.youtube.com/watch?v=hlCxSqWS6Rw
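The dictionary-learning approach described above is commonly implemented as a sparse autoencoder. An encoder maps a layer's activation vector to a wider, non-negative feature vector, and a decoder reconstructs the activation from those features, while an L1 penalty pushes most features to zero. Below is a minimal forward-pass-and-loss sketch in plain Python, assuming the formulation f = ReLU(W_enc(x - b_dec) + b_enc) and x_hat = W_dec f + b_dec. The training loop and the paper's other practical details are omitted, and the toy weights in the test are illustrative only.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode activation x into non-negative features, then reconstruct x.

    Assumed shapes: w_enc is (n_features, d_model), w_dec is (d_model, n_features).
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]  # subtract decoder bias first
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    features = [max(0.0, p) for p in pre]  # ReLU keeps features non-negative
    x_hat = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, x_hat


def sae_loss(x, features, x_hat, l1_coeff):
    """Squared reconstruction error plus an L1 penalty that encourages sparsity."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(f) for f in features)
    return reconstruction + l1_coeff * sparsity
```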

On-Demand | RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

In this paper reading, we’ll be discussing RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting. While researchers have successfully applied LLMs such as ChatGPT to reranking in an information retrieval context, such work has mostly been built on proprietary models hidden behind opaque API endpoints. This approach yields experimental results that are neither reproducible nor deterministic, threatening the veracity of outcomes that build on such shaky foundations. RankVicuna provides access to a fully open-source LLM and associated code infrastructure capable of performing high-quality reranking.

Link to paper: https://arxiv.org/abs/2309.15088v1

Recording: https://youtu.be/fAVHx89aRHU
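In listwise reranking, the model sees a numbered list of candidate passages and returns a single ordering of all of them, rather than scoring each passage on its own. Below is a minimal sketch of the post-processing step in Python. It assumes a RankGPT-style output format such as "[2] > [3] > [1]" (our assumption about the format). Duplicate and out-of-range identifiers are dropped, and passages the model omits keep their original relative order.

```python
import re


def parse_ranking(model_output, num_passages):
    """Turn output like "[2] > [3] > [1]" into 0-based passage indices.

    Assumption: passages are labeled [1]..[n] in the prompt.
    """
    order = []
    for match in re.findall(r"\[(\d+)\]", model_output):
        idx = int(match) - 1
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    # Append passages the model did not mention, in their original order.
    missing = [i for i in range(num_passages) if i not in order]
    order.extend(missing)
    return order


def rerank(passages, model_output):
    """Reorder candidate passages according to the model's ranking."""
    return [passages[i] for i in parse_ranking(model_output, len(passages))]
```

Filling in omitted passages rather than discarding them keeps the reranker's output a full permutation of its input, even when the model's generation is incomplete.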

On-Demand | Explaining Grokking Through Circuit Efficiency

Join Arize Co-Founder & CEO Jason Lopatecki and ML Solutions Engineer SallyAnn DeLucia as they discuss “Explaining Grokking Through Circuit Efficiency”. The paper explains grokking in terms of circuit efficiency, derives novel predictions from that explanation, and tests them, providing significant evidence in its favor. Most strikingly, the experiments reveal two novel and surprising behaviors: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalization to partial rather than perfect test accuracy.

Link to paper: https://arxiv.org/abs/2309.02390

Recording: https://youtu.be/n-hkcgd7SBw

On-Demand | Large Content And Behavior Models

Join Arize’s Amber Roberts and SallyAnn DeLucia as they discuss “Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior”. This paper highlights that while LLMs have strong generalization capabilities, they struggle to effectively predict and optimize communication to elicit a desired behavior from the receiver. We’ll explore whether this might be due to a lack of “behavior tokens” in LLM training corpora and how Large Content and Behavior Models (LCBMs) might help solve this issue.

Link to paper: https://arxiv.org/abs/2309.00359

Recording: https://www.youtube.com/watch?v=KY76SCEjEIo

On-Demand | Skeleton-of-Thought: LLMs Can Do Parallel Decoding

Join us, along with two of the paper’s authors, Xuefei Ning and Zinan Lin, for an exploration of the Skeleton-of-Thought (SoT) approach, which aims to reduce large language model latency while potentially improving answer quality. SoT first guides the LLM to produce a skeleton of the answer, then expands each skeleton point in parallel, achieving speed-ups of up to 2.39x across 11 models. Don’t miss the opportunity to delve into this human-inspired optimization strategy and its implications for efficient, high-quality language generation.

Link to paper: https://arxiv.org/abs/2307.15337
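SoT's two-stage flow (skeleton first, then parallel expansion of each point) can be sketched with ordinary thread-based concurrency. This is a minimal illustration that assumes a hypothetical `llm` callable mapping a prompt string to a completion. The prompts are simplified placeholders rather than the paper's templates, and the regex assumes the skeleton comes back as a numbered list.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def skeleton_of_thought(question, llm, max_workers=4):
    """Two-stage SoT sketch: ask for a skeleton, then expand points in parallel.

    `llm` is a hypothetical callable: prompt string -> completion string.
    """
    # Stage 1: request a short numbered outline, e.g. "1. Speed\n2. Quality".
    skeleton = llm(
        "Give a short numbered skeleton (a few words per point) "
        f"for answering: {question}"
    )
    points = re.findall(r"^\s*\d+\.\s*(.+)$", skeleton, flags=re.MULTILINE)

    # Stage 2: each point is expanded by an independent request.
    def expand(numbered_point):
        number, point = numbered_point
        return llm(
            f"Question: {question}\nSkeleton:\n{skeleton}\n"
            f"Expand point {number} ({point}) in one or two sentences."
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expansions = list(pool.map(expand, enumerate(points, start=1)))

    # pool.map preserves input order, so points are reassembled in sequence.
    return "\n".join(f"{n}. {text}" for n, text in enumerate(expansions, start=1))
```

Because the expansions run concurrently, end-to-end latency in this sketch is roughly the skeleton call plus the slowest expansion rather than the sum of all expansions.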


On-Demand | Extending the Context Window of LLaMA Models

For this week’s paper reading, we’re thrilled to be joined by Frank Liu of Zilliz, who will share his insights with us. This paper examines Position Interpolation (PI), a method that extends the context window of LLaMA models to up to 32,768 positions with minimal fine-tuning. The extended models showed strong results on tasks requiring long context and retained their quality on tasks within the original context window. Rather than extrapolating beyond the trained context length, which can produce catastrophically high attention scores, PI linearly down-scales the input position indices to fit within the original context window. The authors show that interpolation is more stable than extrapolation, and the extended models can reuse existing optimization and infrastructure. During the event, we will also discuss the write-up “Extending Context is Hard… But Not Impossible”, available at https://kaiokendev.github.io/context.

Link to Paper: https://arxiv.org/pdf/2306.15595.pdf
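LLaMA encodes positions with rotary position embeddings (RoPE), so the linear down-scaling described above amounts to multiplying each position index by original_ctx / extended_ctx before computing the rotation angles. Below is a minimal sketch in Python. The defaults (a base of 10000 and an original context of 2048) are assumptions chosen for illustration.

```python
def rope_angles(position, dim, base=10000.0, original_ctx=2048, extended_ctx=2048):
    """Rotary-embedding angles for one position, with Position Interpolation.

    Assumption: defaults mirror common RoPE settings, for illustration only.
    PI rescales the position index by original_ctx / extended_ctx so every
    position in the extended window maps back into the trained range.
    With extended_ctx == original_ctx this reduces to ordinary RoPE.
    """
    scale = original_ctx / extended_ctx
    scaled_position = position * scale
    # One angle per pair of hidden dimensions, with frequency base^(-2i/dim).
    return [scaled_position * base ** (-2 * i / dim) for i in range(dim // 2)]
```

For example, with an extended window of 32,768, the last position 32,767 maps to 2047.9375, which is still inside the original 2048-position range.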


On-Demand | Llama 2

In this paper reading, we explore the paper “Llama 2: Open Foundation and Fine-Tuned Chat Models.” The paper introduces Llama 2, a collection of pretrained and fine-tuned large language models ranging from 7 billion to 70 billion parameters. The fine-tuned model, Llama 2-Chat, is optimized for dialogue use cases and outperforms open-source chat models on most benchmarks the authors tested. Based on human evaluations of helpfulness and safety, the authors suggest that Llama 2-Chat may be a suitable substitute for closed-source models. Discover their approach to fine-tuning and safety improvements, which they describe in detail so that the community can build on the work and contribute to the responsible development of LLMs.

Link to Paper: https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/

Recording: https://www.youtube.com/watch?v=HyppoCyOwfY

On-Demand | Lost in the Middle

This paper examines how well language models use long input contexts, focusing on multi-document question answering and key-value retrieval tasks. The researchers find that performance is often highest when relevant information appears at the beginning or end of the context, and that it degrades significantly when models must access relevant information in the middle of long contexts. Performance also decreases as the input context grows longer, even for explicitly long-context models. The analysis improves our understanding of how language models use their input context and provides new evaluation protocols for future long-context models.

Link to paper: https://arxiv.org/abs/2307.03172

Link to recording:
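The multi-document QA experiments described above hold the set of documents fixed and move only the document that contains the answer, so any accuracy difference can be attributed to its position. Below is a minimal sketch of that kind of controlled setup in Python. The prompt template is a hypothetical stand-in, not the authors' exact format.

```python
def contexts_by_gold_position(gold_doc, distractors):
    """Build one context per possible position of the relevant document.

    Every context contains the same documents; only the gold document's index
    moves, so accuracy can be compared across beginning, middle, and end.
    """
    contexts = []
    for position in range(len(distractors) + 1):
        docs = list(distractors)
        docs.insert(position, gold_doc)
        contexts.append(docs)
    return contexts


def format_prompt(question, docs):
    """Number the documents and append the question (hypothetical template)."""
    lines = [f"Document [{i}]: {doc}" for i, doc in enumerate(docs, start=1)]
    return "\n".join(lines + [f"Question: {question}", "Answer:"])
```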

On-Demand | Orca

Recent research has focused on improving smaller models through imitation learning on the outputs of large foundation models (LFMs). Challenges include limited imitation signals, homogeneous training data, and a lack of rigorous evaluation, which leads to overestimating small models’ capabilities. To address this, the authors introduce Orca, a 13-billion-parameter model that learns to imitate the reasoning process of LFMs. Orca learns from rich signals from GPT-4, including explanation traces and step-by-step thought processes. It surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% on complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH). It also shows competitive performance on professional and academic exams such as the SAT, LSAT, GRE, and GMAT in zero-shot settings without chain-of-thought (CoT) prompting, though it still trails GPT-4. The research indicates that learning from step-by-step explanations, whether generated by humans or by more advanced AI models, is a promising direction for improving model capabilities and skills.

Link to Paper: https://arxiv.org/abs/2306.02707

Link to Recording: https://www.youtube.com/watch?v=BswvaWZdWw4

On-Demand | Generalized LoRA (GLoRA)

This paper introduces GLoRA, a universal, parameter-efficient fine-tuning approach for diverse tasks. GLoRA enhances LoRA with a generalized prompt module that optimizes pre-trained model weights and adjusts intermediate activations. Its scalable, layer-wise structure search learns an individual adapter for each layer, enabling efficient parameter adaptation. GLoRA excels in transfer learning, few-shot learning, and domain generalization, outperforming previous methods on a range of datasets. With fewer parameters and no extra inference cost, GLoRA is a practical solution for resource-limited applications. Join us to explore GLoRA’s capabilities in this interactive community paper reading!

Link to Paper: https://arxiv.org/abs/2306.07967

Recording: https://www.youtube.com/watch?v=GCh2HWOKiaU&t=5s

On-Demand | HyDE

Explore HyDE (Hypothetical Document Embeddings), a zero-shot dense retrieval technique that pairs an instruction-following GPT-3 model (InstructGPT) with a contrastive text encoder. Given a query, HyDE has the language model generate a hypothetical document that answers it, encodes that document, and retrieves similar real documents from the corpus, grounding the generated content in real-world data. It outperforms the state-of-the-art unsupervised dense retriever Contriever and performs comparably to fine-tuned retrievers across diverse tasks and languages.

Because it needs no relevance labels or task-specific fine-tuning, HyDE can retrieve relevant real-world information for new tasks out of the box, broadening where AI models can be applied. Join us for a paper reading on how HyDE works!

Link to Paper: https://arxiv.org/abs/2212.10496

Recording: https://youtu.be/PvT8ntmm1Xs
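The key move in HyDE is to search with the embedding of a generated answer instead of the embedding of the raw query. Below is a minimal sketch of that flow in Python. The `generate` callable is hypothetical, and a toy bag-of-words vector stands in for the contrastive encoder. The paper also averages several sampled documents together with the query embedding, which this sketch simplifies to a single generated document.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a contrastive encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(query, generate, corpus, k=1):
    """Retrieve real documents using a hypothetical document as the query.

    `generate` is a hypothetical callable wrapping an instruction-following LLM.
    """
    hypothetical_doc = generate(f"Write a passage that answers the question: {query}")
    query_vector = embed(hypothetical_doc)
    ranked = sorted(corpus, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:k]
```

The generated document may contain made-up details. Only its embedding is used, and the returned results are always real documents from the corpus.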

On-Demand | VOYAGER

VOYAGER, the first LLM-powered embodied lifelong learning agent in Minecraft, continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. It outperforms previous approaches and shows exceptional proficiency at playing Minecraft. It can also apply its learned skills to solve novel tasks in a new Minecraft world, where other techniques struggle to generalize.

Link to Paper: https://arxiv.org/pdf/2305.16291.pdf

Link to Recording: https://www.youtube.com/watch?v=BU3w_AbCEbA

On-Demand | Retrieval-Augmented Generation (RAG)

This week we’re diving into the world of Retrieval-Augmented Generation (RAG)!

We know GPT-like LLMs are great at soaking up knowledge during pre-training, and fine-tuning them can lead to some pretty great, task-specific results. But on knowledge-intensive tasks, they still fall short. Plus, it’s not exactly easy to figure out where their answers come from or how to update their knowledge.

Enter RAG models, a hybrid beast that combines the best of both worlds: the learning power of pre-trained models (the parametric part) and an explicit, non-parametric memory. Imagine a dense vector index of all of Wikipedia that the model can search with a neural retriever.

Link to paper: https://arxiv.org/abs/2005.11401
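The RAG paper combines the retriever and the generator by treating the retrieved document z as a latent variable and marginalizing over the top-k documents. It describes two variants. RAG-Sequence uses one document for the whole output: p(y|x) = sum over z of p(z|x) * prod_i p(y_i | x, z, y_<i). RAG-Token can switch documents per token: p(y|x) = prod_i sum over z of p(z|x) * p(y_i | x, z, y_<i). Below is a minimal numeric sketch in Python. It assumes p(z|x) is a softmax over retrieval scores, and the probabilities are toy numbers, not real model outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax; turns retrieval scores into p(z|x)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rag_sequence_prob(retrieval_scores, token_probs):
    """RAG-Sequence: p(y|x) = sum_z p(z|x) * prod_i p(y_i | x, z, y_<i).

    retrieval_scores[z]: retriever score for document z.
    token_probs[z][i]: generator probability of output token i given document z.
    """
    doc_probs = softmax(retrieval_scores)
    return sum(p_z * math.prod(tokens) for p_z, tokens in zip(doc_probs, token_probs))


def rag_token_prob(retrieval_scores, token_probs):
    """RAG-Token: p(y|x) = prod_i sum_z p(z|x) * p(y_i | x, z, y_<i)."""
    doc_probs = softmax(retrieval_scores)
    num_tokens = len(token_probs[0])
    return math.prod(
        sum(p_z * tokens[i] for p_z, tokens in zip(doc_probs, token_probs))
        for i in range(num_tokens)
    )
```

With two equally scored documents and token probabilities [[0.9, 0.8], [0.1, 0.5]], RAG-Sequence gives 0.5 * 0.72 + 0.5 * 0.05 = 0.385. RAG-Token gives 0.5 * 0.65 = 0.325.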

On-Demand | LIMA: Less Is More for Alignment

On-Demand | Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

This paper introduces DragGAN, a novel approach for precise control over the pose, shape, expression, and layout of objects generated by GANs. It lets users “drag” any points of an image to specific target points. In other words, it deforms images with precise control over where pixels end up while keeping the outputs realistic.

Link to paper: https://arxiv.org/abs/2305.10973

View Recording: https://youtu.be/DxzsgV8rTOw


Aparna Dhinakaran
Co-founder & Chief Product Officer

Aparna Dhinakaran is the Co-Founder and Chief Product Officer at Arize AI, a pioneer and early leader in machine learning (ML) observability. A frequent speaker at top conferences and a thought leader in the space, Dhinakaran was recently named to the Forbes 30 Under 30. Before Arize, Dhinakaran was an ML engineer and leader at Uber, Apple, and TubeMogul (acquired by Adobe). During her time at Uber, she built several core ML infrastructure platforms, including Michelangelo. She has a bachelor’s degree from Berkeley’s Electrical Engineering and Computer Science program, where she published research with Berkeley’s AI Research group. She is on a leave of absence from the Computer Vision Ph.D. program at Cornell University.
