Community papers - resources image

Voyager: An Open-Ended Embodied Agent with LLMs Paper Reading and Discussion

Published Jun 19, 2023

Sarah Welsh

Contributor

Introduction

In this paper reading, we discuss Voyager: the first LLM-powered embodied lifelong learning agent in Minecraft. Voyager autonomously explores the world, acquires skills, and makes discoveries without human intervention. It outperforms previous approaches, achieving exceptional proficiency in Minecraft, and successfully applies its learned skills to solve novel tasks in different Minecraft worlds, surpassing techniques that struggle with generalization.

Join us every Wednesday as we discuss the latest technical papers, covering a range of topics including large language models (LLM), generative models, ChatGPT, and more. This recurring event offers an opportunity to collectively analyze and exchange insights on cutting-edge research in these areas and their broader implications.

Watch

Dive in:

Transcript

Jason Lopatecki, Co-Founder and CEO, Arize AI: We’ll give it one more minute here. Awesome. So why don’t we hop in? So Voyager is essentially an RL Agent that has to work its way through Minecraft and Minecraft being this world that’s fairly open-ended. I actually haven’t Minecraft much, have you?

Michael Schiff, Chief Technology Officer, Arize AI: This is where the three of us admit that. I’ve gone deep down the rabbit hole watching people build Redstone computers.

Aparna Dhinakaran, Co-Founder and Chief Product Officer, Arize AI: I think being a gamer is an advantage if you’re in the LLM world.

Jason Lopatecki: Either way, this game seems to pop up a lot. I think it’s because it’s so open, and it’s not like you know, level by level, the traditional games we grew up with–you build the world. And it becomes challenging when things are open-ended, because you can do so many different things. You can explore a lot of things you can build and apparently as part of this game, one of the hardest things to do is, I guess, achieve the diamond which this did do and but kind of in it. I think this was done in a pretty amazing way. You are essentially using GPT-4 to come up with a curriculum that explores that. That gives an exploration of the space instead of actions and things that you go. Based on completing those things that should go, do they build skills? So a skill library of executable code, which I actually think is one of the more interesting things here. Later on, they’ll attach it to Auto-GPT–the same skill library–which is a different kind of RL approach. It even improves the Auto-GPT so the skill library is one of the most interesting things here. I think we’ll spend some time on that. It probably also ties into OpenAI’s latest release of information. There’s basically an iterative approach to the prompting which incorporates feedback. So it builds these skills based upon execution and and and improve them based upon feedback from the environment.

Anything else to add as we go in?

Michael Schiff: That’s a pretty good summary of the technique. I think the inflation studies that they go into at the end which you refer to a moment ago are pretty interesting. But yeah, let’s get into it.

Aparna Dhinakaran: Let’s get into it. Because there’s also like: what are the impacts, what can you do taking the techniques from this paper? What does it extract to code generation? That’s really interesting and I feel like all of Twitter’s been talking about it.

Jason Lopatecki: I think that’s exactly the lesson here, and the lifelong learning thing is probably the biggest tie into the code. Where we all are used to building skills and talents, and hopefully, they’re reusable. Hopefully, I can use to build on the talents I have to do other things. But LLMs don’t have that. And I think what’s also interesting is in the traditional model approach to build skills. It’s a gradient-based approach like you’re training something to build some skill in some area where none of this is training at all. And this is fine tuning. It’s all prompts and uses of GPT- And to get these long term skills you use cut-wood. It’s a pretty amazing approach to building the skill sets. There’s an argument here for lifelong learning, or we’re building these skills or coming up with those tasks like, what skills do I need? Refining those skills based upon feedback? So I you know, I go try to attempt something, and either I’m successful or not at it, or with some grading of it. And then exploring to figure out what new tasks do. I need to go do like? Do I run into a river or mountain, or like what is it that I actually need to do and figure out new tasks to go build on top of the skills I have.

I think it’s a pretty good argument for a lifelong learning approach. I’m sure over time, fast forward or years will get more complicated in this area, but like seems like a pretty good breakdown of a simple approach to gaining and building the kind of skills that we have. Anything else you want to add?

Aparna Dhinakaran: Could we jump into maybe the first, like each part of the method. So there’s kind of the curriculum, you know. Generate the automatic curriculum component. There’s the building, a school library, and then there’s kind of that iterative prompting mechanism. the the part that was. Maybe we can jump in each one of these components. So the part that I was really trying to take away from the autumn the, you know, the skill library made sense. Okay, once you have a skill that you’ve learned, you can add it. You can do a search on it later on. Where to use it. The automatic curriculum. This one was like, how do you generate that? Maybe we could dive into a little bit of that like, how do you know?

Michael Schiff: They actually discuss the random curriculum and a manual curriculum versus the automatic curriculum. I kind of came away from the paper thinking about curriculum like coaching and it was really interesting in the revelation settings. Later, when you remove an automatic curriculum, basically, the agents get stuck without really knowing what to do next. So just this process of prompting and directed prompting based on the outputs of previous tabs. So did you succeed. And the progressive overload of your agent in order to make it stronger and actually gain the skills.

Aparna Dhinakaran: Do you think it was just through the prompt engineering, you know, it knows the agent’s state, it knows what it’s completed and what it’s filled out before, like how you…

Michael Schiff: The other day, just while I was reading the paper, I was curious to see what amount of Minecraft information GPT had encoded in its parameters. So I asked it a weak version of their initial prompt. So I asked what’s the best strategy for playing Minecraft if I were to discover as many diverse things as possible? And it has a solid understanding of the game, the way it works, specificities of the environments that you might encounter different creatures that you would find there the different material types again. I don’t play it so it could lie to me. But it appears to know quite a bit–to quote the RAG paper–in its parametric memory even without these learned non-parametric skills. And so I think just based on what it’s learned from the internet, it knows enough about Minecraft that if you say like, I’m in the forest with these materials, what should I do next? You’ll get a semi meaningful response, and if you compose a more sophisticated prompt, it makes sense that you would get an even more detailed suggestion of what to do next.

Jason Lopatecki: Yeah, I think that’s a good point, when I was reading this section I was thinking, basically it has kind of a world view of Minecraft like it’s got an idea of that entire Minecraft world. It’s got an idea of maybe what it should be doing, or what the tasks could be to do your next thing, so it’s trained on enough data to have a view of the world. to then even espouse an idea of tasks. And I was kind of thinking that myself but I think, Michael, the little test you did kind of proves it a bit.

Michael Schiff: It’s like gathering resources. They pretty generally “gather resources, collect a wide range of resources during that exploration.” So I was curious. I imagine it’s the additional context that gets you specific answers.

Jason Lopatecki: I guess one question too is when you think about generalization of this to other other things like, would it work in a place or a game where it didn’t have the same extensive world view for example, how much does it generate the tasks to go do if it doesn’t know totally about this this game or environment?

Michael Schiff: I think that’s an interesting question, and it kind of leads into the skill library, and I’m still primed from last week’s RAG paper reading. But the extent to which this feels like many search and retrieval problems embedded or composed together like you could take the skill library, and its prompt or query and passage into the context of the next prompt as a mini search and retrieval problem, and to get to your previous question. If I wrote a game today that GPT-4 didn’t know anything about, could I not write extensive documentation for this game and build a search and retrieval engine which functioned as my automatic curriculum. And so I see this search and retrieval like a module that you can plug that technique in, and I kind of think that’s what they’ve done with the skill library.

Jason Lopatecki: Yeah, this is probably the most interesting thing about the whole paper in my mind. And it wasn’t even the search and retrieval approach to me, it was like: Building knowledge is about building code that’s executable, and building a hierarchy of code that you can go execute is this kind of your, or maybe it’s skills or abilities. But maybe it’s not knowledge of skills and abilities. It’s the ability to kind of use data to take actions or do things, but that is quite powerful. And I saw there was a twitter thread going on of LMS core that I posted internally and it hits inside a couple of things there like it hints at the code outside of everything. And I think, Michael, you had a comment on that.

Michael Schiff: I think absolutely they are useful for defining a continuum, you know, but it’s rare that something is all one or the other. So this idea that you’re gonna have, you know you write the outer shell that interacts with it, or the entire outer shell produces executable code. It seems to me that any real application is going to be a combination of the but using that as a framework for thinking about this paper, it’s quite interesting that the code that the the authors wrote is scaffolded. Nowhere are they writing code that encodes the logic of playing Minecraft well. The code that they’ve written is at best like it knows about coaching, maybe, and about how skills are built. It understands how to prompt and what to do with the outputs of that prompt, how to take that and execute executable outputs and basically create this feedback loop. And all of the code which knows how to play Minecraft was automatically generated through, sort of like a semi reinforcement of a project. We haven’t got it to yet, but that the iterative prompting is kind of interesting like sort of reminds me of evolutionary algorithms where you just kind of make something and see how it does.

Jason Lopatecki: Yeah. So this iterative prompting…We’ll go into the feedback loop, probably in a second too. There’s like the whole critique of what you did there. There’s like components of that, I think in this case, that you know, I do think like still harping on like code, as saving skills and actions as a fascinating paradigm. And yesterday OpenAI released an ability to run functions from the API calls which you can imagine you could build skills within your client code base of functions to call the skills that you have, and when the LLM responds, you run those skills that you have in your code base. You run so like the whole parent this whole paradigm like you notice by OpenAI is already releasing stuff to support it so you could run your own. You could build your own skill, library of functions on your side that you would execute and start to store those.

Aparna Dhinakaran: Before we jump into that, just for the folks who are following the train of the paper. So you have this curriculum that you’re building by asking to first teach you these diverse skills and then it generates code, and then you save off. So, the skill library what it’s doing is it’s taking basically in this case it, you have craft and iron pick acts in Minecraft. And so it has this code to basically tell you, I guess response here, which is like to craft an iron pickaxe and it tells you what you need to go, do, and you can. Doesn’t have to be on a pickaxe. It could be. How do I go? I think there’s a bunch of them like, how do I go hunt, or whatever tells you how to go? Do that based on your current state and generate code for it, saves that off if it needs to be code, even and then saves that off in your skill library. So now it’s kind of like going to school: you’re learning. I’ve learned today my skill for reading. Maybe when you’re younger and then, like later, you have skills like your drawing you can build skills. So in this case they’re getting more and more specific. Right? There’s like chopping wood and making a crafting table. And so you start building these skills. And then later when you’re actually pulled into another task, you can look at your skill library and use those skills to kind of build on top of that, and then add an even more complex skill into your skill library, which is, you know, maybe you start off by learning how to chop wood, and then you can take the wood and then learn to like make a table from it. And that’s kind of what I feel like the skill library concept here was trying to get at, and that was really interesting. It is really freaking cool actually.

Jason Lopatecki: Yeah, I think it’s a paradigm we’re going to see a lot in the LLM space. I mean that it took me a minute to figure out what the value was, I guess, but I think it’s clear there’s going to be a whole whole set of things built on top of this especially with the latest OpenAI release

Michael Schiff: Similarly to the way we’re seeing, you know you bring your data to an LLM in the form, and I know I keep coming up to search and retrieval but it’s just because I see it’s such a valuable primitive. But the scale library, I think, is identical to this chat on your docs. But it’s almost like you chat on your logic like you. Can you provide a standard library for interaction in this case GPT learning, but they can be functioned calls from anywhere. And you know GPT only wrote, some of it–it is calling standard libraries that were probably written like humans. But it creates an interesting interface between the code that you want control over and the control that you want the language model to have. I think we’re gonna see a lot of really interesting uses of this. I think you’ve been outside of the game playing. But I think, probably particularly outside game playing.

Aparna Dhinakaran: After reading this, some of the questions I had around the school library was well, first this, at least in the example above Jason, the skills are pretty dang specific, right? So if I had to take this to the thing that I could think about most applicable was code generation, right? Imagine if you could write code first and teach it how to write a simple application. And then the build skills on top, like, connect your database, etc. And you could build on top. So the first thing I was like, how does it know how specific does it have to be to call it a skill versus how general can a skill be? It’s very hard thing to quantify, I guess. But when is it specific enough that it could be hold and use in different places.

Jason Lopatecki: One idea I came to, and this is probably true of everything not just APIs, but I think you’re getting at like, how complicated of a program could it create and how? And where is it going to stumble on the building blocks of what it could create. Where would you give it a task, and it couldn’t do it? One thought I had is just, you know, anything that’s well known APIs, that’s it’s an original building block that it might know is quite powerful because it’s part of the model itself. There’s some knowledge base around that. So if you’re starting, and I think that’s kind of the case. You’re maybe starting with the stuff that kind of knows or knows in some way based upon its world worldview or world knowledge. There’s probably a complex of a program to create where, like simple tree hierarchies like this is right here, probably slightly easier to build in a good area, to focus. you know you build, you build the iron you can build up. You can use the sword. Without you to build, you can use iron to build a swords, simple hierarchy, and very clear goals and numeric recipes.

Aparna Dhinakaran: I think the question is like if I had to take it back to code generation, what’s the skill set? What’s big enough? How does it know what it can use together, are all really interesting questions.

Michael Schiff: I don’t know that there’s a threshold. I think it becomes an interplay between the curriculum that’s generating the skills. Or that’s generating the tasks, the audit that’s generating the skills and the piece that we haven’t really talked about, which is the self feedback mechanism. So verification where they determine if they did the task well enough, which I thought was particularly interesting in that it builds in this notion that it’s not a binary. Sometimes it comes explicitly out of the environment like your program might crash, and then you know you did not do it well enough. But just like with a human skill, for instance if you were practicing doing a backflip, you might do a backflip, but you stumble on the landing and like, did you do it well enough? You might feel like, yes, but a professional gymnast might be like, no. So it’s not just enough to analyze the code that gets generated, but if you don’t have an ability to verify that, then you have surpassed the degree of complexity for which this is going to work.

Jason Lopatecki: The iteration here is interesting. When you have GPT-4 creating code for you, don’t do it zero shot or if you’re doing zero shot, then have some feedback, and some ability to iterate on it. Don’t just take the one thing it comes out, you probably have to run it, get some feedback, iterate on that based upon that feedback. And it does a pretty good job at that. I think this is just another clear example. It even knew it couldn’t use or create that thing it was trying to create. So this is kind of the iterative prompting mechanism that we’re just talking about there with some kind of self verification of checking test success which Michael, you’re talking about on the critique side…. And these are some examples of what it looks like.

Aparna Dhinakaran: If you go up just a little bit, Jason. I thought this part was cool. So they actually said, Okay, there’s three types of feedback. That self verification is all. Obviously we into that? There’s kind of like this environment feedback like, I can’t use this skill in this specific scenario because of whatever reason. So maybe I’m trying to make a fire, but it’s too windy, right? So it’s kind of like environmental feedback. And then there’s execution errors where it’s a little bit more about the tactical code, bug related issues. And then this one, which is more around self verification for checking the task’s success. So it’s then at each new task instantiating another GPT-4 agent checking if the task was done or not and the reasoning on the task.

Jason Lopatecki: I was looking at this in the code base, which we’ll kind of show in a second but in order to critique it you kind of need to know the world. So there is a key point here. In order for this feedback loop to actually work, you still need the LLM to have some world view, and to give you this critique and feedback, because this doesn’t come from anywhere else but the LLM itself.

Michael Schiff: Except for the minor amount of additional context that’s provided by the my player API.

Jason Lopatecki: That’s true.That’s an additional piece.

Michael Schiff: It’s an interesting balance. I mean, I think it comes back to your original question earlier about how it’s not built into the world.

Jason Lopatecki: I thought I’d do one thing real quick to just get an idea of what these look like. So the prompts are in here within the voyager code base. This is an example of the skill when you’re asking to write a function, what does it look like? You know a pretty good boilerplate here for anyone doing stuff to probably write functions you know not a ton outside of just help me write a function here. So I thought that was pretty pretty basic.The curriculum is much more detailed. This is kind of giving you the setup to try to get good tasks out.

Michael Schiff: Between that and the amount of Minecraft world information that seems clearly there from just me poking around it makes a lot of sense.

Jason Lopatecki: So you do get this information. You are getting some feedback from the game and it gives you an example of a JSON response. These are useful to kind of go through and get an idea. I mean, if you’re going to build your own, or doing anything around trying to stuff out. And then they obviously have the agents in here, too, I thought. I think they want to know. I did see it here, is there? There were some edge cases where they did have to. You know, it’s not completely generic. There were some edge cases where maybe it wasn’t this one curriculum where they did have to encode some of the Minecraft world into the code base itself? Not surprising. But the I just kind of notice, you know, slightly little bits of kind of initial state or state , and in this agent the curriculum agent to deal with kind of this the kind of cold start stuff

Michael Schiff: I wonder if some of that was to get around the limitations that they brought up. For instance: “GPT–4 tends to use cobblestone as a fuel input despite being an invalid fuel source in the game,” the phrasing there made me think that was a frequent error that it would, you know, put cobblestones in it. And so I wonder how much of the edge case you’re programming you’re seeing is not to deal with places where it had encoded in its parameters incorrect information about the world.

Jason Lopatecki: Absolutely. I think I mean some more interesting things to be honest in that, in the code, and what they did there, and that, or at least in the release there’s a lot of magic in the prompts, which is funny. I spent a lot of time looking at those, and you know the like. Getting all this stuff to work is one thing, but you know some of the magic comes there. It’s probably the stuff we’ll be touching a lot day to day, too.

Michael Schiff: One other thing that I was thinking about while reading this was the Meta training and evaluation cycle. I started to think about the accumulation of the skill library as a training phase, and it’s the slower phase you’re going to make mistakes during the iterative prompting process. The code proposed as skills won’t be good enough, and you’re not going to be able to deal with tasks as they happen, because you don’t have skills for it yet. And they address a little bit of the generalization of those skills and the zero shot case. But where you take an agent that’s not trying to iteratively prompt and build skills. Jason, you brought it up a little bit. You have highlighted here, but just plugging an existing skill library into a more standard agent, which just receives tasks, decomposes them into sub tasks, and then goes to act on them, but it has this bank of things to act on. That would be kind of like the inference time for these learned agents.

I do wonder if those will be fast enough for real game playing soon. They have the code up. I wanted to get it, but I didn’t have time before this morning. I’m quite curious to see how it does in a real time setting.

Jason Lopatecki: Yeah, I thought this was fascinating like that. The fact that auto GPT with the skill library actually improved and improved it too.It’s a whole set of LLMs that just build skill libraries over time. It was fascinating to me.

Aparna Dhinakaran: One of the things they talked about on paper was cost. This is an expensive thing to run. Every step of the way there’s some LLM buying and checking and thinking and pulling and retrieving like this would be really expensive to actually run today.

Jason Lopatecki: I feel like everything we’re doing is like LLM spaghetti. I’m calling another LLM to check this LLM’s response and it doesn’t matter what you’re doing. You’re probably doing it all on spaghetti at some point. I don’t know if we ever get out of it.

Michael Schiff: It’s so early in all of this, I think there’s a lot of just trying stuff and seeing what works. And we’re beginning to see patterns and the use of vector, databases and indexing call plug structures via their settings is just gonna keep coming up in context, learning, I think, is just going to keep coming up. But I wonder how much we’re going to be able to leverage LLMs upfront and then remove them after the back. I think that’s kind of where I started to go with this like training time. You use the LLM to build up your skill library. But then, at playing time, can you remove a large amount of that cost by just grabbing pre pre-crafted skills. And it’s a much smaller natural language problem to decompose your input tasks into the sub tasks that you use to index your skill library? Can you bring that cost down to run on the edge?

Jason Lopatecki: From an observability perspective, what does observability look like? What are the pain points we think we’re going to have as we build this, any thoughts or insights?

Aparna Dhinakaran: I don’t know if you saw, but can you actually scroll down to paper limitations? Can we just read the section on hallucinations? I feel like the self verification was interesting because they’re kind of running a little bit of checks and then durability in the process before they kick off the actual tasks. So observability was almost like a part of the task process. The automatic curriculum occasionally proposes on tasks, for example, it may ask an agent to craft a copper sword or copper chest plate, which are items that do not exist within the game. Hallucinations also occur during the code generation process. For instance, GPT tends to use cobblestone as a fuel input, despite being an invalid fuel source in the game. Additionally, it may call functions absent in the provided control primitive APIs leading to code execution errors. We’re confident improvements in the Gp. API models as well as novel techniques for by and teaming up their cell, that models will overcome those limitations in the future.

This is really interesting, right? There’s kind of like a couple of different places where there’s hallucinations. It’s, you know, the agent level the code generation process and then, even in the control front of Api’s like there’s kind of like a couple of different places where they’re calling an L, and then that’s what it’s hallucinating and probably has some downstream impact. You know, if you just score at the end, like Minecraft, the tough one, because it’s not like this end goal that you’re kind of pushing towards. I don’t know. Maybe mining and diamond, I’m not sure but you know if there’s some way to almost score. Well, did it build the table. Would you consider this a good table, almost like scoring at the end of it? And you but then you want to work it trace it back down. To what part of the agent did that break in where it actually hallucinates? It’s a little bit more complex than just looking at the final result.

Jason Lopatecki: I think this is also what I kind of like, there is exactly that, like, there’s major problems right now, like major problems. I kind of see that I categorize things around this like one is, when are hallucinations we’ve seen? If you give it bad context. So you embed something, you return something, and you put that into the context. And it’s kind of wrong like that that tends to cause a lot of calls, but that’s still somewhat flexible like you can probably could fix not putting the right. You know as much as you can, or or you could improve that But then there’s like stuff we’re just, you know. I I don’t know if this cobblestone like I doubt they’re putting cobblestone into the contacts like it’s just things cobblestones, part of the game, or probably something you can use. And it’s like humans can make a small mistake and easily correct it. And I think we’re kind of having to work around where you can’t correct these tiny things that they have. You have to do it in the context window and you can’t do it. Glow globally. And you’ve got to remember in the context window to keep telling it. You can’t use cobblestone and maybe that’s the future of this. You just collect the small skills like that, or the small feedback notes like that. But I think tracking these, we have a whole, I’ll give a plug for the LamaIndex search and retrieval event we have tonight. we’re doing how to troubleshoot search and retrieval, how to evaluate it.

Probably some of the most groundbreaking work on analyzing, search and retrieval. at least visualizing it, troubleshooting it in real real world situations. We’ve done it on a bunch of production dates. We have some examples with real data and I think that this continues to be something that is, you know, is going to be a challenge. It’s going to be hard to understand. hard to track when it occurs, hard to give feedback on what to do to fix it. You’re right, it’s gonna continue to be like one of the bigger challenges here is tracking these down and helping troubleshoot when it occurs, and what you should do to modify.

Michael Schiff: I expect we’ll also see observability in the module. So what we’re calling one thing as voyager is actually the combination of several complex things. And so I would expect that you would have observability tools specific to code generator. They discuss a little bit of the prior work around Co generation and execution, techniques like intermediate execution and guided program search. So I I think there will be some classical techniques that are applied to the sub modules, new techniques applied to the search and retrieval components, probably specific observability. So like Minecraft specific stuff that is just you know task-specific. So I guess the word I was looking for.

Jason Lopatecki: I think you’re gonna see several different sounds of observability from a space like this. And we’re in the early stages, too. It’s just going to get more complex. I thought this was kind of interesting. Did you all catch this like, how could humans be in the loop? Where would you put a human in the loop in terms of controlling the success or outcome. This reminded me a little bit of the baby dev, or small dev sorry, small dev!

Michael Schiff: I find the phrase hallucinations interesting for natural language models. But I think these techniques are interesting because we’re using them in a sense creatively, to generate something from nothing. So hallucinations might actually be good for you in the generating of code space that is not hallucinations like making up API calls, but hallucinations like creative ideas that weren’t present. I think there is a thin line between, or a blurry line between a hallucination that’s wrong. And just what these models were trained to do. So I think that the idea that you could plug a human into either combine those outputs or to guide those outputs, I think, is highly realistic. Especially as you start to think of scale libraries as generated code and and the evolution of code bases as a thing that is a cooperative act between a human maintainer and a natural language model junior dev.

Jason Lopatecki: It’s hard to believe. I think like a year or so ago. We never thought it was possible. But it’s amazing. We’re seeing today. the human as a critic like that, you know, critiquing the responses, and being able to give that feedback that seems like one of the harder things to do to give that, you know feedback. It sounds like in some areas here where it can’t perceive the environment that well or has missing some, some ability to perceive the environment like that was critical. And then the curriculum like we were talking about in the beginning, that world model view that you kind of have to have in your model, and if you don’t have, maybe it’s not complete, or maybe it doesn’t know it, or it’s some you could potentially provide some. Some of that and you can imagine as these get bigger and more amazing, you could, it could. I think the vision is to be able to have them build their own world model, you know, dynamically. So or or add on to the world model they have. So I think that is interesting, like places where humans can come into the loop and actually improve, or fix.

Do you have any other things here? I mean, I thought that the results were pretty impressive. really, it was also impressive though they stated themselves there, there’s not really anything that they can compare it to directly. And so I think you have to take it at face value because they actually had to reinterpret, or in some cases fully re-implement the technique.

Michael Schiff: This method of embodying an agent that normally can only output language using code, I think, is fascinating, that code through an index skill library is the way you can take a language model and let it interact with the world.

Aparna Dhinakaran: It looks like we’re at time. Thank you so much for joining. And you know, let us know recommendations for paper recommendations for next week.

Share

Suggested reading

Text reads: The Illusion of Thinking Understanding the Strengths and lImitations of Reasoning Models via the Lens of Problem Complexity

The Illusion of Thinking: What the Apple AI Paper Says About LLM Reasoning

Reads: Accurate KV Cache Quantization with Outlier Tokens Tracing

Accurate KV Cache Quantization with Outlier Tokens Tracing