Arize:Observe 2023

Arize:Observe – Future of Agents with Harrison Chase, Founder of LangChain

Aparna Dhinakaran speaks with Harrison Chase, Founder and CEO of LangChain. This talk was originally recorded at Arize:Observe.

Aparna Dhinakaran: Hey everyone, super excited to welcome you to another amazing session. My name is Aparna, one of the founders here at Arize, and today we're lucky to have the founder of LangChain here with us to do a little fireside chat. Harrison, would you mind introducing yourself?

Harrison Chase: Absolutely, and thanks for having me. Excited to be here. My name is Harrison, and I'm the co-founder and CEO of LangChain, which is now a company founded around the open source packages, also called LangChain, that are used for developing language model applications.


Aparna: Awesome. The GitHub repo is so popular that everyone's probably heard of it, but just in case, for new folks who haven't yet heard of LangChain: why did you decide to build LangChain, and what is it used for? And agents, what are those, and why are they so popular right now? Could you walk us through a little bit of that journey?

Harrison: Yeah, absolutely. LangChain is a framework for developing language model applications. We started off as a Python package, and now we have a TypeScript package as well. So what exactly does a framework for developing language model applications mean?

I'd say there are two main components, and both have the same origin story: I was talking with a bunch of friends who were building stuff in the space, saw a lot of repeated patterns in what they were doing, and decided to factor those out and put them into an open source Python package. A lot of those patterns fall into two categories, and these are the two major value props of LangChain as I see them. One is the individual components for everything that makes up these applications based around language models. At the center you've got language models themselves, but then you've got other types of models as well, like chat models and embedding models. Then you've got prompts, which are these new ways of programming them. You've got vector stores, which are very important for memory, and you've got a lot of utilities for loading documents, chunking documents, stuff like that.

Basically everyone was using these components, but slightly differently: maybe they were using a Cohere model versus an OpenAI model versus a Hugging Face model. So what we've added is a lot of standard interfaces for all these different components, so you can easily swap between them. We also have a bunch of integrations or implementations, depending on the component. For language models, we integrate with a bunch of providers; for document loaders, we have a bunch of implementations ourselves. So basically we have all these components, a standard interface, and lots of integrations or implementations.
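To make the standard-interface idea concrete, here's a minimal sketch using the 2023-era LangChain Python API (import paths and class names have since been reorganized, so treat this as illustrative rather than current):

```python
# Minimal sketch of LangChain's standard LLM interface (2023-era API).
# Each provider-specific class exposes the same call signature, so swapping
# providers means changing one constructor, not the whole pipeline.
from langchain.llms import OpenAI, Cohere, HuggingFaceHub

prompt = "Explain what a vector store is in one sentence."

for llm in [
    OpenAI(temperature=0),                        # needs OPENAI_API_KEY
    Cohere(temperature=0),                        # needs COHERE_API_KEY
    HuggingFaceHub(repo_id="google/flan-t5-xl"),  # needs HUGGINGFACEHUB_API_TOKEN
]:
    # Every LLM implements the same text-in, text-out interface.
    print(type(llm).__name__, "->", llm(prompt))
```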

The other value prop, and the other thing that's in LangChain, is a way of taking these components, treating them like Lego blocks, and building up to more complex chains. Chains are basically sequences: they assemble these components to do specific things.

For example, we have a question answering chain. First there's an ingestion step, which uses document loaders to load documents, text splitters to split the documents, embedding models to create embeddings, and vector stores to store them. Then at query time there's a retrieval step against that embedding store, which puts the retrieved documents into a prompt, calls the language model, and gets back an answer. So it's basically a sequence of things, and there are a lot of these different sequences. The more popular ones came in two flavors. One was connecting to external data sources: I saw a lot of people doing chat-with-your-documents, chat-with-your-SQL-tables, stuff like that, so we have a lot of those types of chains. The other flavor was what I would call agentic chains. These are chains that use the language model as a reasoning engine to decide what to do, so rather than hard-coding a specific sequence of do A, then B, then C, then D, you let the language model choose what order to do things in.
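As a rough illustration of that question answering chain, here's the ingestion-plus-retrieval pipeline sketched with the 2023-era LangChain API (the file name is a placeholder, and newer versions have moved these modules):

```python
# Sketch of the ingestion + retrieval question answering chain
# (2023-era LangChain API; module paths have since changed).
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Ingestion: load -> split -> embed -> store.
docs = TextLoader("my_docs.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(docs)
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Query time: retrieve relevant chunks, stuff them into a prompt, call the LLM.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0), retriever=vectorstore.as_retriever())
print(qa.run("What does the document say about vector stores?"))
```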

And this is really useful, because even if you take a simple example like conversing with a SQL table, there's a simple chain where you take a human question, convert it to a SQL query, run that against the table, get back an answer, and convert that back into natural language. But what if the query is wrong and raises an error? What do you do then? What if you need to make multiple queries? Chains are a really good place to get started because they're a predetermined sequence of steps, so they're easy to understand and easy to reason about. But when you want to do complex things, like recover from errors or handle more involved tasks, that's when agents come in. A lot of the initial things I saw people doing had some flavor of both of these, so a lot of the chains and agents in LangChain are really centered around those two ideas.
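To show the difference in control flow, here's a minimal, framework-free sketch of an agent loop in which the model, not hard-coded logic, picks the next action and can react to an error. The `llm` callable (prompt in, text out) and the JSON action format are hypothetical stand-ins:

```python
# Minimal agent loop: the LLM is the reasoning engine that picks each step.
# `llm` is a hypothetical callable (prompt -> text); tools are plain functions.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, age INT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [("Ada", 36), ("Grace", 45)])

def run_sql(query: str) -> str:
    return str(db.execute(query).fetchall())  # raises on invalid SQL

TOOLS = {"run_sql": run_sql}

def agent(llm, question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Ask the model to choose the next action, expressed as JSON.
        decision = json.loads(llm(
            transcript +
            'Reply as JSON: {"action": "run_sql" | "finish", "input": "..."}'))
        if decision["action"] == "finish":
            return decision["input"]
        try:
            observation = TOOLS[decision["action"]](decision["input"])
        except Exception as err:
            # Unlike a fixed chain, the error goes back to the model,
            # which can rewrite the query and try again.
            observation = f"Error: {err}"
        transcript += f"Action: {decision}\nObservation: {observation}\n"
    return "Gave up after max_steps."
```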

Aparna: That is super helpful context. Outside of where you see the usage happening today, which is connecting to data for these question answering or chatbot experiences and chaining prompts together, where do you see this going? I mean, this last week has been insane for autonomous agents and agent simulations. Maybe help break it down: what are the use cases most people are going to use agents for today, and then where could it go, sci-fi style? Or even just six months out, since that's how fast the space is moving, where do you see these autonomous agent trends going?


Harrison: Yeah, I think today a lot of the agents, unless they're really narrowly focused, are more cool Twitter demos than anything else, to be honest. They work in some cases, they don't work in other cases, and they go off the rails a bunch. But I think it is possible to have very focused agents for specific use cases today. That requires really good prompt engineering, really careful selection of the tools, and a good agent executor or something like that. I've seen good examples of this: there's one focused on code execution that's decently well done, and there are a few focused on retrieval, which you can do in a standard way or with an agent, and I think that's a bit more practical today. And then there are others built around the same idea of giving the agent a very limited set of tools and a very carefully selected prompt. I think what we'll start to see…


Aparna: Just double-clicking on the retrieval one: could you explain the advantages of wrapping retrieval in an agent versus just calling it directly? Where do you see the pros and cons there?


Harrison: I think the agent just allows for handling of more edge cases. If you think about a typical retrieval pipeline, a question comes in, you retrieve the relevant documents by doing similarity search in a vector store, you put those into the context, and you pass that to the model. But that hard-codes which retrieval mechanism you're using, and it could be that you want to do different types of retrieval based on the query that comes in. Specifically, one idea we're playing around with is adding filters on the metadata based on the query. You can hard-code that, but what if you add too stringent a set of filters and they break, or return zero documents? What do you do then? Similarly, maybe the user question comes in but you don't want to look up the whole query, only selected terms, or you want to do multiple lookups. Retrieval of documents is a really complex process. Maybe similarity search covers 60 or 80 percent of cases, but there's a long tail of edge cases it doesn't, so an agent is basically a way of taking care of that long tail.
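Here's a rough sketch of that filter-then-fall-back idea. The `llm` callable and `vectorstore` object are hypothetical stand-ins (many 2023-era vector stores did expose a `similarity_search(query, k=..., filter=...)`-shaped method, but this is not tied to any specific one):

```python
# Sketch of agent-flavored retrieval: infer metadata filters from the query,
# then fall back to plain similarity search if the filters prove too stringent.
import json

def retrieve(llm, vectorstore, query: str, k: int = 4):
    # Step 1: let the model propose metadata filters for this query.
    filters = json.loads(llm(
        f"Given the query {query!r}, return metadata filters as JSON, "
        'e.g. {"year": 2023, "source": "blog"}. Return {} if none apply.'))
    # Step 2: try filtered similarity search first.
    docs = vectorstore.similarity_search(query, k=k, filter=filters)
    # Step 3: recover from over-filtering instead of returning nothing.
    if not docs and filters:
        docs = vectorstore.similarity_search(query, k=k)
    return docs
```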


Aparna: Got it, okay, that makes sense. And you were headed towards: where do you see this going?


Harrison: I think there are probably two places where it's going. One is more generic tool usage: humans are pretty good zero-shot learners, in that we can take an instruction and figure out how to carry it out, so I'd like to see agents get better at that. And kind of tied into that, although maybe also separate:


I think the idea of long-term memory is really interesting. You start to see this in one of the papers that came out this week (a bit before the event, since we're recording), the generative agents, or "Westworld," paper. That had a really interesting long-term memory system that did a time-weighted and importance-weighted retrieval step, so already you've got a much more complex retrieval system than just semantic search. On top of that, it added a reflection step, which simulates how our memory actually works. We don't remember every single event that happens, unless you've got a really good memory; we build up intuition over time. You know "I don't like the taste of tomatoes," rather than "I had a tomato at time X, a tomato at time Y, a tomato at time Z, and I didn't like the taste." We condense that into information. So I think that's a really interesting step toward long-term memory that learns, and I think we'll start to see more of that as well.
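The retrieval score in the generative agents paper combines recency, importance, and relevance. Here's a minimal sketch of that weighting; the equal weights, 1-10 importance scale, and decay rate are assumptions in the spirit of the paper, not its exact values:

```python
# Sketch of time- and importance-weighted memory retrieval, in the spirit of
# the generative agents paper. Weights and decay rate are assumptions.
import math
import time

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def score_memory(memory, query_embedding, now=None, decay=0.995):
    # memory: {"embedding": [...], "importance": 1-10, "last_accessed": epoch}
    now = now or time.time()
    hours_since_access = (now - memory["last_accessed"]) / 3600
    recency = decay ** hours_since_access    # exponential time decay
    importance = memory["importance"] / 10   # LLM-rated at write time
    relevance = cosine_similarity(memory["embedding"], query_embedding)
    return recency + importance + relevance  # richer than pure semantic search

def retrieve_memories(memories, query_embedding, k=5):
    ranked = sorted(memories,
                    key=lambda m: score_memory(m, query_embedding),
                    reverse=True)
    return ranked[:k]
```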


Aparna: Well, of course, there's also that use case we're both excited about, which is LLM-assisted evaluation, and using agents to do it. Could you talk a little bit about that and where you see it going?


Harrison: Yeah. There's a big pain point this is solving, which is that all these generative applications are really hard to evaluate. That's because you aren't producing a single number you can compute MSE or accuracy on. At the very least you've got a natural language response, and possibly also some intermediate steps that an agent has taken. There are some heuristics you can use: you could have some ground truth, and for the intermediate steps you can check whether they line up with it. But there's more than one way of doing things, and for final answers there's definitely more than one way of expressing a correct answer. So the way to do this up until now has been to ask humans to evaluate it: is this answer correct? Just like a teacher grading quizzes or tests. But now these language models are really good, and they're starting to approximate humans at a lot of tasks, so why not use them to approximate evaluation? That's an area we're both extremely excited about, and I think it's getting more and more feasible. There's been a lot of really cool research by folks at OpenAI and Anthropic around doing this in pretty complex ways. And not only is it a cool research topic, it's very applied and very practical, because again, there's no really great way to evaluate these things.
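Here's a minimal sketch of the LLM-as-evaluator idea, grading a generated answer against a reference. The `llm` callable and the grading prompt are hypothetical; production rubrics are usually more detailed:

```python
# Sketch of LLM-assisted evaluation: use a strong model to grade answers
# where exact-match metrics fail. `llm` is a hypothetical callable.
GRADING_PROMPT = """You are grading a question answering system.
Question: {question}
Reference answer: {reference}
Student answer: {answer}
The student answer may be worded differently and still be correct.
Reply with exactly one word: CORRECT or INCORRECT."""

def grade(llm, question: str, reference: str, answer: str) -> bool:
    verdict = llm(GRADING_PROMPT.format(
        question=question, reference=reference, answer=answer))
    return verdict.strip().upper().startswith("CORRECT")

def accuracy(llm, examples) -> float:
    # examples: iterable of (question, reference, generated_answer) triples
    grades = [grade(llm, q, ref, ans) for q, ref, ans in examples]
    return sum(grades) / len(grades)
```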

Aparna: I think that's part of the reason why we have that integration out between Arize and LangChain, for instance: to be able to run those LLM-assisted evaluations, use agents to call them, and then log all of them so you can visualize and troubleshoot, hand in hand.


Harrison: Yeah, and going back to agents for a second, and why they're not used more widely or why they're not production ready: it's really hard to understand what's going on inside them. I think integrations like the Arize LangChain one are really helpful for that, because they help shine a flashlight into that dark thing. There are a lot of different things you might want to look at, like the different steps, the inputs and outputs, their positions in the embedding space, and stuff like that. For most cases, I don't really think the overhang for putting these things in production is latency. I think it's more that these things aren't good enough, and they're not good enough because it's hard to debug them and hard to get them ready. So integrations like the Arize LangChain one are really helpful for that.

Aparna: Yeah, that's actually my next question: how does observability around agents work? We've been thinking about the required components, where every single part of the chain gets logged. How do you think about troubleshooting agents? Have you seen any users hit bumps, and what are the ways you've seen them fix those problems?


Harrison: Yeah. The main way that I've seen, and sorry, that's my calendar going off, the main way that I've seen is not super satisfying, but I think it's quite good: just looking at the inputs and outputs. I remember talking to someone a few months ago who'd been building with language models for at least a year, if not more, and I asked what they did for evaluation and whether they had any metrics. He said no, I just look at them, and that's totally fine with me, I've built up an intuition for what's going on, for what works and what doesn't. To be clear, I am very bullish on LLM-assisted evaluation, but I also think that in the short term there is no substitute for just looking at the inputs and outputs and building up an intuition. So even if shining the flashlight just means logging those and looking at them, I think that's good enough.


Aparna: Yeah, it's like the table stakes to move forward.


Harrison: Exactly, and then you can get to the next stage.


Aparna: I'm going to change gears a little bit here. The space is moving super fast, so your answer might be outdated in a week, but are there any differences you'd comment on between the ReAct-style approaches to task controllers and something like a BabyAGI-style controller that does the outer loop of task optimization? Anything you'd comment on there?


Harrison: Yeah, I just wrote a blog post on this, so it's very top of mind. I think one of the major differences, which causes some of the smaller differences, is in the tasks they're asked to execute. A lot of the ReAct-style agents are focused on answering specific questions, where there are maybe one to four tool lookups at most, so they're very focused in their objectives. The BabyAGI and AutoGPT things are much more open-ended. It's "increase my wealth" or something like that: it's not question answering, it's a long-running thing. I think that leads to two differences that I noticed. First, both BabyAGI and AutoGPT do something the LangChain ReAct agents don't, which is more clever, long-term memorization of intermediate steps. The ReAct agents just keep a list of intermediate steps in memory and always put all of it into the prompt, but as soon as you start doing a lot of steps, that will overflow the context window. That hasn't really been an issue for ReAct agents, again because of the difference in objectives, but it would be an issue for AutoGPT and BabyAGI, so they use this concept of long-term memory, which we have in LangChain but mostly for human-to-AI interactions; they use it for AI-to-tool interactions, and they have it as a separate component.
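A sketch of that contrast, with hypothetical helpers: the ReAct-style buffer replays every intermediate step into the prompt, while the BabyAGI/AutoGPT style embeds steps into a vector store and retrieves only the most relevant ones, keeping context bounded:

```python
# Two ways to carry intermediate steps between agent iterations.
# `store` is a hypothetical vector store with add(text) and search(query, k).

class BufferMemory:
    """ReAct style: keep every (action, observation) pair and replay them all.
    Simple, but the prompt grows with each step and eventually overflows."""
    def __init__(self):
        self.steps = []

    def add(self, action: str, observation: str) -> None:
        self.steps.append((action, observation))

    def to_prompt(self, query: str) -> str:
        # Ignores the query: the full history always goes into the prompt.
        return "\n".join(f"Action: {a}\nObservation: {o}" for a, o in self.steps)

class VectorMemory:
    """BabyAGI/AutoGPT style: embed each step, retrieve only the top-k steps
    relevant to the current task, so context stays bounded over long runs."""
    def __init__(self, store):
        self.store = store

    def add(self, action: str, observation: str) -> None:
        self.store.add(f"Action: {action}\nObservation: {observation}")

    def to_prompt(self, query: str, k: int = 4) -> str:
        return "\n".join(self.store.search(query, k=k))
```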


The second difference, and I think only BabyAGI actually does this, and it's a little bit of what you were getting at, is a distinct planning step and execution step. First it lists out however many tasks it wants to do, then it picks the first one, executes it, gets an observation, comes back, and updates its task list. Compare this to the ReAct-style approach, which thinks about what to do for one step, does it, then thinks about the next step, and so on. Again, I think the difference is due to the objectives. ReAct-style objectives are more focused and more directed, so you can do that one-step-at-a-time thing. The BabyAGI thing is much more open-ended, so it's really helpful to separate the plan out, keep a memory of it, not forget about it, and use it to guide the agent over the long term.
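A minimal sketch of that plan-then-execute outer loop, in the spirit of BabyAGI; the `llm` callable and the prompts are hypothetical simplifications:

```python
# Sketch of a BabyAGI-style outer loop: draft a task list, execute the first
# task, then re-plan given the result. `llm` is a hypothetical callable.
from collections import deque

def plan_and_execute(llm, objective: str, max_iters: int = 10):
    # Planning step: draft an initial task list for the objective.
    tasks = deque(llm(f"List tasks, one per line, to achieve: {objective}")
                  .splitlines())
    results = []
    for _ in range(max_iters):
        if not tasks:
            break
        task = tasks.popleft()
        # Execution step: do one task and observe the result.
        result = llm(f"Objective: {objective}\nComplete this task: {task}")
        results.append((task, result))
        # Re-planning step: revise the remaining tasks given what happened,
        # so the long-running plan is remembered and updated, not forgotten.
        replanned = llm(
            f"Objective: {objective}\nLast task: {task}\nResult: {result}\n"
            f"Remaining tasks: {list(tasks)}\n"
            "Return an updated task list, one task per line.")
        tasks = deque(line for line in replanned.splitlines() if line.strip())
    return results
```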


Aparna: As you're talking, I'm thinking: wow, troubleshooting the more complex BabyAGI style is going to be way harder than the hard-coded ReAct-style controllers. But it's also super interesting what different types of problems each of them may run into.

Harrison: Yeah. The thing I like a lot about BabyAGI and AutoGPT, just at a personal level, is that the interface isn't really chat anymore. It's some interesting thing where there's human-in-the-loop feedback and you can approve or reject certain actions. There's been a lot of chatter about different UIs, and everything has converged on chat at the moment, but there's got to be something else, right? That's what I like about BabyAGI and AutoGPT, again purely at a personal level.


Aparna: Well, that's actually a great segue into my very last question, which is: in what industries do you think agents will have the largest or most immediate impact, in terms of where they'll actually be deployed and used? Do you think they're likely to create industries we don't even have yet today? It's open-ended, but I'd love your take on it.

Harrison: Maybe my hottest take here is that everything that is an API call or a chain should really be an agent, again going back to the ability to handle errors, handle edge cases, handle multi-step things. Everything we think of as an API or a chain should really be an agent; that's probably my hottest take there. In terms of specific industries, I do think there will be a lot of existing players that can adapt and use these pretty easily and get started with them, and I think that's a lot of the power of the foundation models coming out. It's pretty much every industry; I don't know if there's any one in particular. Definitely ones with a lot of text are the primary ones to start, so you saw a lot of companies focused on marketing pop up, and there are starting to be more around legal and things like that, because there's a large concentration of textual documents. I also think RPA is a really interesting one; it gets very closely tied into agents approximating that kind of work. But at the end of the day, there are probably areas of most industries where humans do rote or routine things that could be approximated by agents, and I think we'll start to see more and more of that.


Aparna: Awesome. Well, Harrison, thank you so much for being a part of Arize:Observe. I feel like I personally learned a lot, and I'm sure folks in the audience really appreciated how in-depth your answers and takes were, even the hot takes. I really appreciate you being here. Any last words for the crew?


Harrison: I really appreciate you guys having me, and I really appreciate the Arize LangChain integration. I know that got shown off yesterday or something like that. Obviously I love agents, and obviously they're not in production yet, and I think a lot of that is because it's really hard to build them; there's troubleshooting and fixing and observability that's needed. So I think the integration is awesome, and I really appreciate that it's there.

Aparna: Awesome. Well, thank you so much, and looking forward to seeing everyone at the next session coming up at Observe.
