Agent Architectures

Published August 27, 2024

Co-authored by Aparna Dhinakaran

Intro on our Agents Series

If 2023 was the year of RAG, 2024 has been the year of agents. Companies all over the world are experimenting with chatbot agents, tools like MultiOn have grown by connecting agents to outside websites, and frameworks like LangGraph and LlamaIndex Workflows are helping developers around the world build structured agents.

However, despite their popularity, agents have yet to make a strong splash outside of the AI ecosystem. Very few agents have taken off among either consumer or enterprise users.

With this in mind, we decided it was time to help teams navigate the new frameworks and new agent directions. What tools are available, and which should you use to build your next application? How can you evaluate and improve your agent?

Our other motivation behind this series is that our team recently built our own complex agent to act as a copilot within our Arize platform. We took a TON of learnings away from this process, and now feel more qualified to offer our opinion on the current state of AI agents.

In this 6-part series, we’ll deep dive on each of these topics and more. We’re hopeful that this series can help arm AI engineers everywhere to build the best agents possible.

With all that said, let’s jump into our first topic:

Part 1: Agent Architecture

Before we can turn to the advanced questions like evaluating agents and comparing frameworks, let’s examine the current state of agent architectures.

To align ourselves before we jump in, it helps to define what we mean by an agent. LLM-based agents are software systems that string together multiple processing steps, including calls to LLMs, in order to achieve a desired end result. Agents typically have some amount of conditional logic or decision-making capabilities, as well as a working memory they can access between steps.

In this first post, we’ll deep dive into how agents are built today, the current problems with modern agents, and some initial solutions.

The Failure of ReAct Agents

Let’s be honest, the idea of an agent isn’t new. There were countless agents launched on AI Twitter over the last year claiming amazing feats of intelligence. This first generation consisted mainly of ReAct (reason, act) agents. They were designed to abstract as much as possible, and promised a wide set of outcomes.

Unfortunately, this first generation of agent architectures really struggled. Their heavy abstraction made them hard to use, and despite their lofty promises, they turned out not to do much of anything.

In reaction to this, many people began to rethink how agents should be structured. In the past year we’ve seen great advances, now leading us into the next generation of agents.

What Separates This Second Generation of Agents?

This new generation of agents is built on the principle of defining the possible paths an agent can take in a much more rigid fashion, instead of the open-ended nature of ReAct. Whether agents use a framework or not, we have seen a trend towards smaller solution spaces, meaning a reduction in the possible things each agent can do. A smaller solution space means an agent that is easier to define, which often leads to a more powerful agent.

This second generation covers many different types of agents. It’s worth noting, however, that most of the agents or assistants we see today are written in code without frameworks, have an LLM router stage, and process data in iterative loops.

What Makes Up An Agent?

Many agents have a node or component we call a router, which decides which step the agent should take next. In our assistant we have multiple fairly complex router nodes. The term router normally refers to an LLM or classifier making an intent decision about which path to take. An agent may return to this router repeatedly as it progresses through its execution, each time bringing some updated information. The router will take that information, combine it with its existing knowledge of the possible next steps, and choose the next action to take.

The router itself is sometimes powered by a call to an LLM. Most popular LLMs at this point support function calling, where they can choose a component to call from a JSON dictionary of function definitions. This ability makes the routing step easy to set up initially. As we’ll see later, however, the router is often the step that needs the most improvement in an agent, so this ease of setup can belie the complexity under the surface.
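As a rough illustration of function-calling-based routing, here is a minimal sketch. Everything in it is an assumption made for the example: the function definitions, the component names, and especially fake_llm_router, which stands in for a real model call and imitates the model's choice with a keyword check.

```python
import json

# Hypothetical function definitions in the JSON-schema style that most
# function-calling APIs accept. Names and fields are illustrative only.
FUNCTION_DEFINITIONS = [
    {
        "name": "check_shipping_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    {
        "name": "answer_general_question",
        "description": "Answer a question that needs no lookup.",
        "parameters": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    },
]


def check_shipping_status(order_id: str) -> str:
    return f"Order {order_id} is in transit."  # stubbed backend result


def answer_general_question(question: str) -> str:
    return f"Here is a general answer to: {question}"


# Maps each definition name to the component that implements it.
COMPONENTS = {
    "check_shipping_status": check_shipping_status,
    "answer_general_question": answer_general_question,
}


def fake_llm_router(user_message: str) -> dict:
    """Stand-in for an LLM call made with FUNCTION_DEFINITIONS attached.

    A real model would return the chosen function name plus JSON-encoded
    arguments; here a keyword check imitates that choice.
    """
    if "order" in user_message.lower():
        order_id = user_message.split()[-1].strip("?.!")
        return {"name": "check_shipping_status",
                "arguments": json.dumps({"order_id": order_id})}
    return {"name": "answer_general_question",
            "arguments": json.dumps({"question": user_message})}


def route(user_message: str) -> str:
    """Ask the router for a function call, then dispatch to the component."""
    call = fake_llm_router(user_message)
    component = COMPONENTS[call["name"]]
    return component(**json.loads(call["arguments"]))
```

The dispatch step is the easy part. In a real agent, the quality of the choice made inside the router is where most of the tuning effort tends to go.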

Each action an agent can take is typically represented by a component. Components are blocks of code that accomplish a specific small task. These could call an LLM, or make multiple LLM calls, make an internal API call, or just run some sort of application code. These go by different names in different frameworks. In LangGraph, these are nodes. In LlamaIndex Workflows, they’re known as steps. Once the component completes its work, it may return to the router, or move to other decision components.

Depending on the complexity of your agent, it can be helpful to group components together as execution branches or skills. Say you have a customer service chatbot agent. One of the things this agent can do is check the shipping status of an order. To do that, the agent needs to extract an order ID from the user’s query, create an API call to a backend system, make that API call, parse the results, and generate a response. Each of those steps may be a component, and they can be grouped into the “Check shipping status” skill.
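The five steps of the “Check shipping status” skill can be sketched as five small components chained by one skill function. This is only a sketch: the ORD-1042 order ID format, the endpoint path, and the stubbed backend are all hypothetical.

```python
import re
from typing import Optional


def extract_order_id(query: str) -> Optional[str]:
    """Component 1: pull an order ID such as 'ORD-1042' out of the query."""
    match = re.search(r"\bORD-\d+\b", query, flags=re.IGNORECASE)
    return match.group(0).upper() if match else None


def build_request(order_id: str) -> dict:
    """Component 2: describe the backend call (the endpoint is hypothetical)."""
    return {"method": "GET", "path": f"/orders/{order_id}/shipping"}


def call_backend(request: dict) -> dict:
    """Component 3: stubbed backend; a real agent would make an HTTP call."""
    fake_db = {"/orders/ORD-1042/shipping": {"status": "shipped", "eta_days": 2}}
    return fake_db.get(request["path"], {"status": "unknown"})


def parse_result(payload: dict) -> str:
    """Component 4: reduce the payload to the facts the reply needs."""
    if payload["status"] == "unknown":
        return "could not be found"
    return f"is {payload['status']} and should arrive in {payload['eta_days']} days"


def generate_response(order_id: str, summary: str) -> str:
    """Component 5: templated reply; this could also be an LLM call."""
    return f"Your order {order_id} {summary}."


def check_shipping_status_skill(query: str) -> str:
    """The skill: run the five components in order."""
    order_id = extract_order_id(query)
    if order_id is None:
        return "Could you share your order ID?"
    payload = call_backend(build_request(order_id))
    return generate_response(order_id, parse_result(payload))
```

Keeping each component separate makes it possible to test or replace one step, for example swapping the templated reply for an LLM call, without touching the others.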

Finally, many agents will track a shared state or memory as they execute. This allows agents to more easily pass context between various components.

Agent Architecture Examples

There are some common patterns we see across agent deployments today. We’ll walk through all of those architectures in later posts, but the examples below are probably the most common.

In its simplest form, an agent or assistant might just be defined with an LLM router and a tool call. We call this first example a single router with functions. We have a single router, which could be an LLM call, a classifier call, or just plain code, that directs and orchestrates which function to call. The idea is that the router can decide which tool or function call to invoke based on input from the system. The name “single router” comes from the fact that this architecture uses only one router.

A slightly more complicated assistant we see is a single router with skills. In this case, rather than calling a simple tool or function, the router can call a more complex workflow or skill set that might include many components and forms a deeper chain of actions. These components (LLM, API, tooling, RAG, and code calls) can be looped and chained to form a skill.

This is probably the most common architecture we see today among advanced LLM application teams in production.

Example flow of an agent with a single router and simple function calling

The general architecture gets more complicated by mixing branches of LLM calls with tools and state. In this next case, the router decides which of its skills (denoted in red) to call to answer the user’s question. It may update the shared state based on this question as well. Each skill may also access the shared state, and could involve one or more LLM calls of its own to retrieve a response to the user.
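A minimal sketch of this pattern, a single router choosing between skills that all read and write one shared state, might look like the following. The skill names, the AgentState fields, and the max_steps cap are assumptions made for illustration, and the skills are stubs rather than real LLM or retrieval calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AgentState:
    """Shared state that the router and every skill can read and update."""
    question: str
    notes: List[str] = field(default_factory=list)
    answer: Optional[str] = None


def lookup_skill(state: AgentState) -> None:
    # Stand-in for a retrieval step (RAG, API call, and so on).
    state.notes.append(f"facts about: {state.question}")


def respond_skill(state: AgentState) -> None:
    # Stand-in for an LLM call that writes the reply from gathered notes.
    state.answer = f"Answer based on {len(state.notes)} note(s)."


SKILLS: Dict[str, Callable[[AgentState], None]] = {
    "lookup": lookup_skill,
    "respond": respond_skill,
}


def router(state: AgentState) -> Optional[str]:
    """Pick the next skill from the current state; None means stop."""
    if state.answer is not None:
        return None
    if not state.notes:
        return "lookup"
    return "respond"


def run_agent(question: str, max_steps: int = 5) -> AgentState:
    """Return to the router after every skill until it signals completion."""
    state = AgentState(question=question)
    for _ in range(max_steps):  # hard cap so the agent cannot loop forever
        next_skill = router(state)
        if next_skill is None:
            break
        SKILLS[next_skill](state)
    return state
```

The router here is plain code, but the same loop works if router is replaced by an LLM or classifier call that returns a skill name.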

This is still generally straightforward. However, agents are usually far more complex. As agents become more complicated, you start to see frameworks built to try and reduce that complexity.

Agent Architecture Frameworks

LangGraph

One framework is LangGraph. LangGraph builds on the pre-existing concept of a Pregel graph, but translates it over to agents. In LangGraph, you define nodes and edges that your agent can travel along. While it is possible to define a router node in LangGraph, it is usually unnecessary unless you’re working with multi-agent applications. Instead, the same conditional logic that could live in the router now lives in the Nodes and Conditional Edges objects that LangGraph introduces.

Here’s an example of a LangGraph agent that can either respond to a user’s greeting, or perform some sort of RAG lookup of information:

Here, the routing logic instead lives within nodes and conditional edges that choose to move the user between different nodes depending on a function response. In this case, is_greeting and check_rag_response are conditional edges. Defining one of these edges looks like this:

graph.add_conditional_edges("classify_input", is_greeting, {True: "handle_greeting", False: "handle_RAG"})

Rather than collecting all of the routing logic in one node, we spread it between the relevant edges. This can be helpful, especially when you need to impose a predefined structure on your agent, and want to keep individual pieces of logic separated.
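To make the idea of conditional edges concrete without depending on any library, here is a framework-free sketch. MiniGraph is a hypothetical toy class written for this example. It borrows the shape of the add_conditional_edges call shown above but is not LangGraph's implementation, and the node bodies are stubs. The check_rag_response edge is left out to keep the sketch short.

```python
from typing import Callable, Dict, Hashable, Tuple

State = Dict[str, str]


class MiniGraph:
    """Toy graph runner: nodes transform state, conditional edges pick the next node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.conditional_edges: Dict[str, Tuple[Callable, Dict[Hashable, str]]] = {}

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_conditional_edges(self, source: str,
                              condition: Callable[[State], Hashable],
                              path_map: Dict[Hashable, str]) -> None:
        self.conditional_edges[source] = (condition, path_map)

    def run(self, entry: str, state: State) -> State:
        current = entry
        while True:
            state = self.nodes[current](state)
            if current not in self.conditional_edges:
                return state  # no outgoing edge: this node is terminal
            condition, path_map = self.conditional_edges[current]
            current = path_map[condition(state)]


def classify_input(state: State) -> State:
    greeting = state["input"].lower().strip("!. ") in {"hi", "hello", "hey"}
    return {**state, "kind": "greeting" if greeting else "question"}


def handle_greeting(state: State) -> State:
    return {**state, "output": "Hello! How can I help?"}


def handle_rag(state: State) -> State:
    # Stand-in for a retrieval-augmented lookup.
    return {**state, "output": f"(retrieved answer for: {state['input']})"}


def is_greeting(state: State) -> bool:
    """The conditional edge: its return value selects the next node."""
    return state["kind"] == "greeting"


graph = MiniGraph()
graph.add_node("classify_input", classify_input)
graph.add_node("handle_greeting", handle_greeting)
graph.add_node("handle_RAG", handle_rag)
graph.add_conditional_edges("classify_input", is_greeting,
                            {True: "handle_greeting", False: "handle_RAG"})
```

Calling graph.run("classify_input", {"input": "Hello!"}) walks classify_input, evaluates is_greeting, and follows the matching edge to handle_greeting.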

LlamaIndex Workflows

Other frameworks like LlamaIndex Workflows take a different approach, instead using events and event listeners to move between nodes. Like LangGraph, Workflows don’t necessarily need a routing node to handle the conditional logic of an agent. Instead, Workflows rely on individual nodes, or steps as they call them, to handle incoming events and broadcast outgoing events to be handled by other steps. This results in the majority of Workflows logic being handled within each step, as opposed to being split between nodes and edges as in LangGraph.

A reflective SQL generation agent as a LlamaIndex Workflow
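The event-driven style can be illustrated with a small framework-free sketch. It is not the LlamaIndex Workflows API and not the agent pictured above. The event classes, step functions, and the run_workflow loop are all hypothetical, and the SQL generation is a stub rather than an LLM call.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StartEvent:
    question: str


@dataclass
class SqlDraftEvent:
    question: str
    sql: str


@dataclass
class StopEvent:
    result: str


def generate_sql(event: StartEvent) -> SqlDraftEvent:
    # Stand-in for an LLM call that drafts a query.
    return SqlDraftEvent(event.question, "SELECT COUNT(*) FROM orders")


def review_sql(event: SqlDraftEvent) -> StopEvent:
    # The conditional logic lives inside the step itself.
    verdict = "approved" if event.sql.upper().startswith("SELECT") else "rejected"
    return StopEvent(f"{verdict}: {event.sql}")


# Each step is registered for the event type it listens for.
HANDLERS: Dict[type, Callable] = {
    StartEvent: generate_sql,
    SqlDraftEvent: review_sql,
}


def run_workflow(first_event: object) -> str:
    """Deliver each event to its listening step until a StopEvent appears."""
    queue = deque([first_event])
    while queue:
        event = queue.popleft()
        if isinstance(event, StopEvent):
            return event.result
        queue.append(HANDLERS[type(event)](event))
    raise RuntimeError("workflow ended without a StopEvent")
```

There are no edges to declare here. The path through the workflow is determined entirely by which event type each step chooses to emit.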

CrewAI, AutoGen, Swarm, and Others

There are other frameworks like CrewAI that are intended to make agent development easier, including some that specialize in handling groups of agents working together. We’ll dive more into these in our later posts.

Should you use a framework to develop your agent?

Regardless of the framework you use, the additional structure these tools provide can be extremely helpful, especially when building out agent applications. Whether these frameworks remain beneficial for larger, more complicated applications is a more interesting question.

We have a fairly strong opinion in this area because we’ve built an assistant ourselves. Our assistant uses a complex, multi-layer router architecture with branches and steps that echo some of the abstractions of the current frameworks. We started building our assistant long before LangGraph was stable, and it exists at a complexity level beyond anything that we’ve seen built with the current frameworks. We constantly ask ourselves: if we were starting from scratch, would we use the current framework abstractions? Are they up to the task?

Our current answer is not yet. There is just too much complexity in the overall system that doesn’t lend itself to a Pregel-based architecture. If you squint at our assistant, you can map it to nodes and edges, but for us the software abstraction would get in the way. As it stands, our team tends to prefer code over frameworks.

We do, however, see the value in the agent framework approaches. Namely, they force an architecture that comes with some best practices and good tooling. We also think they are constantly getting better, expanding where they are useful and what you can do with them. Our answer may well change in the near future as these frameworks improve.

Why Do You Need An Agent?

As you can see, agents cover a broad range of systems. There is so much hype about what is “agentic” these days. How can you decide whether you actually need an agent? What types of applications even require an agent?

For us, it boils down to three criteria:

Does your application follow an iterative flow based on incoming data?
Does your application need to adapt and follow different flows based on previously taken actions or feedback along the way?
Is there a state space of actions that can be taken? The state space can be traversed in a variety of ways, and is not just restricted to linear pathways.

Common Issues with Agents Today

Agents powered by LLMs hold incredible potential, yet they often struggle with several common pitfalls that limit their effectiveness.

One major challenge is long-term planning. While these models are powerful, they often falter when it comes to breaking down complex tasks into manageable steps. The process of task decomposition can be daunting, leading agents to get stuck in loops, unable to progress beyond certain points. This lack of foresight means they can lose track of the overall goal, requiring manual correction to get back on course.

Another issue is the vastness of the solution space. The sheer number of possible actions and paths an agent can take makes it difficult to achieve reliable outcomes consistently. As a result, performance can be inconsistent and unpredictable. This inconsistency not only undermines the reliability of agents but also makes them expensive to run, as more resources are required to achieve the desired results.

It is worth mentioning that current agent trends have pushed the market further towards constrained agents that can only choose from a set of possible actions, effectively limiting the solution space.

Finally, agents often struggle with malformed tool calls. When faced with poorly structured or incorrect inputs, they can’t easily recover, leading to failures that require human intervention. These challenges highlight the need for more robust solutions that can handle the complexities of real-world applications while delivering reliable, cost-effective results.
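One possible way to make a malformed tool call recoverable, offered here as an illustrative assumption rather than a technique this post prescribes, is to validate the call before executing it and feed the error back to the model for another attempt. In this sketch the tool name, the REQUIRED_ARGS table, and the model callable are all hypothetical.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical table of known tools and the arguments each one requires.
REQUIRED_ARGS: Dict[str, List[str]] = {"check_shipping_status": ["order_id"]}


def validate_tool_call(raw_call: str) -> Tuple[bool, str]:
    """Return (ok, message); the message explains the problem when ok is False."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON ({exc.msg})"
    if not isinstance(call, dict):
        return False, "tool call must be a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in REQUIRED_ARGS:
        return False, f"unknown tool {name!r}"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"
    missing = [arg for arg in REQUIRED_ARGS[name] if arg not in args]
    if missing:
        return False, "missing arguments: " + ", ".join(missing)
    return True, "ok"


def call_with_retries(model: Callable[[str], str], prompt: str,
                      max_attempts: int = 3) -> dict:
    """Ask the model for a tool call, returning the first one that validates."""
    feedback = ""
    for _ in range(max_attempts):
        raw_call = model(prompt + feedback)
        ok, message = validate_tool_call(raw_call)
        if ok:
            return json.loads(raw_call)
        # Tell the model what went wrong so the next attempt can correct it.
        feedback = f"\nYour previous tool call was rejected: {message}. Try again."
    raise ValueError("no valid tool call after retries")
```

The retry cap matters as much as the validation. Without it, a model that keeps producing the same bad call would loop indefinitely.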

Addressing these Challenges

One of the most effective strategies is to map the solution space beforehand. By thoroughly understanding and defining the range of possible actions and outcomes, you can reduce the ambiguity that agents often struggle with. This preemptive mapping helps guide the agent’s decision-making process, narrowing down the solution space to more manageable and relevant options.
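As a small sketch of what mapping the solution space beforehand could look like, the snippet below writes the allowed transitions down as data and checks proposals against them. The action names and the START and END sentinels are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical map of which actions may follow which, written down before
# the agent runs. "START" and "END" are sentinel names for illustration.
ALLOWED_NEXT: Dict[str, Set[str]] = {
    "START": {"classify_intent"},
    "classify_intent": {"lookup_order", "answer_faq"},
    "lookup_order": {"generate_reply"},
    "answer_faq": {"generate_reply"},
    "generate_reply": {"END"},
}


def is_valid_path(path: List[str]) -> bool:
    """Check a full sequence of actions against the map."""
    if not path or path[0] != "START" or path[-1] != "END":
        return False
    return all(nxt in ALLOWED_NEXT.get(cur, set())
               for cur, nxt in zip(path, path[1:]))


def constrain(current: str, proposed: str) -> str:
    """Accept a router's proposed next action only if the map allows it."""
    if proposed not in ALLOWED_NEXT.get(current, set()):
        allowed = sorted(ALLOWED_NEXT.get(current, set()))
        raise ValueError(f"{proposed!r} is not allowed after {current!r}; "
                         f"expected one of {allowed}")
    return proposed
```

With the map in place, a router only ever chooses among a handful of options at each step, and an out-of-bounds proposal fails loudly instead of sending the agent down an unplanned path.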

Incorporating domain and business heuristics into the agent’s guidance system is another powerful approach. By embedding specific rules and guidelines related to the business or application, you can provide the agent with the context it needs to make better decisions. These heuristics act as a compass, steering the agent away from less relevant paths and towards more appropriate solutions, which is particularly useful in complex or specialized domains.

Being explicit about action intentions is also crucial. Clearly defining what each action is intended to accomplish ensures that the agent follows a logical and coherent path. This explicitness helps prevent the agent from getting stuck in loops or deviating from its goals, leading to more consistent and reliable performance. Modern frameworks for agent development luckily encourage this type of strict definition.

Creating a repeatable process is another key strategy. By standardizing the steps and methodologies that agents follow, you can ensure that they perform tasks consistently across different scenarios. This repeatability not only enhances reliability but also makes it easier to identify and correct errors when they occur.

Finally, orchestrating with code and more reliable methods rather than relying solely on LLM planning can dramatically improve agent performance. This involves swapping your LLM router for a code-based router where possible. By using code-based orchestration, you can implement more deterministic and controllable processes, reducing the unpredictability that often comes with LLM-based planning.
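A minimal sketch of a code-first router is shown below. Deterministic rules handle the cases they can, and only the remaining inputs fall through to an LLM router. The rules, the ORD-1234 order ID pattern, the skill names, and the llm_router callable are all assumptions made for the example.

```python
import re
from typing import Callable, Optional

# Hypothetical order ID format used only for this example.
ORDER_ID_PATTERN = re.compile(r"\bORD-\d+\b", re.IGNORECASE)


def code_router(message: str) -> Optional[str]:
    """Deterministic rules first; return None when no rule applies."""
    text = message.lower()
    if ORDER_ID_PATTERN.search(message):
        return "check_shipping_status"
    if any(word in text for word in ("refund", "return")):
        return "start_refund"
    return None


def route_message(message: str, llm_router: Callable[[str], str]) -> str:
    """Use the code-based decision when there is one, else ask the LLM router."""
    decision = code_router(message)
    if decision is not None:
        return decision
    return llm_router(message)  # only ambiguous inputs reach the model
```

The rules here double as the domain heuristics discussed earlier: each one encodes something the business already knows, so the model is never asked to rediscover it.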

Looking forward

Throughout the remainder of this series, we’ll dive into these mitigation strategies. In our next installment, we’ll examine popular agent frameworks like LangGraph and LlamaIndex Workflows in detail. From there, we’ll turn towards strategies for task decomposition, function calling, and intent routing. Finally, we’ll deep dive into how you can use evaluation techniques to improve agent performance, and keep your agent from looping itself into a frenzy.

Hopefully you’ve enjoyed Part 1, and we’ll see you soon for Part 2!
