AI Agents: When and How To Implement

Amber Roberts, Machine Learning Engineer | Published August 08, 2023

With the rise of large language models (LLMs), and an ecosystem of automated agents, vector databases and LLM evaluations, the era of LLMOps is upon us. However, as language use cases become simpler and more automated, understanding the growing number of frameworks, plug-ins and toolkits is becoming increasingly difficult for the people building with them. Take agents, for example.


Recap: What Is an AI Agent?

AI agents are used in LLMOps as a reasoning engine that determines which actions to take and in which order to take them. Instead of calling an LLM directly, you can use an agent and give it access to a set of tools, letting it carry out a sequence of actions toward your intended task. How far that sequence goes depends on the background, context and tools you provide to the agent. With agents, both single-task and multi-task objectives can be completed on an as-needed basis or as a continuous process. It’s worth pausing to consider, however: is implementing an agent necessary?

This piece walks through five key takeaways to consider before implementing an agent, along with an in-depth example comparing simple and complex agents against your current LLM pipeline.

1. You May Not Need An AI Agent

It’s worth restating: you may not need an agent. An agent essentially takes a task that would otherwise be assigned to an LLM and levels it up with tooling, so depending on the needs and complexity of your use case, you may find that an agent is not necessary. LLMs can generate human-like text and engage in conversation, but they are not designed for high-level, end-to-end task completion. Agents, in contrast, can access up-to-date information from the internet with search tools, generate task lists to accomplish an overall goal, write code to acquire skills, or access the most recent company data via a tooling API.

Simply researching articles or interacting with an LLM like Claude 2, Llama 2 or ChatGPT may be sufficient for simple, single requests. However, agents offer benefits like memory, planning, and learning over time. Agents can chain multiple calls together or take an iterative approach to complex tasks, but those advantages come with a cost that grows with usage. Deciding whether to implement an agent requires evaluating the length and longevity of your task, choosing the agent tooling and frameworks that will give you the best result, and understanding your available time and resources. You may conclude that you don’t need an agent, or you may find that the right one optimizes your productivity more than you thought possible.

2. Decide If You Have a Single Task Use Case or a Multi-Task, Iterative Use Case

Simple, general-purpose agents, commonly used in LangChain and LlamaIndex chains, are designed for straightforward, end-to-end tasks. They take a chain of tasks, generate prompts and responses using large language models, retrieve documents, leverage tools and, at the end of the chain, return the result. This makes them fast and easy to deploy, but limited in scope. Complex agents like BabyAGI and Voyager incorporate more components for long-running tasks; their architectures may include modules for memory, planning, learning over time, and iterative processing. However, complex agents require more elaborate testing scenarios and more engineering to build and maintain.

In choosing between simple and complex agents, weigh factors like use case complexity, whether a human needs to stay in the loop, and development time. Simple agents suffice for basic generative tasks, while complex agents excel at planning and generalization over lengthy engagements. Understanding these tradeoffs helps you select the right agent architecture for your use case.

3. Consider the Architecture When Choosing An Agent

LangChain vs. LlamaIndex

When you have a straightforward task, frameworks like LlamaIndex and LangChain are great for leveraging agents in applications powered by LLMs. These LLM frameworks can also be used on top of each other to achieve a specific task. For example, LlamaIndex’s agent-like components (automated decision-making for a particular use case) can be used as standalone agents or combined with other LLM frameworks.

LlamaIndex provides core modules capable of automated reasoning for different use cases over your data. These include the SubQuestionQueryEngine for Multi-Document Analysis, Query Transformations, and Routing. You can find more details about these in the use cases doc.

On the other hand, LlamaIndex can be used as a core Tool within another agent framework. For instance, it can be used within LangChain and ChatGPT. LlamaIndex query engines can be easily packaged as Tools to be used within a LangChain agent, and LlamaIndex can also be used as a memory module / retriever. You can find more details about these in the LangChain integration guide.

Here’s an example of how to use LlamaIndex as a tool within an agent framework:

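from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.langchain_helpers.agents import IndexToolConfig, LlamaIndexTool

# Build a query engine over local documents ("data" is a placeholder directory)
documents = SimpleDirectoryReader("data").load_data()
query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()

# Wrap the query engine so a LangChain agent can call it as a tool
tool_config = IndexToolConfig(
    query_engine=query_engine,
    name="Vector Index",
    description="useful for when you want to answer queries about which agent to use.",
    tool_kwargs={"return_direct": True}  # return the tool's answer directly as the agent's output
)

tool = LlamaIndexTool.from_tool_config(tool_config)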

When using a LangChain agent you must:

  1. Choose the prompt
  2. Choose the tool
  3. Choose the type of agent

Give the agent access to the tool for your use case and describe the tool in a way that is most helpful to the agent. For example, if you want the agent to search the internet, you can use SerpAPI, Google Search, Bing Search, or the Wikipedia API, or you can use a language model directly with a ChatGPT plugin.
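Because the agent chooses tools only from their names and descriptions, the description is effectively part of the prompt. The sketch below is a hypothetical illustration, not LangChain’s API: the `Tool` class and `render_tool_section` helper are made up here to show how tool descriptions might be rendered into a zero-shot, ReAct-style prompt, and the tool functions are placeholders rather than real search calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    """Hypothetical minimal tool: a name, a description the LLM reads,
    and the function the executor calls with the LLM's chosen input."""
    name: str
    description: str
    func: Callable[[str], str]


def render_tool_section(tools: List[Tool]) -> str:
    """Render tools as 'name: description' lines plus the allowed action names,
    the kind of block a zero-shot prompt shows the model."""
    lines = [f"{tool.name}: {tool.description}" for tool in tools]
    names = ", ".join(tool.name for tool in tools)
    lines.append(f"Action must be one of [{names}]")
    return "\n".join(lines)


tools = [
    Tool(
        name="Search",
        description="useful for answering questions about current events",
        func=lambda query: f"(search results for {query!r})",  # placeholder, no real API call
    ),
    Tool(
        name="Calculator",
        description="useful for adding integers; input looks like '2+3'",
        func=lambda expr: str(sum(int(x) for x in expr.split("+"))),
    ),
]

print(render_tool_section(tools))
```

A vague description such as "search tool" gives the model little basis for choosing between tools, which is why the wording of each description matters.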

Pseudocode for this runtime is below:

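from langchain.schema import AgentFinish

# Schematic: `agent.get_action` and `run` are placeholders, not LangChain methods
def agent_executor_loop(agent, run, inputs):
    next_action = agent.get_action(inputs)
    while not isinstance(next_action, AgentFinish):  # stop once the agent signals it is done
        observation = run(next_action)  # execute the chosen tool
        next_action = agent.get_action(inputs, next_action, observation)
    return next_action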
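The pseudocode above can be made concrete with a toy, self-contained version. Everything here is hypothetical: `ScriptedAgent` replays a fixed plan instead of calling an LLM, and the `AgentAction`/`AgentFinish` dataclasses only mirror the names LangChain uses; they are not LangChain’s classes. A `max_steps` guard is added because a real LLM may never decide to finish.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class AgentAction:
    tool: str
    tool_input: str


@dataclass
class AgentFinish:
    output: str


Step = Union[AgentAction, AgentFinish]


class ScriptedAgent:
    """Stand-in for an LLM-backed agent: replays a fixed plan, then finishes.
    A real agent would ask the LLM for the next step given prior observations."""

    def __init__(self, plan: List[AgentAction], final_answer: str):
        self.plan = plan
        self.final_answer = final_answer

    def get_action(self, history: List[Tuple[AgentAction, str]]) -> Step:
        if len(history) < len(self.plan):
            return self.plan[len(history)]
        return AgentFinish(output=self.final_answer)


def run_agent(agent: ScriptedAgent, tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> Tuple[str, List[Tuple[AgentAction, str]]]:
    """Executor loop: get an action, run the tool, feed the observation back."""
    history: List[Tuple[AgentAction, str]] = []
    next_action = agent.get_action(history)
    while not isinstance(next_action, AgentFinish):
        if len(history) >= max_steps:
            raise RuntimeError("agent did not finish within max_steps")
        observation = tools[next_action.tool](next_action.tool_input)
        history.append((next_action, observation))
        next_action = agent.get_action(history)
    return next_action.output, history


tools = {"Search": lambda q: f"results for {q}"}  # placeholder tool
agent = ScriptedAgent(
    plan=[AgentAction("Search", "evidence of aliens"),
          AgentAction("Search", "first hand accounts of aliens")],
    final_answer="done",
)
answer, steps = run_agent(agent, tools)
print(answer, len(steps))  # done 2
```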

The final step is deciding on the type of agent based on your use case. This could be, for example, a general-purpose agent (zero-shot agent), an OpenAI functions agent, a conversational agent, and more. Any typical LangChain agent needs an agent executor to execute its actions, but as you will see later on, a complex agent such as BabyAGI can use a tool-equipped agent as its execution chain.

Complex, Iterative Agents

Now, let’s get into complex, iterative agents. The most talked-about task-oriented agents are BabyAGI, AutoGPT and Voyager, because they take a systematic approach to breaking down complex tasks into simpler ones. This, combined with generative AI advancements in reasoning (chain of thought) and grounding (through observations of the environment, as in Reason+Act, or ReAct), is increasing the complexity and accuracy of agent outputs. These are also known as plan-and-execute agents, which accomplish an objective by first planning what to do and then executing the subtasks, as seen in BabyAGI, Voyager and the “Plan-and-Solve” paper.
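The plan-and-execute pattern fits in a few lines. This is a hedged sketch: `planner` and `executor` are hypothetical callables standing in for LLM calls (or, for the executor, a tool-using agent), and the toy versions return canned strings so the control flow runs on its own.

```python
from typing import Callable, List, Tuple


def plan_and_execute(objective: str,
                     planner: Callable[[str], List[str]],
                     executor: Callable[[str, List[Tuple[str, str]]], str]
                     ) -> List[Tuple[str, str]]:
    """Plan once up front, then execute each subtask in order,
    passing the results of earlier steps to later ones."""
    steps = planner(objective)
    results: List[Tuple[str, str]] = []
    for step in steps:
        results.append((step, executor(step, results)))
    return results


# Hypothetical stand-ins for LLM calls.
def toy_planner(objective: str) -> List[str]:
    return [f"research {objective}", f"summarize findings on {objective}"]


def toy_executor(step: str, previous: List[Tuple[str, str]]) -> str:
    return f"completed '{step}' using {len(previous)} earlier result(s)"


for step, result in plan_and_execute("UFO sightings", toy_planner, toy_executor):
    print(step, "->", result)
```

Unlike the executor loop, the plan here is fixed once it is made; BabyAGI, described next, keeps creating and reprioritizing tasks as results come in.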

BabyAGI

Let’s start with BabyAGI. BabyAGI’s agent runs three main steps (execution, task creation, and prioritization) to carry out its workflow.

Figure: BabyAGI flow and how it works (adapted from source).

  • The execution_agent() function
    • This is where the OpenAI API is used to complete a task.
    • It takes two parameters: the objective and the task.
    • It sends a prompt to OpenAI’s API, which returns the result of the task.
    • The prompt consists of a description of the AI system’s task, the objective, and the task itself. The result is returned as a string.
  • The task_creation_agent() function
    • This is where OpenAI’s API is used to create new tasks based on the objective and the result of the previous task.
    • The function takes four parameters: the objective, the result of the previous task, the task description, and the current task list.
    • It sends a prompt to OpenAI’s API, which returns a list of new tasks as strings. The function then returns the new tasks as a list of dictionaries, where each dictionary contains the name of the task.
  • The prioritization_agent() function
    • This is where OpenAI’s API is used to reprioritize the task list.
    • The function takes one parameter, the ID of the current task. It sends a prompt to OpenAI’s API, which returns the reprioritized task list as a numbered list.
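The three functions described above are combined in a loop. The sketch below is a simplified, hypothetical rendition, not BabyAGI’s source: the three agent functions are deterministic stubs in place of OpenAI calls, their signatures only loosely follow the descriptions above (the prioritization stub takes the task queue directly rather than a task ID), and a `max_iterations` cap stops the loop.

```python
from collections import deque
from itertools import count
from typing import Deque, Dict, List


def execution_agent(objective: str, task: str) -> str:
    """Stub for the LLM call that performs one task."""
    return f"result of '{task}'"


def task_creation_agent(objective: str, result: str, task_description: str,
                        task_list: List[str]) -> List[Dict]:
    """Stub for the LLM call that proposes follow-up tasks."""
    return [{"task_name": f"follow up on {task_description}"}]


def prioritization_agent(tasks: Deque[Dict], objective: str) -> Deque[Dict]:
    """Stub for the LLM call that reorders the queue (here: alphabetical)."""
    return deque(sorted(tasks, key=lambda t: t["task_name"]))


def run_babyagi(objective: str, first_task: str, max_iterations: int) -> List[str]:
    task_ids = count(1)
    tasks: Deque[Dict] = deque([{"task_id": next(task_ids), "task_name": first_task}])
    results: List[str] = []
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()                                   # pull the next task
        result = execution_agent(objective, task["task_name"])   # 1. execution
        results.append(result)
        for new_task in task_creation_agent(                     # 2. task creation
                objective, result, task["task_name"],
                [t["task_name"] for t in tasks]):
            new_task["task_id"] = next(task_ids)
            tasks.append(new_task)
        tasks = prioritization_agent(tasks, objective)           # 3. prioritization
    return results


for line in run_babyagi("Prove aliens are real", "Make a todo list", max_iterations=2):
    print(line)
```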

Voyager

Now, let’s dive into the Voyager agent architecture. Voyager incorporates all of the steps seen in the agents we’ve discussed so far, plus additional steps that prioritize flexibility and continuous use. As the diagram below shows, Voyager has three key components: an automatic curriculum for open-ended exploration (which adjusts to the agent’s strengths), a skill library of increasingly complex behaviors (which can continuously be added to), and an iterative prompting mechanism that uses code as the action space (an action is defined as a function in JavaScript code).

Figure: Voyager architecture

Voyager’s design is genuinely novel. Just as humans collect skills to increase their productivity and quality of work, a Voyager agent adds new skills to its skill library. And just as humans incorporate feedback to refine and improve their work, a Voyager agent receives feedback on what to do when an objective is not met, via a GPT-4 evaluation, along with execution errors from the code interpreter and feedback from the environment.
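The feedback loop described above can be sketched as a retry loop. This is a hedged, hypothetical sketch rather than Voyager’s code: `write_program`, `execute`, and `verify` stand in for the GPT-4 code writer, the JavaScript (Mineflayer) runtime, and the GPT-4 self-verification step, and the toy lambdas only simulate a crash, a failed check, and a success.

```python
from typing import Callable, List, Optional, Tuple


def iterative_prompting(task: str,
                        write_program: Callable[[str, List[str]], str],
                        execute: Callable[[str], Optional[str]],
                        verify: Callable[[str, str], bool],
                        max_rounds: int = 4) -> Tuple[Optional[str], List[str]]:
    """Write a program, run it, gather feedback, and retry until it is verified.

    Returns (program, feedback); program is None if no attempt succeeded.
    """
    feedback: List[str] = []
    for _ in range(max_rounds):
        program = write_program(task, feedback)   # LLM writes code, given feedback so far
        error = execute(program)                   # None means it ran without errors
        if error is not None:
            feedback.append(f"execution error: {error}")
            continue
        if verify(task, program):                  # critic checks the task was achieved
            return program, feedback               # verified: a candidate for the skill library
        feedback.append("critic: task not completed")
    return None, feedback


# Toy stand-ins: version 0 crashes, version 1 runs but fails the critic, version 2 succeeds.
program, feedback = iterative_prompting(
    "mine a wood log",
    write_program=lambda task, fb: f"v{len(fb)}",
    execute=lambda prog: "ReferenceError" if prog == "v0" else None,
    verify=lambda task, prog: prog == "v2",
)
print(program, feedback)
```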

Breaking down these components for Minecraft:

  1. Voyager attempts to write a program to achieve a particular goal, using a popular JavaScript Minecraft API (Mineflayer). The program is likely to be incorrect on the first try. Feedback from the game environment and JavaScript execution errors (if any) help GPT-4 refine the program.
  2. Voyager incrementally builds a skill library by storing successful programs in a vector DB, establishing an iterative process that stores code as memory and builds on what has been learned. Each program can be retrieved by the embedding of its docstring. Complex skills are synthesized by composing simpler skills, which compounds Voyager’s capabilities over time.
  3. An automatic curriculum proposes suitable exploration tasks based on the agent’s current skill level and world state, e.g., learning to harvest sand and cactus before iron if it finds itself in a desert rather than a forest.
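Step 2 can be illustrated with a small in-memory skill library. This is a hypothetical sketch: the toy bag-of-words `embed` function and cosine similarity stand in for the embedding model and vector DB Voyager uses, and the skill names and JavaScript bodies are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real system uses a learned embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SkillLibrary:
    """Stores successful programs keyed by the embedding of their docstring."""

    def __init__(self) -> None:
        self.skills: Dict[str, Tuple[str, Counter]] = {}

    def add(self, name: str, docstring: str, code: str) -> None:
        self.skills[name] = (code, embed(docstring))

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        """Return the names of the k skills whose docstrings best match the query."""
        q = embed(query)
        ranked = sorted(self.skills, key=lambda n: cosine(q, self.skills[n][1]), reverse=True)
        return ranked[:k]


library = SkillLibrary()
library.add("mineWoodLog", "Mine a wood log from the nearest tree",
            "async function mineWoodLog(bot) { /* body omitted */ }")
library.add("craftTable", "Craft a crafting table from wood planks",
            "async function craftTable(bot) { /* body omitted */ }")
print(library.retrieve("how do I craft a table"))  # ['craftTable']
```

Retrieving by docstring rather than by code means a new task phrased in natural language can find a relevant skill, which the agent can then reuse or compose into a more complex one.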

Context and sources: Dr. Jim Fan, Voyager resources, Voyager code, and the Mineflayer API

Note: Though the Voyager prompts are still heavily hand-crafted for Minecraft, the agent’s process of incorporating LLMs for exploration is likely to be applied to more use cases in the future.

4. Compare AI Agents’ Execution for Yourself

If you are a machine learning engineer deciding which recommendation system to deploy into production, you would test various algorithms against each other and evaluate them on performance metrics, key performance indicators (KPIs), runtime and cost. Often the most advanced, highest-performing solution doesn’t make it into production if it is computationally intensive, expensive, and difficult to maintain and troubleshoot while performing only slightly better than a more straightforward model. The same is true when implementing any new technology: you need to make sure that the benefits outweigh the costs. Now let’s compare agents on a specific use case to see whether a complex agent outperforms a Google search.
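A lightweight harness makes this kind of comparison repeatable. The sketch below is hypothetical: each candidate is any callable that returns its output plus the number of LLM calls it made, `score` is whatever task metric you choose, and the per-call price is a placeholder, not a real rate.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A candidate is any callable returning (output_text, number_of_llm_calls).
Candidate = Callable[[str], Tuple[str, int]]


@dataclass
class RunReport:
    name: str
    score: float      # task-specific quality metric (higher is better)
    seconds: float    # wall-clock runtime of the candidate
    llm_calls: int    # how many model calls the candidate made
    cost_usd: float   # estimated spend: llm_calls * usd_per_call


def compare(candidates: Dict[str, Candidate], prompt: str,
            score: Callable[[str], float], usd_per_call: float) -> List[RunReport]:
    """Run every candidate on the same prompt and rank them."""
    reports = []
    for name, run in candidates.items():
        start = time.perf_counter()
        output, calls = run(prompt)
        elapsed = time.perf_counter() - start
        reports.append(RunReport(name, score(output), elapsed, calls, calls * usd_per_call))
    # Highest score first; ties go to the cheaper candidate.
    return sorted(reports, key=lambda r: (-r.score, r.cost_usd))


# Toy candidates: a single LLM call versus a multi-step agent (canned outputs).
reports = compare(
    {"llm": lambda p: ("research analyze publish", 1),
     "agent": lambda p: ("research analyze consult investigate publish", 5)},
    prompt="make a todo list",
    score=lambda out: float(len(out.split())),  # e.g. number of to-do items
    usd_per_call=0.01,                          # placeholder price, not a real rate
)
for r in reports:
    print(r.name, r.score, r.llm_calls, round(r.cost_usd, 2))
```

Keeping cost and runtime next to the quality score makes the trade-off explicit: a candidate that scores slightly higher but makes five times as many calls may not be worth deploying.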

Let’s say that you already have a company use case that is ideal for leveraging an agent, such as:

  • Content creation
  • Research curation
  • Data analysis
  • Personal assistance
  • Customer success
  • User satisfaction
  • Finance assessments
  • Task management

The use case and prompt that we will be using throughout the rest of this piece will be the following:

  • Focus Area: Content creation / Research curation
  • Who are you? You work for a non-profit organization titled “Aliens are Real” that is trying to find UFOs and aliens among us and give them a warm welcome to Earth.
  • Goal 1: To prove Aliens are real through research, first hand accounts, and evidence of extraterrestrial and/or intradimensional life.
  • Action 1: Your first task in providing evidence that aliens exist is to make a todo list to map out your steps in order to prove aliens are real.

A Google search alone does not complete this task, so let’s see how GPT-4 and Claude 2 handle this assignment:

ChatGPT with GPT-4

Sure, here’s a potential to-do list for the organization “Aliens are Real” to prove the existence of extraterrestrial and/or intradimensional life:

**Literature Review**: Start by conducting a comprehensive review of existing literature on the topic of extraterrestrial life, UFO sightings, and related phenomena. This includes scientific papers, books, documentaries, and other credible sources of information.

**Establish a Network of Experts**: Collaborate with scientists, researchers, and experts in the fields of astrophysics, astrobiology, and other relevant disciplines. Their expertise will be invaluable in interpreting data and providing scientific credibility to our findings.

**Collect First-Hand Accounts**: Set up a platform for people to share their experiences with UFO sightings or encounters with potential extraterrestrial life. This could be an online forum, a hotline, or a series of public meetings.

**Analyze and Verify Accounts**: Establish a team to analyze these first-hand accounts. This team should include experts in various fields such as psychology (to assess the credibility of the witness), physics (to understand the possible physical phenomena described), and digital forensics (to analyze any provided footage or photographs).

**Investigate Notable Cases**: Identify the most credible and intriguing cases for further investigation. This could involve field visits, interviews with witnesses, and collaboration with local authorities or other relevant organizations.

**Conduct Field Research**: Organize expeditions to locations with frequent or recent UFO sightings. Use a variety of scientific equipment to gather data, such as telescopes, infrared cameras, Geiger counters, etc.

**Develop a Database**: Create a comprehensive database of all collected data, including first-hand accounts, field research findings, and literature review notes. This will be a valuable resource for ongoing research and analysis.

**Publish Findings**: Regularly publish findings in a transparent and accessible manner. This could be through a blog, a peer-reviewed journal, or a series of public presentations.

**Engage with the Public**: Host public talks, workshops, and other events to engage with the public and raise awareness about the search for extraterrestrial life. This will also provide opportunities to collect more first-hand accounts.

**Lobby for Transparency**: Advocate for government transparency on the topic of UFOs and extraterrestrial life. This could involve lobbying for the declassification of relevant documents, or the establishment of a public inquiry into the matter.

**Collaborate with SETI and Other Organizations**: Collaborate with organizations like the Search for Extraterrestrial Intelligence (SETI) and other similar groups. Sharing resources and findings can help accelerate the search for extraterrestrial life.

**Stay Updated with Space Exploration**: Keep track of the latest discoveries and missions in space exploration. Developments in this field could provide valuable evidence or avenues for research.

Remember, the goal is not just to prove that aliens exist, but to do so in a credible and scientifically rigorous manner. This will require patience, thoroughness, and a commitment to the truth, wherever it may lead.

Claude 2


Here is a draft todo list for proving aliens exist:

– Review existing evidence and reports of UFO sightings and alleged alien encounters. Gather and organize notable cases with credible witnesses and corroborating evidence.

– Seek out potential new witnesses and experiencers. Put out calls in UFO communities and publications asking for people to share their stories. Offer confidentiality if desired.

– Visit reported UFO hotspots and areas of high strange activity. Investigate and document anomalies. Look for physical traces or other lingering evidence of unexplained events.

– Study declassified government documents related to UFOs and unexplained phenomena. Press for further disclosure of classified files.

– Consult with scientists working on topics connected to extraterrestrial life, such as astrobiology, SETI research, space technology etc. Get their perspective on the likelihood of aliens based on latest discoveries.

– Explore emerging technologies and methods that may enhance detection of UFOs or communication with extraterrestrials, if they exist. Partner with labs developing this type of tech.

– Spread public awareness of our research and evidence compiled so far. Get media attention. Create TV specials, books, documentaries, etc that lay out the case for aliens among us in a credible, compelling way.

– Lobby governments around the world to take this issue seriously by holding hearings, releasing files, funding research on UFOs and extraterrestrials, and preparing protocols for formal contact if it occurs.

Let me know if you would like me to explain or expand on any of these action items! I’m here to assist your organization’s mission to prove the existence of aliens and welcome them to our planet.

Those results are not bad for a first pass with only a few seconds to render. Even though this process is not scalable or iterative, lacks complexity, and can become costly, we can use these results as a benchmark before we assign an agent to the task. The simple and complex agent examples I will be showing can be found in this Google Colab Notebook.

We will start by implementing a simple agent: a single-task, zero-shot LangChain agent that uses the SerpAPI tool to query Google search results programmatically.

> Entering new AgentExecutor chain…

 I need to research and find evidence of aliens

Action: Search

Action Input: Evidence of aliens

Observation: A Pentagon spokesperson has denied the claims, while lawmakers are pushing for information on UFOs to be declassified.

Thought: I need to find first hand accounts of aliens

Action: Search

Action Input: First hand accounts of aliens

Observation: Three military veterans testified in Congress’ highly anticipated hearing on UFOs Wednesday, including a former Air Force intelligence …

Thought: I need to find research on extraterrestrial and/or intradimensional life

Action: Search

Action Input: Research on extraterrestrial and/or intradimensional life

Observation: But that extraterrestrial presence on regular display is, of course, a fiction. No life beyond Earth has ever been found; there is no evidence that alien life …

Thought: I now know the steps I need to take to prove aliens are real

Final Answer: To prove aliens are real, I need to research and find evidence of aliens, find first hand accounts of aliens, and find research on extraterrestrial and/or intradimensional life.

> Finished chain.

‘To prove aliens are real, I need to research and find evidence of aliens, find first hand accounts of aliens, and find research on extraterrestrial and/or intradimensional life.’

Notice this thought process. One important distinction between using an agent and using Claude 2 is the chain-of-thought reasoning, via “observations” and “thoughts,” that shows the user how the agent went from the input query to its final output. At a glance, the agent’s output appears higher in quality and more logical than the LLM’s, but we will get into a more in-depth comparison of agents and non-agents later in this piece.
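The Thought/Action/Action Input/Final Answer lines in the trace above are plain text that the executor must parse before it can call a tool. A simplified, hypothetical parser (not LangChain’s actual output parser, which handles more edge cases) might look like this; it assumes the model stops generating before writing its own “Observation:” line.

```python
import re
from typing import Tuple, Union

# "Action: <tool>" followed on the next line by "Action Input: <input>".
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")
# Everything after "Final Answer:" (may span several lines).
FINAL_RE = re.compile(r"Final Answer:\s*(.+)", re.DOTALL)


def parse_step(llm_text: str) -> Union[Tuple[str, str], str]:
    """Return (tool, tool_input) for an action step, or the final answer text."""
    final = FINAL_RE.search(llm_text)
    if final:
        return final.group(1).strip()
    action = ACTION_RE.search(llm_text)
    if action:
        return action.group(1).strip(), action.group(2).strip()
    raise ValueError(f"could not parse LLM output: {llm_text!r}")


print(parse_step("I need to research and find evidence of aliens\n"
                 "Action: Search\nAction Input: Evidence of aliens"))
print(parse_step("Thought: I now know the steps I need to take\n"
                 "Final Answer: Research evidence, find first hand accounts."))
```

A tuple result tells the executor which tool to run next; a plain string ends the chain, matching the “> Finished chain.” point in the trace.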

Now, let’s look at a complex agent: BabyAGI with a LangChain agent plus tools, where a tool-equipped LangChain agent replaces BabyAGI’s default execution chain. With access to BabyAGI and LangChain tools, let’s see whether we can get better, more reliable information for our to-do list over several iterations (max_iterations: Optional[int] = 2).

Once we call the BabyAGI agent with:

OBJECTIVE = "You work for a non profit organization titled Aliens are Real that is trying to find UFOs and aliens among us and give them a warm welcome to Earth. Your goal is to prove Aliens are real through research, first hand accounts, and evidence of extraterrestrial and/or intradimensional life. Your first task in providing evidence that aliens exist is to make a todo list to map out your steps in order to prove aliens are real."

baby_agi({"objective": OBJECTIVE})

Here is the agent’s output from two iterations:

*****TASK LIST*****

1: Make a todo list

*****NEXT TASK*****

1: Make a todo list

> Entering new AgentExecutor chain…

Thought: What steps do I need to take to prove aliens exist?

Action: TODO

Action Input: Prove aliens exist

Observation:

  1. Research existing evidence of alien life, such as UFO sightings, crop circles, and other unexplained phenomena.
  2. Analyze the evidence to determine if it is credible and could be used to prove the existence of aliens.
  3. Develop a hypothesis based on the evidence and research.
  4. Design experiments to test the hypothesis.
  5. Conduct experiments and analyze the results.
  6. Publish the results in a scientific journal.
  7. Present the findings to the scientific community.
  8. Respond to any criticism or questions from the scientific community.
  9. Make any necessary revisions to the findings.
  10. Submit the revised findings to the scientific community for review.

Thought: I now know the final answer

Final Answer: A todo list has been created to prove aliens exist. The list includes researching existing evidence, analyzing the evidence, developing a hypothesis, designing experiments, conducting experiments, publishing the results, presenting the findings, responding to criticism, making revisions, and submitting the revised findings.

> Finished chain.

*****TASK RESULT*****

A todo list has been created to prove aliens exist. The list includes researching existing evidence, analyzing the evidence, developing a hypothesis, designing experiments, conducting experiments, publishing the results, presenting the findings, responding to criticism, making revisions, and submitting the revised findings.

*****TASK LIST*****

2: Research existing evidence of extraterrestrial and/or intradimensional life.

3: Analyze the evidence to determine its validity.

4: Develop a hypothesis based on the evidence.

5: Design experiments to test the hypothesis.

6: Conduct experiments to test the hypothesis.

7: Publish the results of the experiments.

8: Present the findings to the public.

9: Respond to criticism of the findings.

10: Make revisions to the findings based on the criticism.

11: Submit the revised findings to a scientific journal.

*****NEXT TASK*****

2: Research existing evidence of extraterrestrial and/or intradimensional life.

> Entering new AgentExecutor chain…

Thought: What resources can I use to research?

Action: Search

Action Input: Resources to research extraterrestrial and/or intradimensional life

Observation: SETI Institute’s researchers are in the forefront of this effort. Be part of the search. Our work Meet our scientists. A Nonprofit Research Organization.

Thought: What other resources can I use?

Action: Search

Action Input: Other resources to research extraterrestrial and/or intradimensional life

Observation: List of search terms used in response to requests for information on UFOs and … Alien Abduction(s); Alien Autopsy; Alien Life Form; Animal Mutilations …

Thought: What other steps can I take to prove aliens are real?

Action: TODO

Action Input: Steps to prove aliens are real

Observation:

  1. Research existing evidence of alien life, such as UFO sightings, crop circles, and other unexplained phenomena.
  2. Analyze the evidence to determine if it is credible and reliable.
  3. Consult with experts in the field of astronomy and extraterrestrial life to gain insight into the possibility of alien life.
  4. Investigate potential sites of alien activity, such as Area 51 and other restricted areas.
  5. Utilize advanced technology to search for signs of alien life, such as radio signals or other forms of communication.
  6. Develop a hypothesis based on the evidence collected and test it.
  7. Publish the results of the research and analysis in a scientific journal.
  8. Present the findings to the public and other experts in the field.
  9. Follow up with additional research and analysis to further support the findings.

Thought: I now know the final answer

Final Answer: The final answer is to create a todo list that includes researching existing evidence of extraterrestrial and/or intradimensional life, analyzing the evidence to determine if it is credible and reliable, consulting with experts in the field of astronomy and extraterrestrial life, investigating potential sites of alien activity, utilizing advanced technology to search for signs of alien life, developing a hypothesis based on the evidence collected and testing it, publishing the results of the research and analysis in a scientific journal, presenting the findings to the public and other experts in the field, and following up with additional research and analysis to further support the findings.

> Finished chain.

*****TASK RESULT*****

The final answer is to create a todo list that includes researching existing evidence of extraterrestrial and/or intradimensional life, analyzing the evidence to determine if it is credible and reliable, consulting with experts in the field of astronomy and extraterrestrial life, investigating potential sites of alien activity, utilizing advanced technology to search for signs of alien life, developing a hypothesis based on the evidence collected and testing it, publishing the results of the research and analysis in a scientific journal, presenting the findings to the public and other experts in the field, and following up with additional research and analysis to further support the findings.

*****TASK ENDING*****

{‘objective’: ‘You work for a non profit organization titled Aliens are Real that is trying to find UFOs and aliens among us and give them a warm welcome to Earth. Your goal is to prove Aliens are real through research, first hand accounts, and evidence of extraterrestrial and/or intradimensional life. Your first task in providing evidence that aliens exist is to make a todo list to map out your steps in order to prove aliens are real.’}

Creating a to-do list may seem like a single task, but if you want to keep iterating on that list in the future, your agent would need to incorporate researching, analyzing data, scraping websites, and developing solutions through updates and interactions. Each item on that list can then be considered an objective of its own, and only a complex agent with access to the web will be able to fill in the details. As the examples above show, the BabyAGI + LangChain agent produces the most detailed and thought-out steps. Once again, you can see the thought process the agent used while completing the task and how it changed and developed over several iterations.

The key trade-off between running a chain and running a complex agent as a continuous process is whether the added complexity is worth it: a complex agent leverages several APIs and produces a less predictable, less controllable outcome. If you want a one-time to-do list to get started on the right track to proving aliens are real, then maybe all you need is Claude 2; but if you want to continuously append to your list as you cross items off, an iterative agent equipped to learn would better serve your purpose.

5. Productionalization of AI Agents Has a Long Way To Go

Looking at how the to-do lists compare across Claude 2, ChatGPT, LangChain and BabyAGI, we have an idea of which option best fits our needs based on the complexity of our use case. However, these considerations alone are not enough to decide whether putting an agent into production is worth it. Any agent is fine if you are experimenting with tasks such as web scraping, but if your company is looking to put an agent into production, there is a much deeper process to follow, one that is still being developed as part of the LLMOps landscape.

Autonomous AI agent architectures are steadily becoming more advanced and performant, but the evaluation and productionalization of these agents have a long way to go. For evaluating non-deterministic models, leaderboards (benchmarks for current foundation and fine-tuned LLMs) such as the Open LLM Leaderboard, Chatbot Arena and HELM are becoming increasingly popular. LLM evaluation frameworks are still new, and agent evaluations are even less studied; however, tools such as AgentBench show promise for evaluating LLMs as agents across a diverse range of environments.

Conclusion

This piece has gone through five key takeaways on implementing an AI agent, the first being that you may not need an agent if the end doesn’t justify the means. If you do decide to use an agent, the next step is to consider whether your objective is a single, straightforward task or a task you want to continually iterate over. If you plan on using an agent for the long run, you may even want to invest in an architecture that can store skills and leverage them over time as it continues to learn. Either way, until you execute the agent and have a benchmark to compare it against, you won’t know how to evaluate it.

If you are looking to build and maintain an agent in production, whether by monitoring its chain-of-thought traces or by tracking the performance of its final iteration, the field of LLMOps is testing out agent capabilities and furthering our understanding of agent architectures. Currently, agents are not a one-size-fits-all solution, given that they are complex, largely untested and computationally expensive at scale. That said, agents don’t seem to be going anywhere and are quickly becoming agents of change within the generative AI community.