LLM Agents: How To Set Up

Amber Roberts,  Machine Learning Engineer  | Published August 08, 2023

Can an agent resolve whether extraterrestrial life exists?

Follow along with the accompanying notebook.

The rapid adoption of large language models continues to astound many industry watchers. In all, over half of machine learning teams are planning production deployments of LLMs this year – and nearly one in ten (8.3%) are already in production!


AI agents, chains and tools are a key part of how companies across industries are realizing that vision.

Chains are hardcoded sequences of actions; agents use an LLM to reason about which sequence of actions to take; and tools are functions that an agent calls.

This code-along post is written to help you use agents, chains, and tools to provide more relevant and detailed information on any topic – even the search for extraterrestrial life!

You will tackle a research goal by:

  • Creating a LangChain single shot agent to aid in your research
  • Giving this agent access to a variety of tools, including the internet
  • Leveraging LLM summarization, chunking, and chatbot capabilities

Your first research goal will be to prove the existence of extraterrestrial life.

Getting Started

The first step is to install and import the required modules, LLMs, and language tools.

!pip -q install langchain huggingface_hub openai google-search-results tiktoken wikipedia

Next, you will need to obtain API keys for both OpenAI and SerpAPI.

import os


os.environ["OPENAI_API_KEY"] = "INSERT-API-KEY-HERE"  # https://platform.openai.com
os.environ['SERPAPI_API_KEY'] = 'INSERT-API-KEY-HERE' # https://serpapi.com/

Creating and Initializing the AI Agent

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI

First, let’s load the language model we’re going to use to control the agent. A temperature of 0 generally means fewer hallucinations and more deterministic output.

The tools the agent will have access to are: [Google search, math, wikipedia, terminal]
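The cell that creates `llm` and `tools` is not shown in this post. Here is a minimal sketch of what it might look like, assuming the mid-2023 LangChain release this post was written against and the API keys set earlier. The tool identifiers are inferred from the tool list above, and the helper name `build_llm_and_tools` is hypothetical. Imports are deferred into the function so that defining it needs no network access.

```python
# Tool identifiers assumed from the list above:
# Google search (SerpAPI), math, Wikipedia, terminal.
TOOL_NAMES = ["serpapi", "llm-math", "wikipedia", "terminal"]
TEMPERATURE = 0  # lower temperature gives more deterministic output


def build_llm_and_tools():
    """Hypothetical helper: create the LLM and the tools handed to the agent."""
    # Imported lazily so that defining this helper needs no network or API keys.
    from langchain.agents import load_tools
    from langchain.llms import OpenAI

    llm = OpenAI(temperature=TEMPERATURE)
    # The math tool is itself LLM-backed, so the LLM is passed to load_tools.
    tools = load_tools(TOOL_NAMES, llm=llm)
    return llm, tools
```

Calling `llm, tools = build_llm_and_tools()` would then provide the two names that `initialize_agent` expects.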

Tools are functions that an agent calls. There are two important considerations here:

  1. Giving the agent access to the right tools; and
  2. Describing the tools in a way that is most helpful to the agent.

Without both, the agent you are trying to build will not work. If you don’t give the agent access to a correct set of tools, it will never be able to accomplish the objective. Similarly, describing the tools properly is important for the agent to know how to properly use them.
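To see why descriptions matter, it helps to picture a tool as nothing more than a name, a description, and a function. The sketch below is plain Python, not LangChain's own classes; `SimpleTool`, `render_tool_list`, and the two toy tools are hypothetical names used only for illustration. The agent only ever "sees" the rendered name and description text, so that text is all it has to decide which tool fits.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: a name, a description, and a function."""
    name: str
    description: str
    func: Callable[[str], str]


def render_tool_list(tools: List[SimpleTool]) -> str:
    """Format tools as 'Name: description' lines for inclusion in a prompt."""
    return "\n".join(f"{tool.name}: {tool.description}" for tool in tools)


def word_count(text: str) -> str:
    """Toy tool body: count whitespace-separated words."""
    return str(len(text.split()))


demo_tools = [
    SimpleTool("WordCounter", "Useful for counting the words in a piece of text.", word_count),
    SimpleTool("Shout", "Useful for converting text to upper case.", str.upper),
]

print(render_tool_list(demo_tools))
```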

agent = initialize_agent(tools,
                        llm,
                        agent="zero-shot-react-description",
                        verbose=True)

If you ever need to look up a tool, this is useful:

tools[1].name, tools[1].description


('Calculator', 'Useful for when you need to answer questions about math.')

Chains

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs — either with each other or with other components.

LangChain provides the chain interface for such “chained” applications. We define a chain very generically as a sequence of calls to components, which can include other chains.

Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components.
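The prompt-then-model example can be sketched in a few lines of plain Python. This is an illustration of the idea rather than LangChain's chain classes: `fake_llm` is a hypothetical stand-in for a real model call so the snippet runs offline, and the template text is made up for the example.

```python
from typing import Callable


def make_prompt(template: str, **values: str) -> str:
    """Fill a prompt template, playing the role of a PromptTemplate."""
    return template.format(**values)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call so the sketch runs offline."""
    return f"[model response to: {prompt}]"


def run_chain(user_input: str, llm: Callable[[str], str] = fake_llm) -> str:
    """A hardcoded two-step chain: format the prompt, then call the model."""
    prompt = make_prompt("List three facts about {topic}.", topic=user_input)
    return llm(prompt)


print(run_chain("unidentified aerial phenomena"))
```

The order of steps is fixed in `run_chain`; nothing the model says can change what happens next, which is what distinguishes a chain from an agent.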

agent.agent.llm_chain.prompt.template
Answer the following questions as best you can. You have access to the following tools:

Search: A search engine. Useful for when you need to answer questions about current events. Input should be a search query.
Calculator: Useful for when you need to answer questions about math.
Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.
terminal: Run shell commands on this Linux machine.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search, Calculator, Wikipedia, terminal]
Action Input: the input to the action
Observation: the result of the action
… (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}

The core idea of agents is to use an LLM to choose a sequence of actions to take. In chains, a sequence of actions is hardcoded. In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
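The Thought/Action/Observation format in the prompt template suggests a simple control loop. The sketch below is a simplified illustration in plain Python, not LangChain's AgentExecutor: `run_agent`, `scripted_llm`, `toy_tools`, and the `max_steps` cap are all hypothetical, and the "model" just replays two canned responses so the loop can run offline.

```python
import re
from typing import Callable, Dict, Iterator

ACTION_PATTERN = re.compile(r"Action: (.*?)\nAction Input: (.*)")


def run_agent(question: str,
              llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Loop until the model emits 'Final Answer:' or max_steps is reached."""
    scratchpad = ""
    for _ in range(max_steps):
        output = llm(f"Question: {question}\nThought:{scratchpad}")
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_PATTERN.search(output)
        if match is None:
            raise ValueError(f"Could not parse model output: {output!r}")
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        observation = tools[tool_name](tool_input)
        # Record the step so the next model call can see what happened.
        scratchpad += f"{output}\nObservation: {observation}\nThought:"
    raise RuntimeError("Agent stopped without a final answer")


def scripted_llm() -> Callable[[str], str]:
    """Hypothetical model that replays two canned responses."""
    responses: Iterator[str] = iter([
        " I need to add the numbers.\nAction: Calculator\nAction Input: 1939 + 100",
        " I now know the final answer\nFinal Answer: 2039",
    ])
    return lambda prompt: next(responses)


toy_tools = {"Calculator": lambda expr: str(sum(int(n) for n in expr.split("+")))}
print(run_agent("What is 1939 plus 100?", scripted_llm(), toy_tools))
```

The model, not the code, picks which tool runs at each step; the loop only parses the choice, calls the tool, and feeds the observation back.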

To get started, let’s run our agent with a few straightforward questions. Aliens are a popular topic of discussion, so there must be a lot of information available.

agent.run("What do we currently know about aliens?")
> Entering new AgentExecutor chain…
Action: Search
Action Input: “aliens”
Observation: After floating in space for 57 years, Lt. Ripley’s (Sigourney Weaver) shuttle is found by a deep space salvage team. Upon arriving at LV-426, the marines find only one survivor, a nine year old girl named Newt (Carrie Henn). But even these battle-hardened marines with all the latest weaponry are no match for the hundreds of aliens that have invaded the colony.… MORE
Thought: This is a movie description, not the answer I’m looking for.
Action: Search
Action Input: “aliens facts”
Observation: Extraterrestrial life or alien life is life which may occur outside Earth and which did not originate on Earth. No extraterrestrial life has yet been conclusively detected.
Thought: This is the answer I’m looking for.
Final Answer: Extraterrestrial life or alien life is life which may occur outside Earth and which did not originate on Earth. No extraterrestrial life has yet been conclusively detected.

> Finished chain.

Extraterrestrial life or alien life is life which may occur outside Earth and which did not originate on Earth. No extraterrestrial life has yet been conclusively detected.

Next, we ask about leading experts.

agent_output_UFO_expert = agent.run("Who is the leading expert on UFOs?")
> Entering new AgentExecutor chain…
I should look for someone who has written extensively on the subject.
Action: Search
Action Input: “leading expert on UFOs”
Observation: 1939) computer scientist and author, important figure in the UFO studies in France and in the United States. Promoted the extraterrestrial hypothesis and later …
Thought: This looks like a promising lead.
Action: Wikipedia
Action Input: Jacques Vallee
Observation: Page: Jacques Vallée
Summary: Jacques Fabrice Vallée (French: [ʒak fabʁis vale]; born September 24, 1939) is an Internet pioneer, computer scientist, venture capitalist, author, ufologist and astronomer currently residing in San Francisco, California and Paris, France.
His scientific career began as a professional astronomer at the Paris Observatory. Vallée co-developed the first computerized map of Mars for NASA in 1963. He later worked on the network information center for the ARPANET, a precursor to the modern Internet, as a staff engineer of SRI International’s Augmentation Research Center under Douglas Engelbart.
Vallée is also an important figure in the study of unidentified flying objects (UFOs), first noted for a defense of the scientific legitimacy of the extraterrestrial hypothesis and later for promoting the interdimensional hypothesis.

Page: Interdimensional UFO hypothesis
Summary: The interdimensional hypothesis is a proposal that unidentified flying object (UFO) sightings are the result of experiencing other “dimensions” that coexist separately alongside our own in contrast with either the extraterrestrial hypothesis that suggests UFO sightings are caused by visitations from outside the Earth or the psychosocial hypothesis that argues UFO sightings are best explained as psychological or social phenomenon.
The hypothesis has been advanced by ufologists such as Meade Layne, John Keel, J. Allen Hynek, and Jacques Vallée. Proponents of the interdimensional hypothesis argue that UFOs are a modern manifestation of a phenomenon that has occurred throughout recorded human history, which in prior ages were ascribed to mythological or supernatural creatures.
Jeffrey J. Kripal, Chair in Philosophy and Religious Thought at Rice University, writes: “this interdimensional reading, long a staple of Spiritualism through the famous ‘fourth dimension’, would have a very long life within ufology and is still very much with us today”.

Page: Dropa stones
Summary: The Dropa stones are purported pseudoarcheological artifacts. They are claimed to be a series of 12,000-year old granite discs with inscribed markings that represent an account of an extraterrestrial crash landing on Earth.
Extensive inquiries have found no record of the stones being displayed in any of the world’s museums, nor of the existence of any person allegedly involved with their discovery and translation. French ufologist Jacques Vallée considers the tale to be a hoax. Further fictional details were added over time and appear in more recent retellings of the story.
Thought: Jacques Vallee appears to be the leading expert on UFOs.
Final Answer: Jacques Vallee is the leading expert on UFOs.

> Finished chain.

Finally, we dive deeper into one of these.

agent.run("Take the year " + agent_output_UFO_expert + " died and add 100 years. Will this be the year we prove aliens exist?")
> Entering new AgentExecutor chain…
I need to calculate the year Jacques Vallee died and add 100 years to it.
Action: Calculator
Action Input: 2019 + 100
Observation: Answer: 2119
Thought: I now know the year we will prove aliens exist, if we do.
Final Answer: No, 2119 is not the year we will prove aliens exist.

> Finished chain.

No, 2119 is not the year we will prove aliens exist.

💡Remember: Tools are functions that agents can use to interact with the world. These tools can be generic utilities (e.g. search), other chains, or even other agents. Note how our single agent is deciding between using search, math, or Wikipedia to deliver the best response.

agent.run("Is " + agent_output_UFO_expert + " dead?")
> Entering new AgentExecutor chain…
I should find out if Jacques Vallee is still alive.
Action: Wikipedia
Action Input: Jacques Vallee
Observation: Page: Jacques Vallée
Summary: Jacques Fabrice Vallée (French: [ʒak fabʁis vale]; born September 24, 1939) is an Internet pioneer, computer scientist, venture capitalist, author, ufologist and astronomer currently residing in San Francisco, California and Paris, France.
His scientific career began as a professional astronomer at the Paris Observatory. Vallée co-developed the first computerized map of Mars for NASA in 1963. He later worked on the network information center for the ARPANET, a precursor to the modern Internet, as a staff engineer of SRI International’s Augmentation Research Center under Douglas Engelbart.
Vallée is also an important figure in the study of unidentified flying objects (UFOs), first noted for a defense of the scientific legitimacy of the extraterrestrial hypothesis and later for promoting the interdimensional hypothesis.

Page: Interdimensional UFO hypothesis
Summary: The interdimensional hypothesis is a proposal that unidentified flying object (UFO) sightings are the result of experiencing other “dimensions” that coexist separately alongside our own in contrast with either the extraterrestrial hypothesis that suggests UFO sightings are caused by visitations from outside the Earth or the psychosocial hypothesis that argues UFO sightings are best explained as psychological or social phenomenon.
The hypothesis has been advanced by ufologists such as Meade Layne, John Keel, J. Allen Hynek, and Jacques Vallée. Proponents of the interdimensional hypothesis argue that UFOs are a modern manifestation of a phenomenon that has occurred throughout recorded human history, which in prior ages were ascribed to mythological or supernatural creatures.
Jeffrey J. Kripal, Chair in Philosophy and Religious Thought at Rice University, writes: “this interdimensional reading, long a staple of Spiritualism through the famous ‘fourth dimension’, would have a very long life within ufology and is still very much with us today”.

Page: Dropa stones
Summary: The Dropa stones are purported pseudoarcheological artifacts. They are claimed to be a series of 12,000-year old granite discs with inscribed markings that represent an account of an extraterrestrial crash landing on Earth.
Extensive inquiries have found no record of the stones being displayed in any of the world’s museums, nor of the existence of any person allegedly involved with their discovery and translation. French ufologist Jacques Vallée considers the tale to be a hoax. Further fictional details were added over time and appear in more recent retellings of the story.
Thought: I now know the final answer
Final Answer: Jacques Vallee is still alive and is a leading expert on UFOs, but the Dropa stones are likely a hoax.

> Finished chain.

Jacques Vallee is still alive and is a leading expert on UFOs, but the Dropa stones are likely a hoax.

Refining

A single agent with tools is useful, but ours is starting to hallucinate: it made up a death year (2019) for Jacques Vallée, who is still alive. If we want correct and reliable information, maybe we should try to look at alternative words for UFOs and aliens…

#ask questions that can help you search for the most relevant research
agent.run("What are alternative names for UFOs?")
> Entering new AgentExecutor chain…
I should look for information about UFOs.
Action: Wikipedia
Action Input: UFO
Observation: Page: Unidentified flying object
Summary: An unidentified flying object (UFO), or unidentified aerial (or anomalous) phenomenon (UAP), is any perceived aerial phenomenon that cannot be immediately identified or explained. Upon investigation, most UFOs are identified as known objects or atmospheric phenomena, while a small number remain unexplained.
While unusual sightings have been reported in the sky throughout history, UFOs became culturally prominent after World War II, escalating during the Space Age. Studies and investigations into UFO reports conducted by governments (such as Project Blue Book in the United States and Project Condign in the United Kingdom), as well as by organisations and individuals have occurred over the years without confirmation of the fantastical claims of believers. The U.S. government currently has two entities dedicated to UAP (or UFO) data collection and analysis: NASA’s UAP independent study team and the All-domain Anomaly Resolution Office.
Scientists and skeptic organizations such as the Committee for Skeptical Inquiry have provided prosaic explanations for UFOs, namely that they are caused by natural phenomena, human technology, delusions, and hoaxes. Small but vocal groups of ufologists favour unconventional or pseudoscientific hypotheses, often claiming that UFOs are evidence of extraterrestrial intelligence, technologically advanced cryptids or interdimensional contact, but even after decades of promotion of such ideas by believers and in popular media, the kind of evidence required to solidly support such claims has not been forthcoming. Beliefs surrounding UFOs have inspired parts of new religions even as social scientists have identified the ongoing interest and storytelling surrounding UFOs as a modern example of folklore and mythology understandable with psychosocial explanations.

Page: Roswell incident
Summary: The “Roswell incident”, or simply “Roswell”, was the July 1947 recovery of metallic and rubber debris from a military balloon by Roswell Army Air Field personnel, who issued a press release announcing possession of a “flying disc”. Decades later, conspiracy theories claimed that debris from an alien spaceship had been covered up by the government. In response, in 1994 the United States Air Force published a report concluding the crashed object was a top secret nuclear test surveillance balloon from Project Mogul.
The Roswell incident took place during the flying disc craze of 1947, sparked by widespread media coverage of pilot Kenneth Arnold’s alleged sighting. Amid hundreds of reports nationwide, on July 8, 1947, Roswell Army Air Field’s press release was broadcast via wire transmission. The Army quickly retracted the release, falsely stating the crashed object was merely a conventional weather balloon.
The incident was forgotten until 1978, when retired lieutenant colonel Jesse Marcel was interviewed by ufologist Stanton Friedman. In that interview, Marcel revealed the “weather balloon” had been a cover story to divert public attention. Based on this, Marcel speculated that the debris might have been extraterrestrial in origin.
Jesse Marcel’s interview led to complex conspiracy theories about alien spacecraft recovery and military cover-ups by ufologists. In 1979, over three decades after Roswell, conspiracy theorists began to claim that extraterrestrial occupants had been recovered by the military at Roswell. In response to claims of alien occupants, a second USAF report in 1997 reviewed testimonies about aliens and found them to be baseless, made up, or inspired by parachute dummies.
Conspiracy theories about the event persist despite explanations linking the incident to Project Mogul, a military balloon or having been described as “the world’s most famous, most exhaustively investigated, and most thoroughly debunked UFO claim”. Its myth has become a cultural phenomenon and the Roswell incident continues to be of interest in popular media. Portrayals of the incident are a
Thought: I now know the alternative names for UFOs.
Final Answer: Alternative names for UFOs include Unidentified Flying Object (UFO), Unidentified Aerial Phenomenon (UAP), and Flying Disc.

> Finished chain.

Alternative names for UFOs include Unidentified Flying Object (UFO), Unidentified Aerial Phenomenon (UAP), and Flying Disc.

You can also ask multiple questions at once.

agent.run("How are Foo Fighters a name for a UFO? And is Dave Grohl human?")
> Entering new AgentExecutor chain…
I should research the Foo Fighters and Dave Grohl
Action: Search
Action Input: “Foo Fighters” “Dave Grohl”
Observation: Foo Fighters is an American rock band formed in Seattle in 1994. Foo Fighters was initially formed as a one-man project by former Nirvana drummer Dave Grohl. Following the success of the 1995 eponymous debut album, Grohl recruited a band consisting of Nate Mendel, William Goldsmith, and Pat Smear.
Thought: I should research if Dave Grohl is human
Action: Search
Action Input: “Dave Grohl human”
Observation: They’re human beings. And there has to be some sort of compassion or understanding that their lives should be their lives. That it’s not all …
Thought: I now know the final answer
Final Answer: Dave Grohl is a human being and the name Foo Fighters is derived from the term used by Allied aircraft pilots in World War II to describe UFOs.

> Finished chain.

Dave Grohl is a human being and the name Foo Fighters is derived from the term used by Allied aircraft pilots in World War II to describe UFOs.

Good to know Dave Grohl is human (I wasn’t sure!). It is also good to know that a UFO is now referred to as a UAP – worth keeping in mind for future research.
Now, let’s look up some outside resources to get answers about UFOs and UAPs…

agent.run("Are there annual UAP reports?")
> Entering new AgentExecutor chain…
I should research this to find out.
Action: Search
Action Input: “annual UAP reports”
Observation: The National Defense Authorization Act for Fiscal Year 2022 required the Director of National Intelligence, in consultation with the …
Thought: This looks like the answer I’m looking for.
Action: Wikipedia
Action Input: “Unidentified Flying Object”
Observation: Page: Unidentified flying object
Summary: An unidentified flying object (UFO), or unidentified aerial (or anomalous) phenomenon (UAP), is any perceived aerial phenomenon that cannot be immediately identified or explained. Upon investigation, most UFOs are identified as known objects or atmospheric phenomena, while a small number remain unexplained.
While unusual sightings have been reported in the sky throughout history, UFOs became culturally prominent after World War II, escalating during the Space Age. Studies and investigations into UFO reports conducted by governments (such as Project Blue Book in the United States and Project Condign in the United Kingdom), as well as by organisations and individuals have occurred over the years without confirmation of the fantastical claims of believers. The U.S. government currently has two entities dedicated to UAP (or UFO) data collection and analysis: NASA’s UAP independent study team and the All-domain Anomaly Resolution Office.
Scientists and skeptic organizations such as the Committee for Skeptical Inquiry have provided prosaic explanations for UFOs, namely that they are caused by natural phenomena, human technology, delusions, and hoaxes. Small but vocal groups of ufologists favour unconventional or pseudoscientific hypotheses, often claiming that UFOs are evidence of extraterrestrial intelligence, technologically advanced cryptids or interdimensional contact, but even after decades of promotion of such ideas by believers and in popular media, the kind of evidence required to solidly support such claims has not been forthcoming. Beliefs surrounding UFOs have inspired parts of new religions even as social scientists have identified the ongoing interest and storytelling surrounding UFOs as a modern example of folklore and mythology understandable with psychosocial explanations.

Page: Varginha UFO incident
Summary: The Varginha UFO incident involves a series of events in 1996 when residents of Varginha, Brazil claimed seeing one or more strange creatures and at least one unidentified flying object (UFO). Other associated claims include the capture of one or more extraterrestrial beings by the Brazilian authorities, animal fatalities at a zoo, and a woman impregnated by an extraterrestrial. The reports have garnered extensive media coverage.

Page: Unidentified submerged object
Summary: An unidentified submerged object (USO) is an unidentified object submerged in water. This term does not necessarily refer to an object of paranormal activity origin. The term is used in discussions of the Bermuda Triangle, for example.
Thought: I now know the final answer.
Final Answer: Yes, there are annual UAP reports, which are collected and analyzed by the U.S. government. Additionally, there have been reports of USOs, which are unidentified submerged objects.

> Finished chain.

Yes, there are annual UAP reports, which are collected and analyzed by the U.S. government. Additionally, there have been reports of USOs, which are unidentified submerged objects.

Interesting! Diving deeper:

agent.run("Where can I download the '2022 ANNUAL REPORT ON UNIDENTIFIED AERIAL PHENOMENA'? What is the url?")
> Entering new AgentExecutor chain…
I need to find a website that has the report.
Action: Search
Action Input: “2022 ANNUAL REPORT ON UNIDENTIFIED AERIAL PHENOMENA”
Observation: The National Defense Authorization Act for Fiscal Year 2022 required the Director of National Intelligence, in consultation with the …
Thought: This looks like a good source.
Action: Wikipedia
Action Input: “National Defense Authorization Act for Fiscal Year 2022”
Observation: Page: National Defense Authorization Act for Fiscal Year 2022
Summary: The National Defense Authorization Act for Fiscal Year 2022 (S. 1605; NDAA 2022, Pub.L. 117-81) is a United States federal law which specifies the budget, expenditures and policies of the U.S. Department of Defense (DOD) for fiscal year 2022. Analogous NDAAs have been passed annually for 60 years.

Page: National Defense Authorization Act
Summary: The National Defense Authorization Act (NDAA) is the name for each of a series of United States federal laws specifying the annual budget and expenditures of the U.S. Department of Defense. The first NDAA was passed in 1961. The U.S. Congress oversees the defense budget primarily through two yearly bills: the National Defense Authorization Act and defense appropriations bills. The authorization bill is the jurisdiction of the Senate Armed Services Committee and House Armed Services Committee and determines the agencies responsible for defense, establishes recommended funding levels, and sets the policies under which money will be spent. The appropriations bill provides funds.
The passage of a Defense Authorization Act is often used by Congress to honour a senior congress member or other individual. For example, the National Defense Authorization Act for Fiscal Year 2001 is known as the “Floyd D. Spence National Defense Authorization Act for Fiscal Year 2001” in honor of Representative Floyd D. Spence of South Carolina.

Page: National Defense Authorization Act for Fiscal Year 2023
Summary: The James M. Inhofe National Defense Authorization Act for Fiscal Year 2023 (NDAA 2023) is a United States federal law which specifies the budget, expenditures and policies of the U.S. Department of Defense (DOD) for fiscal year 2023. Analogous NDAAs have been passed annually for over 60 years.
Thought: I now know the final answer.
Final Answer: The 2022 ANNUAL REPORT ON UNIDENTIFIED AERIAL PHENOMENA can be downloaded from the website of the U.S. Department of Defense: https://www.defense.gov/Newsroom/Releases/NDAA-2022/

> Finished chain.

The 2022 ANNUAL REPORT ON UNIDENTIFIED AERIAL PHENOMENA can be downloaded from the website of the U.S. Department of Defense: https://www.defense.gov/Newsroom/Releases/NDAA-2022/

Diving still deeper:

agent.run("Is there a pdf file for that report?")
> Entering new AgentExecutor chain…
I need to find out if the report is available in pdf format.
Action: Search
Action Input: “report name” pdf
Observation: Report Name Paper. pdfFiller is the best quality online PDF editor and form builder – it’s fast, secure and easy to use. Edit, sign, fax and print documents …
Thought: I now know the final answer
Final Answer: Yes, the report is available in pdf format.

> Finished chain.

Yes, the report is available in pdf format.

Now that we know there is a full report available, let’s download it.

However, before we read through the entire report, let’s summarize it and then ask it questions, chatbot-style, to make sure it contains the information we need.

# let's save the pdf via URL
!pip install "unstructured[all-docs]"
from langchain.document_loaders import UnstructuredURLLoader
urls = ["https://www.dni.gov/files/ODNI/documents/assessments/Unclassified-2022-Annual-Report-UAP.pdf"] #location of the report
loader = UnstructuredURLLoader(urls=urls)


TOP_SECRET_FILE = loader.load() #clearly I need to save this as a secret file

Time to summarize the report:

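from langchain.document_loaders import UnstructuredFileLoader
from langchain.chains.summarize import load_summarize_chain
from langchain.chains.question_answering import load_qa_chain


def doc_summary(docs):
    """Print the document count, a rough word count, and a preview of the first document."""
    print(f'You have {len(docs)} document(s)')
    num_words = sum(len(doc.page_content.split(' ')) for doc in docs)
    print(f'You have roughly {num_words} words in your docs')
    print()
    print(f'Preview: \n{docs[0].page_content.split(". ")[0]}')


doc_summary(TOP_SECRET_FILE)

chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(TOP_SECRET_FILE)  # This will only work if we do not exceed the token limit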

Unfortunately, we get the following error:

InvalidRequestError                       Traceback (most recent call last)
in ()
      1 chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
----> 2 chain.run(TOP_SECRET_FILE) #This will only work if we do not exceed the token limit

25 frames
/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py in _interpret_response_line(self, rbody, rcode, rheaders, stream)
    761 stream_error = stream and "error" in resp.data
    762 if stream_error or not 200 <= rcode < 300:
--> 763 raise self.handle_error_response(
    764 rbody, rcode, resp.data, rheaders, stream_error=stream_error
    765 )

InvalidRequestError: This model’s maximum context length is 4097 tokens, however you requested 5156 tokens (4900 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
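The numbers in that error message are simple arithmetic: the prompt tokens plus the tokens reserved for the completion must fit inside the model's context window. A small sketch of the check, using only the figures reported in the error (the helper name `fits_context` is hypothetical):

```python
def fits_context(prompt_tokens: int, completion_tokens: int, context_limit: int) -> bool:
    """Return True when prompt plus reserved completion fit in the context window."""
    return prompt_tokens + completion_tokens <= context_limit


# Numbers taken from the error message above.
requested = 4900 + 256
print(requested)                      # 5156
print(fits_context(4900, 256, 4097))  # False
print(4097 - 256)                     # 3841 tokens left for the prompt
```

With 256 tokens reserved for the completion, any single prompt has to stay at or below 3,841 tokens for this model.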

Our document is too long if we load it all at once! So we will need to chunk the file with LangChain’s text splitter.
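To make the idea of chunking concrete, here is a deliberately simplified splitter in plain Python. It is not LangChain's implementation: it only packs whole words greedily into chunks of at most `chunk_size` characters, and the function name and sample sentence are made up for illustration.

```python
from typing import List


def split_text(text: str, chunk_size: int) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most chunk_size characters.

    A single word longer than chunk_size is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = "UAP reports are collected, indexed, and analyzed by several agencies."
for chunk in split_text(sample, chunk_size=30):
    print(len(chunk), repr(chunk))
```

Each chunk can then be sent to the model on its own, which keeps every individual request under the context limit.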

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=600,
    chunk_overlap=0,
)
TOP_SECRET_FILE = text_splitter.split_documents(TOP_SECRET_FILE)
doc_summary(TOP_SECRET_FILE)

# Summarize a subset of the chunks with the map_reduce chain
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(TOP_SECRET_FILE[1:40])

Once this process is complete, we get the following:

> Finished chain.

This article outlines the governmental changes to manage Unidentified Aerial Phenomena (UAP) issues, including the establishment of the All-Domain Anomaly Resolution Office (AARO) and the Office of the Director of National Intelligence (ODNI) coordinating with AARO and multi-agency partners to assist in the collection, reporting, and analysis of UAP incidents across the Intelligence Community. AARO is leveraging an analytic process to analyze UAP reports, and the ODNI and AARO are committed to responsibly sharing findings with aviators and operators. The Department of Defense and Intelligence Community are working together to report, identify, and resolve UAP events, and the FY 2022 National Defense Authorization Act (NDAA) established the Unidentified Aerial Phenomena (UAP) Task Force, now known as the Aerospace Threat Identification Program (AARO). AARO is working with the IC to manage the influx of data from increased UAP events, including intake, indexing, visualization, and analysis.

Great! This report seems to contain real information on UFOs.
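The summarization chain in the traceback earlier was loaded with chain_type="map_reduce". The general pattern is to summarize each chunk on its own (map) and then combine the partial summaries into one (reduce). The sketch below shows that pattern in plain Python under loud assumptions: `first_sentence` and `join_summaries` are hypothetical stand-ins for the two LLM calls, and the chunk text is toy data.

```python
from typing import Callable, List


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for the per-chunk LLM summary: keep the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def join_summaries(summaries: List[str]) -> str:
    """Hypothetical stand-in for the combining LLM call: concatenate the summaries."""
    return " ".join(summaries)


def map_reduce_summarize(chunks: List[str],
                         map_fn: Callable[[str], str] = first_sentence,
                         reduce_fn: Callable[[List[str]], str] = join_summaries) -> str:
    """Summarize each chunk independently (map), then combine the results (reduce)."""
    partial_summaries = [map_fn(chunk) for chunk in chunks]
    return reduce_fn(partial_summaries)


toy_chunks = [
    "Chunk one opens here. More detail follows.",
    "Chunk two opens here. Even more detail.",
]
print(map_reduce_summarize(toy_chunks))
```

Because each map call sees only one chunk, no single request has to hold the whole document.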

Let’s say we don’t want to read the entire report just to see whether it can answer questions for an article we’re writing. We are mainly interested in whether the report tells us that we are in danger (based on what we currently know about UFOs) and whether it provides evidence that aliens exist.

Before reading the full report, let’s just ask the report if it can answer these questions.

Q&A Map Re-Rank

The map re-rank documents chain runs an initial prompt on each document that not only tries to complete a task but also gives a score for how certain it is in its answer. The highest scoring response is returned.
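The selection step can be sketched in plain Python. This illustrates only the "pick the highest score" idea, not LangChain's code: it assumes each per-chunk result is a dict with an `answer` string and a `score` stored as a string, and the example answers are toy data. Breaking ties in favor of the earliest chunk is a choice made for this sketch.

```python
from typing import Dict, List


def pick_best_answer(steps: List[Dict[str, str]]) -> str:
    """Return the answer with the highest score; ties go to the earliest chunk."""
    best = max(steps, key=lambda step: int(step["score"]))
    return best["answer"].strip()


# Assumed shape: one answer and one score per chunk.
example_steps = [
    {"answer": " This document does not answer the question.", "score": "0"},
    {"answer": " UAPs may pose a hazard to flight safety.", "score": "60"},
    {"answer": " No collisions have been reported.", "score": "80"},
    {"answer": " UAPs are a flight safety hazard.", "score": "80"},
]
print(pick_best_answer(example_steps))
```

Note that the scores come from the model itself, so a high score signals the model's confidence rather than verified correctness.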

chain = load_qa_chain(llm, chain_type="map_rerank", verbose=True, return_intermediate_steps=True)
#ask the report your important questions
query = "Are UAPs a danger to our safety? Does this mean that aliens are real?"


result = chain({"input_documents": TOP_SECRET_FILE[1:40], "question": query}, return_only_outputs=True)

Next, let’s get the result:

result['output_text']
UAPs have not been proven to be a danger to our safety, and this document does not provide evidence that aliens are real.

One out of two answers (not bad). Let’s check the intermediate steps to confirm that these answers are reliable and to see where exactly in the document chunks they were found.

result['intermediate_steps']
[{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' UAPs have not been proven to be a danger to our safety, and this document does not provide evidence that aliens are real.',
'score': '80'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' UAPs may represent a safety of flight hazard, but this document does not answer the question of whether aliens are real.',
'score': '40'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' UAPs may pose a danger to our safety, but this document does not provide evidence of alien involvement.',
'score': '60'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' UAPs pose a potential risk to flight safety, but there is no evidence to suggest that aliens are real.',
'score': '80'},
{'answer': ' Yes, UAPs pose a safety of flight and collision hazard to air assets. No, this does not mean that aliens are real.',
'score': '80'},
{'answer': ' There have been no reported collisions between U.S. aircraft and UAP, and no encounters with UAP confirmed to contribute directly to adverse health-related effects to the observer(s).',
'score': '80'},
{'answer': ' UAPs represent a hazard to flight safety and pose a possible adversary. This does not necessarily mean that aliens are real.',
'score': '80'},
{'answer': ' UAPs pose a potential risk to safety of flight and could be adversarial activity. There is no evidence to suggest that aliens are real.',
'score': '80'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'},
{'answer': ' This document does not answer the question.', 'score': '0'}]
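
To make the re-rank step concrete, here is a minimal sketch in plain Python of how a "highest score wins" selection could work over a list shaped like the intermediate steps above. The sample entries are abridged and hypothetical, the helper name pick_best is ours, and the tie-breaking rule (earliest chunk wins) is an assumption rather than a statement about how LangChain implements it.

```python
# Abridged, hypothetical sample in the same shape as result['intermediate_steps'].
steps = [
    {"answer": " This document does not answer the question.", "score": "0"},
    {"answer": " UAPs have not been proven to be a danger to our safety.", "score": "80"},
    {"answer": " UAPs may pose a danger to our safety.", "score": "60"},
    {"answer": " UAPs pose a potential risk to flight safety.", "score": "80"},
]


def pick_best(steps):
    """Return (index, step) for the highest score; ties go to the earliest chunk (assumed)."""
    # Scores arrive as strings, so convert them before comparing.
    scored = [(int(step["score"]), i) for i, step in enumerate(steps)]
    _, best_i = max(scored, key=lambda pair: (pair[0], -pair[1]))
    return best_i, steps[best_i]


best_index, best = pick_best(steps)
print(best_index, best["score"], best["answer"].strip())
```

The returned index also tells you which chunk of the report to open if you want to read the supporting passage yourself.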

Since neither the report nor searching the web is yielding the answers we need about aliens, let’s see if eyewitness testimonies from YouTube will help!

YouTube Exercise

We are now going to pull videos from YouTube and summarize their findings. We can do this with the OpenAIWhisperParser, which uses the OpenAI Whisper API to transcribe audio to text, or with the OpenAIWhisperParserLocal, which runs the transcription locally (useful on private clouds or on premises).

from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser, OpenAIWhisperParserLocal

Let’s use yt_dlp to download audio from YouTube URLs. We will use pydub to split the downloaded audio files (so that we stay within the Whisper API’s 25MB file size limit).

pip install yt_dlp
pip install pydub
pip install librosa
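
To get a feel for why splitting is needed, the arithmetic behind the 25MB limit can be sketched as follows. This is only the calculation, not the actual pydub slicing; the 128 kbps bitrate, the 20-minute chunk length, the 50-minute clip, and the helper names are all assumptions chosen for illustration.

```python
import math

WHISPER_LIMIT_BYTES = 25 * 1024 * 1024  # the 25MB limit, assumed here to mean 25 * 2**20 bytes


def max_chunk_seconds(bitrate_kbps, limit_bytes=WHISPER_LIMIT_BYTES):
    """Longest duration (in seconds) that fits under the limit at a constant bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return limit_bytes / bytes_per_second


def chunk_boundaries(total_seconds, chunk_seconds):
    """Return (start, end) offsets in seconds that cover the whole clip."""
    n_chunks = math.ceil(total_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min((i + 1) * chunk_seconds, total_seconds))
        for i in range(n_chunks)
    ]


limit = max_chunk_seconds(128)            # assumed 128 kbps encoding
bounds = chunk_boundaries(3000, 20 * 60)  # hypothetical 50-minute clip, 20-minute chunks
print(round(limit), bounds)
```

At the assumed bitrate a single file can hold roughly 27 minutes of audio, so any chunk length below that keeps each upload under the limit.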

We will need to go from YouTube URL to text.

pip install youtube_search
from langchain.tools import YouTubeSearchTool
Agent_youtube = YouTubeSearchTool()
Agent_youtube.run("UFO breaking news,3") #search YouTube for the top 3 breaking news videos on UFOs
['/watch?v=15FoQ37js_8&pp=ygURVUZPIGJyZWFraW5nIG5ld3M%3D', '/watch?v=470K8IZrLRE&pp=ygURVUZPIGJyZWFraW5nIG5ld3M%3D', '/watch?v=TjzHL1hVo2o&pp=ygURVUZPIGJyZWFraW5nIG5ld3M%3D']

To watch these videos, prepend https://www.youtube.com to each result and paste the full URL into your browser’s address bar.
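
If you would rather build the full URLs in code, a small helper along these lines would do it. The function name to_full_urls is ours, and the sketch assumes the tool may hand back its list as a string (so it is parsed with ast.literal_eval) or as a real list.

```python
import ast

YOUTUBE_BASE = "https://www.youtube.com"


def to_full_urls(tool_output):
    """Turn the relative /watch paths into full YouTube URLs."""
    # The result may be the string form of a list; parse it safely if so.
    paths = ast.literal_eval(tool_output) if isinstance(tool_output, str) else tool_output
    return [YOUTUBE_BASE + path for path in paths]


# Two of the results shown above, passed in as a string.
sample = (
    "['/watch?v=15FoQ37js_8&pp=ygURVUZPIGJyZWFraW5nIG5ld3M%3D', "
    "'/watch?v=TjzHL1hVo2o&pp=ygURVUZPIGJyZWFraW5nIG5ld3M%3D']"
)
full_urls = to_full_urls(sample)
for url in full_urls:
    print(url)
```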

The tool returns three videos. A quick check shows that all three come from reputable sources, are recent (in fact, one of them is streaming live from CBS News as of this writing), and are highly relevant.

# Let's save the two videos which aren't currently live!
urls = ["https://www.youtube.com/watch?v=15FoQ37js_8&t=4s", "https://www.youtube.com/watch?v=TjzHL1hVo2o&t=1s"]

We will use the YoutubeAudioLoader to fetch and download the audio files.

Then, use OpenAIWhisperParser() to transcribe them to text. First, though, let’s ask the agent where we are on disk so that we know where to save the audio:

agent.run("what current directory am I in?") #Yes, your agent knows Linux
> Entering new AgentExecutor chain…
I need to use a terminal command to find out
Action: terminal
Action Input: pwd
Observation: /content
Thought: I now know the final answer
Final Answer: The current directory is /content.

> Finished chain.

The current directory is /content.

# Directory to save audio files
save_dir = "/content/YouTube"
# set a flag to switch between local and remote parsing
# change this to True if you want to use local parsing
local = False

Next, let’s transcribe the videos to text:

if local:
   loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParserLocal())
else:
   loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()

# Returns a list of Documents, which can be easily viewed or parsed
docs[0].page_content[0:500]

Building a Chatbot from the YouTube Videos

Similar to how we built a chatbot on our TOP SECRET REPORT, now that we have downloaded, transcribed and saved the most recent and reliable YouTube videos on UAPs, we can create a Q&A system for them too.

First, let’s install Faiss. Faiss is a library for efficient similarity search and clustering of dense vectors, and it is commonly used as the index behind a vector store.

pip install faiss-cpu google-search-results
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
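from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter  # needed for the split step below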

Next, let’s combine the docs:

combined_docs = [doc.page_content for doc in docs]
text = " ".join(combined_docs)

Split them:

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
splits = text_splitter.split_text(text)
embeddings = OpenAIEmbeddings()
vectordb = FAISS.from_texts(splits, embeddings)
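
The chunk_size and chunk_overlap parameters are easier to picture with a toy example. The sketch below is a deliberately simplified fixed-width splitter of our own, not RecursiveCharacterTextSplitter itself (which also tries to break on separators such as paragraphs and sentences); it only illustrates how consecutive chunks share their edges.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


toy_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(toy_chunks)
```

The shared characters mean a sentence that straddles a boundary still appears whole in at least one chunk, which is the motivation for using an overlap of 150 characters with 1,500-character chunks.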

Build a QA chain:

qa_chain = RetrievalQA.from_chain_type(
   llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
   chain_type="stuff",
   retriever=vectordb.as_retriever(),
)
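
Conceptually, a retrieval chain with chain_type="stuff" does two things: it finds the chunks closest to the question, then "stuffs" them into a single prompt. The sketch below imitates that flow with toy three-dimensional vectors and cosine similarity. The vectors, the sample sentences, the prompt wording and the helper names are all hypothetical, and the real index may use a different distance metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def stuff_prompt(question, chunks):
    """Place every retrieved chunk into one prompt (the 'stuff' strategy)."""
    context = "\n\n".join(chunks)
    return f"Use the following context to answer the question.\n\n{context}\n\nQuestion: {question}"


# Toy index: (chunk text, made-up embedding).
indexed = [
    ("Pilots reported objects with unusual flight characteristics.", [0.9, 0.1, 0.0]),
    ("The hearing discussed claims of crash retrieval programs.", [0.1, 0.9, 0.1]),
    ("The weather over the test range was clear.", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.3, 0.0]  # made-up embedding of the question
chunks = top_k(query_vec, indexed, k=2)
prompt = stuff_prompt("Do we have any UAP sightings?", chunks)
print(prompt)
```

Because everything retrieved goes into one prompt, "stuff" is simple and cheap, but it only works while the retrieved chunks fit in the model's context window.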

Then, we get to the fun stuff:

query = "Do we have any UAP sightings?"
qa_chain.run(query)
Yes, there have been multiple UAP sightings reported by credible sources, including fighter pilots and naval aviators. These sightings describe objects with flight characteristics that surpass our most advanced fighter jets.
query = "Is there any evidence of UAP crash retrival?"
qa_chain.run(query)
Based on the given context, there is no direct evidence mentioned regarding UAP crash retrieval. The discussion revolves around two potential conclusions: either there has been a decades-long cover-up of UFO retrieval and reverse engineering activities, or there is a psychological disinformation campaign. However, no specific evidence or incidents of UAP crash retrieval are mentioned in this conversation.
query = "Have aliens, UFOs or any non-human body been recovered at crash sites for UAPS?"
qa_chain.run(query)
According to the provided context, there are claims made by witnesses and experts that suggest the existence of evidence such as recovered crafts and possibly even bodies related to UFO encounters. However, without further concrete evidence or official confirmation, it is not possible to definitively answer whether aliens, UFOs, or non-human bodies have been recovered at crash sites for UAPs.
query = "Do aliens exist?"
qa_chain.run(query)
I don’t know if aliens exist. There is ongoing speculation and investigation into unidentified flying objects (UFOs) and the possibility of extraterrestrial life, but conclusive evidence has not been presented.
query = "Is there a UFO cover-up?"
qa_chain.run(query)
According to the context provided, there are two possible conclusions: either the United States government has been covering up UFO retrieval and reverse engineering activities for decades, or there is a psychological disinformation campaign being conducted by elements of the defense and intelligence establishment. However, it is important to note that this is just one perspective and there is no definitive evidence to confirm or deny a UFO cover-up.

To summarize, from these YouTube videos we know that there have been multiple UAP sightings reported by credible sources, including fighter pilots and naval aviators. While we don’t have conclusive evidence that aliens exist, the videos raise the question of whether the United States government has been covering up UFO retrieval and reverse engineering activities for decades, or whether a disinformation campaign is at work. Witnesses and experts also claim that evidence exists, such as recovered craft and possibly even bodies related to UFO encounters, though none of this has been officially confirmed.

Further Analysis

Let’s keep diving deeper.

agent.run("What is the Drake equation?")
> Entering new AgentExecutor chain…
I should look up the Drake equation on Wikipedia.
Action: Wikipedia
Action Input: Drake equation
Observation: Page: Drake equation
Summary: The Drake equation is a probabilistic argument used to estimate the number of active, communicative extraterrestrial civilizations in the Milky Way Galaxy. The equation was formulated in 1961 by Frank Drake, not for purposes of quantifying the number of civilizations, but as a way to stimulate scientific dialogue at the first scientific meeting on the search for extraterrestrial intelligence (SETI). The equation summarizes the main concepts which scientists must contemplate when considering the question of other radio-communicative life. It is more properly thought of as an approximation than as a serious attempt to determine a precise number.
Criticism related to the Drake equation focuses not on the equation itself, but on the fact that the estimated values for several of its factors are highly conjectural, the combined multiplicative effect being that the uncertainty associated with any derived value is so large that the equation cannot be used to draw firm conclusions.

Page: Frank Drake
Summary: Frank Donald Drake (May 28, 1930 – September 2, 2022) was an American astrophysicist and astrobiologist.
He began his career as a radio astronomer, studying the planets of the Solar System and later pulsars. Drake expanded his interests to the search for extraterrestrial intelligence (SETI), beginning with Project Ozma in 1960, an attempt at extraterrestrial communications. He developed the Drake equation, which attempts to quantify the number of intelligent lifeforms that could potentially be discovered. Working with Carl Sagan, Drake helped to design the Pioneer plaque, the first physical message flown beyond the Solar System, and was part of the team that developed the Voyager record. Drake designed and implemented the Arecibo message in 1974, an extraterrestrial radio transmission of astronomical and biological information about Earth.
Drake worked at the National Radio Astronomy Observatory, Jet Propulsion Laboratory, Cornell University, University of California at Santa Cruz and the SETI Institute.
Thought: I now know the final answer.
Final Answer: The Drake equation is a probabilistic argument used to estimate the number of active, communicative extraterrestrial civilizations in the Milky Way Galaxy. It was formulated in 1961 by Frank Drake, not for purposes of quantifying the number of civilizations, but as a way to stimulate scientific dialogue at the first scientific meeting on the search for extraterrestrial intelligence (SETI). The equation summarizes the main concepts which scientists must contemplate when considering the question of other radio-communicative life. Criticism related to the Drake equation focuses not on the equation itself, but on the fact that the estimated values for several of its factors are highly conjectural, the combined multiplicative effect being that the uncertainty associated with any derived value is so large that the equation cannot be used to draw firm conclusions.

> Finished chain.

The Drake equation is a probabilistic argument used to estimate the number of active, communicative extraterrestrial civilizations in the Milky Way Galaxy. It was formulated in 1961 by Frank Drake, not for purposes of quantifying the number of civilizations, but as a way to stimulate scientific dialogue at the first scientific meeting on the search for extraterrestrial intelligence (SETI). The equation summarizes the main concepts which scientists must contemplate when considering the question of other radio-communicative life. Criticism related to the Drake equation focuses not on the equation itself, but on the fact that the estimated values for several of its factors are highly conjectural, the combined multiplicative effect being that the uncertainty associated with any derived value is so large that the equation cannot be used to draw firm conclusions.
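
The equation itself is a simple product of seven factors, N = R* · fp · ne · fl · fi · fc · L. A minimal sketch follows; every input value below is an illustrative placeholder we picked, not an estimate from the article or from the literature.

```python
def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """N = R* * fp * ne * fl * fi * fc * L

    r_star   : average rate of star formation in the galaxy (stars per year)
    f_p      : fraction of those stars that have planets
    n_e      : planets per such system that could support life
    f_l      : fraction of those planets where life actually appears
    f_i      : fraction of life-bearing planets that develop intelligence
    f_c      : fraction of those that emit detectable signals
    lifetime : years such a civilization keeps emitting signals
    """
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime


# Placeholder inputs only; change any of them and N moves by the same factor.
n_optimistic = drake(r_star=1.0, f_p=0.5, n_e=2.0, f_l=1.0, f_i=0.1, f_c=0.1, lifetime=10_000)
n_pessimistic = drake(r_star=1.0, f_p=0.5, n_e=2.0, f_l=0.001, f_i=0.1, f_c=0.1, lifetime=10_000)
print(n_optimistic, n_pessimistic)
```

With these placeholders the first call gives roughly 100 civilizations and the second roughly 0.1, which illustrates the criticism quoted above: because the factors multiply, uncertainty in any one of them carries straight through to the result.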

From there, let’s ask the agent to crunch some numbers:

agent.run("Take the number habitable planets are in our galaxy and multiply it by 7 billion")
> Entering new AgentExecutor chain…
I need to find the number of habitable planets in our galaxy and then multiply it by 7 billion
Action: Search
Action Input: Number of habitable planets in our galaxy
Observation: So far, researchers have identified several hundred planets in the habitable zone of their star in Kepler data. It may take a while to find all …
Thought: I need to find a more exact number
Action: Wikipedia
Action Input: Habitable zone
Observation: Page: Circumstellar habitable zone
Summary: In astronomy and astrobiology, the circumstellar habitable zone (CHZ), or simply the habitable zone, is the range of orbits around a star within which a planetary surface can support liquid water given sufficient atmospheric pressure. The bounds of the CHZ are based on Earth’s position in the Solar System and the amount of radiant energy it receives from the Sun. Due to the importance of liquid water to Earth’s biosphere, the nature of the CHZ and the objects within it may be instrumental in determining the scope and distribution of planets capable of supporting Earth-like extraterrestrial life and intelligence.
The habitable zone is also called the Goldilocks zone, a metaphor, allusion and antonomasia of the children’s fairy tale of “Goldilocks and the Three Bears”, in which a little girl chooses from sets of three items, ignoring the ones that are too extreme (large or small, hot or cold, etc.), and settling on the one in the middle, which is “just right”.
Since the concept was first presented in 1953, many stars have been confirmed to possess a CHZ planet, including some systems that consist of multiple CHZ planets. Most such planets, being either super-Earths or gas giants, are more massive than Earth, because massive planets are easier to detect. On November 4, 2013, astronomers reported, based on Kepler data, that there could be as many as 40 billion Earth-sized planets orbiting in the habitable zones of Sun-like stars and red dwarfs in the Milky Way. About 11 billion of these may be orbiting Sun-like stars. Proxima Centauri b, located about 4.2 light-years (1.3 parsecs) from Earth in the constellation of Centaurus, is the nearest known exoplanet, and is orbiting in the habitable zone of its star. The CHZ is also of particular interest to the emerging field of habitability of natural satellites, because planetary-mass moons in the CHZ might outnumber planets. In subsequent decades, the CHZ concept began to be challenged as a primary criterion for life, so the concept is still evolving. Since the discovery of evidence for extraterrestrial liquid water, substantial quantities of it are now thought to occur outside the circumstellar habitable zone. The concept of deep biospheres, like Earth’s, that exist independently of stellar energy, are now generally accepted in astrobiology given the large amount of liquid water known to exist within in lithospheres and asthenospheres of the Solar System. Sustained by other energy sources, such as tidal heating or radioactive decay or pressurized by non-atmospheric means, liquid water may be found even on rogue planets, or their moons. Liquid water can also exist at a wider range of temperatures and pressures as a solution, for example with sodium chlorides in seawater on Earth, chlorides and sulphates on equatorial Mars, or ammoniates, due to its different colligative properties.
In addition, other circumstellar zones, where non-water solvents favorable to hypothetical life based on alternative biochemistries could exist in liquid form at the surface, have been proposed.

Page: Galactic habitable zone
Summary: In astrobiology and planetary astrophysics, the galactic habitable zone is the region of a galaxy in which life might most likely develop. The concept of a galactic habitable zone analyzes various factors, such as metallicity (the presence of elements heavier than hydrogen and helium) and the rate and density of major catastrophes such as supernovae, and uses these to calculate which regions of a galaxy are more likely to form terrestrial planets, initially develop simple life, and provide a suitable environment for this life to evolve and advance. According to research published in August 2015, very large galaxies may favor the birth and development of habitable planets more than smaller galaxies such as the Milky Way. In the case of the Milky Way, its galactic habitable zone is commonly believed to be an annulus with an outer radius
Thought: I now know the number of habitable planets in our galaxy
Action: Calculator
Action Input: Several hundred planets * 7 billion
Observation: Answer: 2800000000000
Thought: I now know the final answer
Final Answer: 2800000000000

> Finished chain.

2800000000000

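This is a good moment to read the intermediate steps. The calculator was handed "Several hundred planets", which it evidently treated as 400 (2,800,000,000,000 divided by 7 billion is 400), even though the Wikipedia observation cites as many as 40 billion Earth-sized planets in habitable zones. Using that figure instead would give 2.8 × 10^20.

Finally, an important question: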

agent.run("Has the US Government ever covered anything up?")
> Entering new AgentExecutor chain…
I should research this question to find out.
Action: Wikipedia
Action Input: "US Government cover-ups"
Observation: Page: UPS Airlines Flight 6
Summary: UPS Airlines Flight 6 was a cargo flight operated by UPS Airlines. On September 3, 2010, the Boeing 747-400F flying the route between Dubai, United Arab Emirates, and Cologne, Germany, developed an in-flight fire, which caused the aircraft to crash, killing both crew members, the only people on board. It was the first fatal air crash for UPS Airlines. The crash prompted a re-evaluation of safety procedures protecting airliners from cockpit smoke.

Page: American cover-up of Japanese war crimes
Summary: The occupying US government undertook the cover-up of Japanese war crimes after the end of World War II, granting political immunity to military personnel who had engaged in human experimentation and other crimes against humanity, predominantly in mainland China. The pardon of Japanese war criminals, among whom were Unit 731’s commanding officers General Shiro Ishii and General Masaji Kitano, was overseen by General of the Army Douglas MacArthur in September 1945. While a series of war tribunals and trials was organized, many of the high-ranking officials and doctors who devised and respectively performed the experiments were pardoned and never brought to justice due to the US government both classifying incriminating evidence, as well as blocking the prosecution access to key witnesses. As many as 12,000 people, most of them Chinese, died in Unit 731 alone and many more died in other facilities, such as Unit 100 and in field experiments throughout Manchuria.

Page: UPS Airlines
Summary: UPS Airlines is a major American cargo airline based in Louisville, Kentucky. One of the largest cargo airlines worldwide (in terms of freight volume flown), UPS Airlines flies to 815 destinations worldwide. It has been a wholly owned subsidiary of United Parcel Service since its launch in 1988.
In line with passenger airlines, UPS Airlines operates under the hub-and-spoke model. The airline’s primary hub in the United States is at Louisville International Airport, where it built a 5,200,000 square foot facility known as UPS Worldport. In addition to Worldport, UPS has several secondary hubs across the United States and international hubs in Germany, China, and Hong Kong.
The pilots of UPS Airlines are represented by the Independent Pilots Association.
Thought: I now know the US Government has covered up things in the past.
Final Answer: Yes, the US Government has covered up things in the past. Examples include the pardon of Japanese war criminals after World War II and the cover-up of the UPS Airlines Flight 6 crash in 2010.

> Finished chain.

Yes, the US Government has covered up things in the past. Examples include the pardon of Japanese war criminals after World War II and the cover-up of the UPS Airlines Flight 6 crash in 2010.

Note that the second example does not hold up. The Wikipedia search appears to have matched "UPS" inside "cover-ups", and the returned summary of UPS Airlines Flight 6 describes an in-flight fire and crash, not a government cover-up. Only the example about Japanese war crimes is supported by the observation.

Conclusion

Of course, the search for extraterrestrial life is just one fun example. Here are a few things to keep in mind as you begin to experiment:

  1. Agents, chains, and tools are all individually useful, but they provide the most relevant information when leveraged together.
  2. When you give your agent and LLM chains access to the internet, their answers can draw on more current and relevant information. More work is needed to quantitatively evaluate agents’ performance, but you can start by reading the intermediate steps.
  3. Regardless of your query, the truth is out there!