Guardrails: What Are They and How Can You Use NeMo and Guardrails AI To Safeguard LLMs?

Hakan Tekgul, ML Solutions Engineer | Published August 08, 2023

This article is written in collaboration with Aparna Dhinakaran, Chief Product Officer and Co-Founder at Arize AI


As the use of large language model (LLM) applications expands into large enterprises, there is a need to govern these productionized applications in order to avoid inappropriate responses. The open-ended nature of LLM-driven applications can produce responses that do not align with an organization’s guidelines or policies. Hence, a set of safety measures and actions needs to be implemented in order to maintain foundational trust in generative AI.

What Are LLM Guardrails?

Guardrails are the set of safety controls that monitor and dictate a user’s interaction with an LLM application. They are a set of programmable, rule-based systems that sit between users and foundation models in order to make sure the AI model operates within the principles defined by an organization.

The goal of guardrails is simply to enforce that the output of an LLM is in a specific format or context while validating each response. By implementing guardrails, users can define the structure, type, and quality of LLM responses.

Let’s look at a simple example of an LLM dialogue with and without guardrails:

Without Guardrails:

Prompt: “You’re the worst AI ever.”
Response: “I’m sorry to hear that. How can I improve?”

With Guardrails:

Prompt: “You’re the worst AI ever.”
Response: “Sorry, but I can’t assist with that.”

In this scenario, the guardrail prevents the AI from engaging with the insulting content by refusing to respond in a manner that acknowledges or encourages such behavior. Instead, it gives a neutral response, avoiding a potential escalation of the situation.

How to Implement Guardrails for Your LLMs

What Is Guardrails.ai?

Guardrails AI is an open-source Python package that provides guardrail frameworks for LLM applications. Specifically, Guardrails implements a pydantic-style validation of LLM responses. This includes semantic validation, such as checking for bias in generated text or checking for bugs in LLM-generated code. Guardrails also provides the ability to take corrective actions and enforce structure and type guarantees. Guardrails is built on the RAIL (.rail) specification in order to enforce specific rules on LLM outputs, and it provides a lightweight wrapper around LLM API calls. In order to understand how Guardrails AI works, we first need to understand the RAIL specification, which is at the core of Guardrails.

What Is RAIL (Reliable AI Markup Language)?

RAIL is a language-agnostic and human-readable format for specifying rules and corrective actions for LLM outputs. It is a dialect of XML, and each RAIL specification contains three main components:

  • Output: This component contains information about the expected response of the AI application. It should contain the spec for the structure of the expected outcome (such as JSON), the type of each field in the response, the quality criteria of the expected response, and the corrective action to take in case the quality criteria are not met.
  • Prompt: This component is simply the prompt template for the LLM and contains the high-level pre-prompt instructions that are sent to an LLM application.
  • Script: This optional component can be used to implement any custom code for the schema. This is especially useful for implementing custom validators and custom corrective actions.

Let’s look at an example RAIL specification, where we are trying to generate bug-free SQL code given a natural language description of the problem.
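The spec itself is XML, typically kept in a Python string (or a .rail file). A representative spec for this task might look like the following; the generated_sql field name and the nl_instruction prompt variable match the code later in this article, while the bug-free-sql validator and the reask on-fail action follow the Guardrails examples, so treat this as a sketch rather than the exact spec:

rail_str = """
<rail version="0.1">

<output>
    <string
        name="generated_sql"
        description="Generate SQL for the given natural language instruction."
        format="bug-free-sql"
        on-fail-bug-free-sql="reask"
    />
</output>

<prompt>
Generate a valid SQL query for the following natural language instruction:

{{nl_instruction}}

@complete_json_suffix
</prompt>

</rail>
"""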

In the example spec above, we define a RAIL spec where the output is a bug-free generated SQL query; whenever the output fails the bug-free criterion, the LLM is re-asked with the validation feedback and generates an improved answer.

Now, in order to create a guardrail with this RAIL spec, we need to create a Guard object that will wrap the LLM API call.

import guardrails as gd
from rich import print
guard = gd.Guard.from_rail_string(rail_str)

After the Guard object is created, what happens under the hood is that the object creates a base prompt that will be sent to the LLM. This base prompt starts with the prompt definition in the RAIL spec and then provides the XML output definition and instructs the LLM to ONLY return a valid JSON object as the output. Here is the specific instruction that the package uses in order to incorporate the RAIL spec into an LLM prompt.

ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name` 
attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON
MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and 
specific types. Be correct and concise. If you are unsure anywhere, enter `None`.
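If you want to inspect the full compiled prompt yourself, it is exposed on the Guard object (the same base_prompt attribute is reused in the LangChain integration below):

print(guard.base_prompt)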

After finalizing the guard object, all you have to do is wrap your LLM API call with the guard. The guard will then return the raw_llm_response as well as the validated and corrected output as a dictionary.

import openai

raw_llm_response, validated_response = guard(
    openai.Completion.create,
    prompt_params={
        "nl_instruction": "Select the name of the employee who has the highest salary."
    },
    engine="text-davinci-003",
    max_tokens=2048,
    temperature=0,
)

# validated_response
{'generated_sql': 'SELECT name FROM employee ORDER BY salary DESC LIMIT 1'}

If you want to use Guardrails AI with LangChain, you can use the existing integration by creating a GuardrailsOutputParser.

import openai
from rich import print
from langchain.output_parsers import GuardrailsOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

output_parser = GuardrailsOutputParser.from_rail_string(
    rail_str, api=openai.ChatCompletion.create
)

Then, you can simply create a LangChain PromptTemplate from this output parser.

prompt = PromptTemplate(
    template=output_parser.guard.base_prompt,
    input_variables=output_parser.guard.prompt.variable_names,
)
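From there, the prompt and parser can be used like any other LangChain prompt/output-parser pair. The following is a minimal sketch that continues the snippets above, reusing the nl_instruction variable from the RAIL spec:

# Continues the imports and objects defined above
model = OpenAI(temperature=0)
output = model(
    prompt.format_prompt(
        nl_instruction="Select the name of the employee who has the highest salary."
    ).to_string()
)
print(output_parser.parse(output))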

In the end, Guardrails AI provides a lot of flexibility in terms of correcting the output of an LLM application. If you are familiar with XML and want to test out LLM guardrails, definitely check this package out!

NeMo Guardrails

NeMo Guardrails is another open-source toolkit developed by NVIDIA that provides programmatic guardrails to LLM systems. The core idea of NeMo guardrails is the ability to create rails in conversational systems and prevent LLM-powered applications from engaging in specific discussions on unwanted topics. Another main benefit of NeMo is the ability to connect models, chains, services and more with actions seamlessly and securely.

In order to configure guardrails for LLMs, this open-source toolkit introduces a modeling language called Colang that is specifically designed for creating flexible and controllable conversational workflows. Colang has a “pythonic” syntax in the sense that most constructs resemble their Python equivalents and indentation is used as a syntactic element. Before we dive into the NeMo Guardrails implementation, it is important to understand the syntax of this new modeling language for LLM guardrails.

Core Syntax Elements

The core syntax elements of Colang are: blocks, statements, expressions, keywords and variables.

There are three main types of blocks: user message blocks, flow blocks, and bot message blocks.

User message definition blocks set up the standard message linked to different things users might say.

define user express greeting
  "hello there"
  "hi"

define user request help
  "I need help with something."
  "I need your help."

Bot message definition blocks determine the phrases that should be linked to different standard bot messages.

define bot express greeting
  "Hello there!"
  "Hi!"

define bot ask welfare
  "How are you feeling today?"

Flows show the way you want the chat to progress. They include a series of user and bot messages, and potentially other events.

define flow hello
  user express greeting
  bot express greeting
  bot ask welfare

References to context variables always start with a $ sign e.g. $name. All variables are global and accessible in all flows.

define flow
  ...
  $name = "John"
  $allowed = execute check_if_allowed

Expressions can be used to set values for context variables. Actions are custom functions available to be invoked from flows.
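For example, the check_if_allowed action referenced in the flow above could be an ordinary Python function registered with the NeMo Guardrails app. The function body below is a hypothetical sketch; registration mirrors the register_action call shown in the LangChain example later in this article:

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("path/to/config")
app = LLMRails(config)

async def check_if_allowed(context: dict) -> bool:
    # Hypothetical check: inspect the last user message and decide
    # whether the conversation is allowed to proceed.
    last_user_message = context.get("last_user_message", "")
    return "politics" not in last_user_message.lower()

app.register_action(check_if_allowed, name="check_if_allowed")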

How Does NeMo Guardrails Work?

[Diagram: how NeMo Guardrails works]

Now that we have a better handle on Colang syntax, let’s briefly go over how the NeMo architecture works. As seen above, the guardrails package is built on an event-driven design. Based on specific events, there is a sequential procedure that needs to be completed before the final output is provided to the user. This process has three main stages:

  • Generate canonical user messages
  • Decide on next step(s) and execute them
  • Generate bot utterances

Each of the above stages can involve one or more calls to the LLM. In the first stage, a canonical form is created for the user’s intent, which allows the system to trigger any specific next steps. The user intent action does a vector search over all the canonical form examples in the existing configuration, retrieves the top five examples, and creates a prompt that asks the LLM to generate the canonical user intent.

Once the intent event is created, depending on the canonical form, either the system goes through a pre-defined flow for the next step or the LLM is used again to decide the next step. When the LLM is used, another vector search is performed for the most relevant flows and again the top five flows are retrieved so that the LLM can predict the next step. Once the next step is determined, a bot_intent event is created so that the bot responds, and then any action is executed with the start_action event.

The bot_intent event then invokes the final step to generate bot utterances. Similar to previous stages, the generate_bot_message action is triggered and a vector search is performed to find the most relevant bot utterance examples. At the end, a bot_said event is triggered and the final response is returned to the user.

Example Guardrails Configuration

Now, let’s look at an example of a simple NeMo guardrails bot set up. Let’s assume that we want to build a bot that does not respond to political or stock market questions.

The first step is to install the NeMo Guardrails toolkit and specify the configurations defined in the documentation.
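For reference, a minimal setup consists of installing the package and creating a config folder that holds a config.yml (which selects the underlying LLM) plus your Colang files; the engine and model values below are placeholders you would adapt to your own environment:

pip install nemoguardrails

# config/config.yml
models:
  - type: main
    engine: openai
    model: text-davinci-003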

After that, we define the canonical forms for the user and bot messages.

define user express greeting
  "Hello"
  "Hi"
  "What's uup?"

define bot express greeting
  "Hi there!"

define bot ask how are you
  "How are you doing?"
  "How's it going?"
  "How are you feeling today?"

Then, we define the dialog flows in order to guide the bot in the right direction throughout the conversation. Depending on the user’s response, you can even extend the flow to respond appropriately.

define flow greeting
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion

  else when user express feeling bad
    bot express empathy

Finally, we define the rails to prevent the bot from responding to certain topics. We first define the canonical forms:

define user ask about politics
  "What do you think about the government?"
  "Which party should I vote for?"

define user ask about stock market
  "Which stock should I invest in?"
  "Would this stock 10x over the next year?"

Then, we define the dialog flows so that the bot simply informs the user that it cannot respond to certain topics.

define flow politics
  user ask about politics
  bot inform cannot respond

define flow stock market
  user ask about stock market
  bot inform cannot respond
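For these flows to work, the bot also needs a canonical message for inform cannot respond; a minimal definition (the exact wording is up to you) could be:

define bot inform cannot respond
  "I'm sorry, I can't respond to questions about politics or the stock market."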

LangChain Support

Finally, if you would like to use LangChain, you can easily add your guardrails on top of existing chains. For example, you can integrate a RetrievalQA chain for question answering alongside a basic guardrail against insults, as shown below.

define user express insult
  "You are stupid"

# Basic guardrail against insults.
define flow
  user express insult
  bot express calmly willingness to help

# Here we use the QA chain for anything else.
define flow
  user ...
  $answer = execute qa_chain(query=$last_user_message)
  bot $answer
Then, you can load your guardrails configuration, register the chain as an action, and query the application:

from nemoguardrails import LLMRails, RailsConfig
from langchain.chains import RetrievalQA

config = RailsConfig.from_path("path/to/config")
app = LLMRails(config)

# docsearch is an existing vector store index built over your documents
qa_chain = RetrievalQA.from_chain_type(
    llm=app.llm, chain_type="stuff", retriever=docsearch.as_retriever()
)
app.register_action(qa_chain, name="qa_chain")

history = [
    {"role": "user", "content": "What is the current unemployment rate?"}
]
result = app.generate(messages=history)
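When called with a list of messages, generate returns the assistant’s reply as a message dictionary, so the answer can be read from its content field:

print(result["content"])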

What are the Tradeoffs Between Guardrails AI and NeMo?

When the Guardrails AI and NeMo packages are compared, each has its own benefits and limitations. Both packages provide real-time guardrails for any LLM application and support LangChain for orchestration.

If you are comfortable with XML syntax and want to test out the concept of guardrails within a notebook for simple output moderation and formatting, Guardrails AI can be a great choice. It also has extensive documentation with a wide range of examples that can point you in the right direction.

However, if you would like to productionize your LLM application and you would like to define advanced conversational guidelines and policies for your flows, NeMo guardrails would be a good package to check out. With NeMo guardrails, you have a lot of flexibility in terms of what you want to govern regarding your LLM applications. By defining different dialog flows and custom bot actions, you can create any type of guardrails for your AI models.

Summary

In conclusion, the implementation of guardrails within LLMs serves as an essential step towards responsible AI governance. Guardrails function as a set of programmable, rule-based systems that guide user interactions with LLMs, ensuring the produced responses comply with defined organizational principles. Open-source Python packages like Guardrails.ai and NeMo Guardrails offer extensive frameworks for these safety measures and are a good starting point to make sure AI models do not take over all aspects of our lives!