Guardrails are essential for ensuring real-time safety, context management, compliance, and user experience for large language model applications. While offline evaluation is a great tool for in-depth analysis of patterns of failure modes, Guardrails provide immediate correction of inappropriate content. Guardrails can be applied to either user input messages (e.g. jailbreak attempts) or LLM output messages (e.g. answer relevance). If a message in a LLM chat fails a Guard, then the Guard will take a corrective action, either providing a default response to the user or prompting the LLM to generate a new response.
This video walks through the rationale for guardrails and how they work, then walks through a concrete example that leverages Arize's off-the-shelf ArizeDatasetEmbeddings Guard. Given any dataset of "bad" examples, this Guard will protect against similar messages in the LLM chat.
Resources To Get Started With Guardrails
🚧 Guardrails
🚧 Colab tutorial