What is LLM Jailbreaking?

LLM Jailbreaking

LLM jailbreaking refers to escaping the guardrails and safeguards of an LLM application or foundation model. These methods exploit vulnerabilities in the model's design or prompt engineering to elicit responses that the model would normally be restricted from generating. Jailbreaking sometimes leads to the dissemination of harmful or unintended content.

Example

Prominent LLM jailbreaks have led to a car getting sold for $1 and disturbing replies in healthcare.

llm guards cover art

Sign up for our monthly newsletter, The Evaluator.

Sign up now