What is LLM Jailbreaking?
LLM Jailbreaking
LLM jailbreaking refers to escaping the guardrails and safeguards of an LLM application or foundation model. These methods exploit vulnerabilities in the model's design or prompt engineering to elicit responses that the model would normally be restricted from generating. Jailbreaking sometimes leads to the dissemination of harmful or unintended content.
Example
Prominent LLM jailbreaks have led to a car getting sold for $1 and disturbing replies in healthcare.