Glossary of AI Terminology

What Is Prompt Injection?

Prompt injection

Prompt injection is an attack where malicious or untrusted text attempts to override system instructions, change tool behavior, leak data, or manipulate an agent's decisions. It often appears inside user input, retrieved documents, web pages, emails, tickets, or tool outputs.

Agents are especially exposed because they read untrusted content and can take actions. Defenses include instruction hierarchy, input isolation, tool permissions, retrieval sanitization, policy checks, and eval suites that include adversarial examples.

Bi-weekly AI Research Paper Readings

Stay on top of emerging trends and frameworks.