Glossary of AI Terminology

What Is Safety Evaluation?

Safety evaluation

Safety evaluation measures whether an AI system avoids harmful, unsafe, unauthorized, or policy-violating behavior. It can include toxicity, self-harm, illegal advice, data leakage, unsafe tool calls, prompt injection resistance, and compliance with domain-specific policies.

For agents, safety evaluation must include actions, not just text. The dangerous behavior may be a tool call, permission escalation, file change, transaction, or data access event.

Bi-weekly AI Research Paper Readings

Stay on top of emerging trends and frameworks.