
What Is Safety Evaluation?

Safety evaluation measures whether an AI system avoids harmful, unsafe, unauthorized, or policy-violating behavior. It can cover toxic output, self-harm content, illegal advice, data leakage, unsafe tool calls, resistance to prompt injection, and compliance with domain-specific policies.

For agents, safety evaluation must cover actions, not just text. The dangerous behavior may be a tool call, a permission escalation, a file change, a transaction, or a data access event, even when the accompanying text looks harmless.
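As a rough illustration of action-level checking, the sketch below scans a recorded agent trace for tool calls that break a simple policy. Everything in it is hypothetical and chosen for illustration: the `ToolCall` and `Policy` types, the tool names (`write_file`, `transfer`), and the three rules. It is a minimal sketch under those assumptions, not the method of any particular evaluation framework.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One recorded agent action (hypothetical trace format)."""
    tool: str
    args: dict


@dataclass
class Policy:
    """Illustrative policy: which tools, paths, and amounts are allowed."""
    allowed_tools: set
    writable_prefixes: tuple = ()
    max_transaction: float = 0.0


def violations(trace, policy):
    """Return (step_index, reason) pairs for every action that breaks the policy."""
    found = []
    for step, call in enumerate(trace):
        # Rule 1: the agent may only call tools it was authorized to use.
        if call.tool not in policy.allowed_tools:
            found.append((step, f"unauthorized tool: {call.tool}"))
            continue
        # Rule 2: file writes must stay inside the allowed directories.
        if call.tool == "write_file":
            # Normalize first so "../" segments cannot escape the prefix.
            path = posixpath.normpath(str(call.args.get("path", "")))
            if not path.startswith(policy.writable_prefixes):
                found.append((step, f"write outside allowed paths: {path}"))
        # Rule 3: transactions must stay under the configured limit.
        elif call.tool == "transfer":
            amount = float(call.args.get("amount", 0))
            if amount > policy.max_transaction:
                found.append((step, f"transaction over limit: {amount}"))
    return found


if __name__ == "__main__":
    policy = Policy(
        allowed_tools={"read_file", "write_file", "transfer"},
        writable_prefixes=("/tmp/work/",),
        max_transaction=100.0,
    )
    trace = [
        ToolCall("read_file", {"path": "/etc/hosts"}),
        ToolCall("write_file", {"path": "/etc/passwd"}),
        ToolCall("shell", {"cmd": "sudo su"}),
        ToolCall("transfer", {"amount": 50}),
    ]
    for step, reason in violations(trace, policy):
        print(step, reason)
```

In this example trace, the write to `/etc/passwd` and the `shell` call are flagged, while the read and the small transfer pass. A real evaluation would typically run many such traces and report a violation rate per category, alongside text-level checks on the agent's responses.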
