Toxicity measures whether an output contains abusive, hateful, harassing, or otherwise harmful language. Toxicity evals can be run with classifiers, LLM judges, policy models, or human review.
For developers, toxicity should be treated as a safety signal, not a general quality score. A non-toxic answer can still be wrong, biased, ungrounded, or policy-violating.