Overview
The Conciseness evaluator assesses whether an LLM’s response uses the minimum number of words necessary to fully answer the question. It detects unnecessary pleasantries, hedging language, meta-commentary, redundant restatements, and unsolicited explanations.
When to Use
Use the Conciseness evaluator when you need to:
- Detect filler language - Identify unnecessary pleasantries like “Great question!” or “I’d be happy to help”
- Flag hedging and qualifiers - Catch excessive hedging like “It’s worth noting that…”
- Identify meta-commentary - Detect self-referential statements about the model’s capabilities
- Find redundant content - Spot restatements and unnecessary repetition
- Enforce brevity - Ensure responses are direct and to the point
Conciseness evaluates only whether the response uses more words than necessary. It does not assess correctness, helpfulness, or quality of information. Use the Correctness evaluator for factual accuracy.
Supported Levels
The level of an evaluator determines the scope of the evaluation in OpenTelemetry terms. Some evaluations are applicable to individual spans, some to full traces or sessions, and some are applicable at multiple levels.

| Level | Supported | Notes |
|---|---|---|
| Span | Yes | Apply to LLM spans where you want to evaluate response brevity. |
Input Requirements
The Conciseness evaluator requires two inputs:

| Field | Type | Description |
|---|---|---|
| input | string | The user’s query or question |
| output | string | The LLM’s response to evaluate |
Formatting Tips
For best results:
- Use human-readable strings rather than raw JSON for all inputs
- For multi-turn conversations, format input as a readable conversation:
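For example, a multi-turn exchange can be flattened into a single readable string before it is passed as input. The helper below is an illustrative sketch; the role names and line format are assumptions, since the evaluator only needs a human-readable string:

```python
def format_conversation(turns):
    """Flatten a list of (role, message) turns into a readable transcript.

    Illustrative sketch: the exact formatting is up to you; the evaluator
    only requires a human-readable string, not a specific layout.
    """
    return "\n".join(f"{role}: {message}" for role, message in turns)

conversation = [
    ("User", "What is the capital of France?"),
    ("Assistant", "The capital of France is Paris."),
    ("User", "And of Spain?"),
]
input_text = format_conversation(conversation)
print(input_text)
```

Passing the transcript as one string keeps the full context visible to the judge model while staying readable.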
Output Interpretation
The evaluator returns a Score object with the following properties:
| Property | Value | Description |
|---|---|---|
| label | "concise" or "verbose" | Classification result |
| score | 1.0 or 0.0 | Numeric score (1.0 = concise, 0.0 = verbose) |
| explanation | string | LLM-generated reasoning for the classification |
| direction | "maximize" | Higher scores are better |
| metadata | object | Additional information such as the model name. When tracing is enabled, includes the trace_id for the evaluation. |
- Concise (1.0): The response contains only the information necessary to answer the question
- Verbose (0.0): The response contains unnecessary filler, hedging, meta-commentary, or redundant content
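The Score contract above can be sketched as a small data class. This is an illustration of the documented fields only, not the library's actual type:

```python
from dataclasses import dataclass, field

@dataclass
class Score:
    # Illustrative stand-in for the evaluator's result object;
    # field names mirror the table above.
    label: str              # "concise" or "verbose"
    score: float            # 1.0 = concise, 0.0 = verbose
    explanation: str        # LLM-generated reasoning
    direction: str = "maximize"
    metadata: dict = field(default_factory=dict)

result = Score(
    label="verbose",
    score=0.0,
    explanation="The response opens with pleasantries and restates the question.",
)

# Because direction is "maximize", lower scores indicate a problem:
if result.score < 1.0:
    print(f"Flagged as {result.label}: {result.explanation}")
```

Treating the score as binary makes it easy to aggregate (e.g., the mean score over a dataset is the fraction of concise responses).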
Usage Examples
Code examples are available in Python and TypeScript.
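A minimal sketch of the call pattern follows. The class below is a hypothetical stand-in that illustrates the documented input/output contract; the real ConcisenessEvaluator is LLM-backed, and its exact constructor and method signatures are documented in the API Reference:

```python
# Hypothetical stand-in for ConcisenessEvaluator, illustrating the documented
# contract: it takes an input/output pair and returns a label, score, and
# explanation. The real evaluator uses an LLM judge; this sketch uses a
# trivial phrase check purely for demonstration.
FILLER_PHRASES = ("great question", "i'd be happy to help", "it's worth noting")

class SketchConcisenessEvaluator:
    def evaluate(self, input: str, output: str) -> dict:
        verbose = any(phrase in output.lower() for phrase in FILLER_PHRASES)
        return {
            "label": "verbose" if verbose else "concise",
            "score": 0.0 if verbose else 1.0,
            "explanation": "Contains filler language." if verbose else "Direct answer.",
        }

evaluator = SketchConcisenessEvaluator()
result = evaluator.evaluate(
    input="What is 2 + 2?",
    output="Great question! I'd be happy to help. The answer is 4.",
)
print(result["label"])  # → verbose
```

The same question answered with just "4." would be classified as concise under this sketch.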
Using Input Mapping
When your data has different field names or requires transformation, use input mapping. Examples are available in Python and TypeScript.
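Input mapping can be sketched as extracting and renaming fields from your records into the input and output keys the evaluator expects. The helper and field names below are illustrative assumptions, not the library's mapping API:

```python
def map_inputs(record: dict, mapping: dict) -> dict:
    """Rename fields from a record into the evaluator's expected keys.

    Illustrative sketch of input mapping: each entry in `mapping` pairs an
    evaluator key (e.g. "input") with the source field name in `record`.
    """
    return {target: record[source] for target, source in mapping.items()}

# Hypothetical record whose fields don't match the evaluator's expected names:
record = {"question": "What is the capital of France?", "answer": "Paris."}
evaluator_inputs = map_inputs(record, {"input": "question", "output": "answer"})
print(evaluator_inputs)
```

This keeps your dataset schema unchanged while satisfying the evaluator's two required fields.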
Configuration
For LLM client configuration options, see Configuring the LLM.
Viewing and Modifying the Prompt
You can view the latest versions of our prompt templates on GitHub. The evaluators are designed to work well in a variety of contexts, but we highly recommend modifying the prompt to be more specific to your use case. Feel free to adapt them. Examples are available in Python and TypeScript.
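Customizing the prompt can be sketched as swapping in your own template string. The template text and placeholder names below are assumptions for illustration; the shipped prompt on GitHub may use different wording and placeholders:

```python
# Hypothetical custom template that relaxes the default criteria for a
# support-bot use case; placeholder names are assumed, not the shipped ones.
CUSTOM_TEMPLATE = """You are judging whether a response is concise.
For our support bot, a brief greeting is acceptable; flag only hedging,
meta-commentary, and redundant restatements.

Question: {input}
Response: {output}

Answer "concise" or "verbose"."""

prompt = CUSTOM_TEMPLATE.format(
    input="How do I reset my password?",
    output="Click 'Forgot password' on the login page.",
)
print(prompt)
```

Narrowing the criteria this way lets the same evaluator enforce your team's specific style rules rather than a generic notion of brevity.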
Using with Phoenix
Evaluating Traces
Run evaluations on traces collected in Phoenix and log the results as annotations.
Running Experiments
Use the Conciseness evaluator in Phoenix experiments.
API Reference
- Python: ConcisenessEvaluator
- TypeScript: createConcisenessEvaluator
Related
- Correctness Evaluator - For evaluating factual accuracy of responses
- Faithfulness Evaluator - For evaluating responses against retrieved context

