
Overview

The Conciseness evaluator assesses whether an LLM’s response uses the minimum number of words necessary to fully answer the question. It detects unnecessary pleasantries, hedging language, meta-commentary, redundant restatements, and unsolicited explanations.

When to Use

Use the Conciseness evaluator when you need to:
  • Detect filler language - Identify unnecessary pleasantries like “Great question!” or “I’d be happy to help”
  • Flag hedging and qualifiers - Catch excessive hedging like “It’s worth noting that…”
  • Identify meta-commentary - Detect self-referential statements about the model’s capabilities
  • Find redundant content - Spot restatements and unnecessary repetition
  • Enforce brevity - Ensure responses are direct and to the point
Conciseness evaluates only whether the response uses more words than necessary. It does not assess correctness, helpfulness, or quality of information. Use the Correctness evaluator for factual accuracy.
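
The difference is easy to see with a toy example. The heuristic below is purely illustrative — the real evaluator uses an LLM judge, not string matching — but it shows the kind of filler the evaluator is looking for:

```python
# Two answers to "What is the capital of France?"
concise = "Paris."
verbose = (
    "Great question! I'd be happy to help. It's worth noting that "
    "the capital of France is Paris. Let me know if you need anything else!"
)

# A toy filler-phrase check (illustration only; not the evaluator's method)
FILLER = ["great question", "happy to help", "it's worth noting", "let me know"]

def has_filler(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in FILLER)

print(has_filler(concise))  # False
print(has_filler(verbose))  # True
```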

Supported Levels

The level of an evaluator determines the scope of the evaluation in OpenTelemetry terms. Some evaluations are applicable to individual spans, some to full traces or sessions, and some are applicable at multiple levels.
| Level | Supported | Notes |
|-------|-----------|-------|
| Span  | Yes       | Apply to LLM spans where you want to evaluate response brevity. |
Relevant span kinds: LLM spans, particularly ones where brevity is important.

Input Requirements

The Conciseness evaluator requires two inputs:
| Field  | Type   | Description |
|--------|--------|-------------|
| input  | string | The user's query or question |
| output | string | The LLM's response to evaluate |

Formatting Tips

For best results:
  • Use human-readable strings rather than raw JSON for all inputs
  • For multi-turn conversations, format input as a readable conversation:
    User: What is the capital of France?
    Assistant: Paris is the capital of France.
    User: What is its population?
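
A small helper can produce that transcript format from a structured message list. This is a sketch — `format_conversation` is not a phoenix API, just an illustration of the target format:

```python
def format_conversation(messages: list[dict]) -> str:
    """Render role/content messages as a readable transcript."""
    return "\n".join(
        f"{m['role'].capitalize()}: {m['content']}" for m in messages
    )

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris is the capital of France."},
    {"role": "user", "content": "What is its population?"},
]

print(format_conversation(messages))
```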
    

Output Interpretation

The evaluator returns a Score object with the following properties:
| Property    | Value                    | Description |
|-------------|--------------------------|-------------|
| label       | "concise" or "verbose"   | Classification result |
| score       | 1.0 or 0.0               | Numeric score (1.0 = concise, 0.0 = verbose) |
| explanation | string                   | LLM-generated reasoning for the classification |
| direction   | "maximize"               | Higher scores are better |
| metadata    | object                   | Additional information such as the model name. When tracing is enabled, includes the trace_id for the evaluation. |
Interpretation:
  • Concise (1.0): The response contains only the information necessary to answer the question
  • Verbose (0.0): The response contains unnecessary filler, hedging, meta-commentary, or redundant content
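
Because the score is binary, results are easy to aggregate downstream. A minimal sketch, assuming each result exposes the `score` and `label` fields from the table above (shown here as plain dicts):

```python
# Hypothetical results, shaped like the Score fields described above
results = [
    {"label": "concise", "score": 1.0},
    {"label": "verbose", "score": 0.0},
    {"label": "concise", "score": 1.0},
    {"label": "concise", "score": 1.0},
]

# Mean score = fraction of responses judged concise
concise_rate = sum(r["score"] for r in results) / len(results)

# Indices of responses flagged as verbose, for review
flagged = [i for i, r in enumerate(results) if r["label"] == "verbose"]

print(f"concise rate: {concise_rate:.0%}")  # concise rate: 75%
print(f"verbose examples: {flagged}")       # verbose examples: [1]
```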

Usage Examples

```python
from phoenix.evals import LLM
from phoenix.evals.metrics import ConcisenessEvaluator

# Initialize the LLM client
llm = LLM(provider="openai", model="gpt-4o")

# Create the evaluator
conciseness_eval = ConcisenessEvaluator(llm=llm)

# Inspect the evaluator's requirements
print(conciseness_eval.describe())

# Evaluate a single example
eval_input = {
    "input": "What is the capital of France?",
    "output": "Paris.",
}

scores = conciseness_eval.evaluate(eval_input)
print(scores[0])
# Score(name='conciseness', score=1.0, label='concise', ...)
```

Using Input Mapping

When your data uses different field names or requires transformation, use input mapping:
```python
from phoenix.evals import LLM
from phoenix.evals.metrics import ConcisenessEvaluator

llm = LLM(provider="openai", model="gpt-4o")
conciseness_eval = ConcisenessEvaluator(llm=llm)

# Example data with different field names
eval_input = {
    "question": "What is the speed of light?",
    "answer": "Approximately 299,792 km/s.",
}

# Map the evaluator's expected field names to your data's keys
input_mapping = {
    "input": "question",
    "output": "answer",
}

scores = conciseness_eval.evaluate(eval_input, input_mapping=input_mapping)
```
For more details on input mapping options, see Input Mapping.
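
Conceptually, a simple dict-valued mapping just renames keys before evaluation. The remapping above is equivalent to the following (an illustration of the behavior, not the library's internals):

```python
eval_input = {
    "question": "What is the speed of light?",
    "answer": "Approximately 299,792 km/s.",
}
input_mapping = {"input": "question", "output": "answer"}

# Each expected field name is looked up via its mapped source key
mapped = {field: eval_input[source] for field, source in input_mapping.items()}
print(mapped)
# {'input': 'What is the speed of light?', 'output': 'Approximately 299,792 km/s.'}
```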

Configuration

For LLM client configuration options, see Configuring the LLM.

Viewing and Modifying the Prompt

You can view the latest versions of our prompt templates on GitHub. The evaluators are designed to work well across a variety of contexts, but we highly recommend adapting the prompt to your specific use case.
```python
from phoenix.evals import LLM, ClassificationEvaluator
from phoenix.evals.metrics import ConcisenessEvaluator

llm = LLM(provider="openai", model="gpt-4o")
evaluator = ConcisenessEvaluator(llm=llm)

# View the prompt template
print(evaluator.prompt_template)

# Create a custom evaluator based on the built-in template
custom_evaluator = ClassificationEvaluator(
    name="conciseness",
    prompt_template=evaluator.prompt_template,  # Modify as needed
    llm=llm,
    choices={"concise": 1.0, "verbose": 0.0},
    direction="maximize",
)
```

Using with Phoenix

Evaluating Traces

Run evaluations on traces collected in Phoenix and log the results as annotations.

Running Experiments

Use the Conciseness evaluator in Phoenix experiments.

API Reference