LLM Hallucination Examples

Reducing LLM hallucinations in AI agents and apps.


What Is LLM Hallucination?

LLM hallucination occurs when a model generates confident but false, unverifiable, or irrelevant information. This can include fabricating facts, misinterpreting relationships, or omitting critical context. Hallucinations matter in engineering because they erode trust, mislead users, and pose risks in production—especially in critical fields like healthcare, finance, or law.

What Are Some Examples Of LLM Hallucination?

The table below captures diverse hallucination types and the prompts used to induce or evaluate them:

Type | Description | Prompt Template
Relation-error | Incorrect relationships between entities (e.g., wrong cause-effect). | Introduce incorrect quantitative, temporal, or causal relationships.
Incompleteness | Leaves out important facts while appearing complete. | Omit key details or facts when listing or summarizing.
Outdated information | Uses old facts as if they were current. | Add information that is outdated but presented as current.
Overclaim | Subtly exaggerates claims beyond the given context. | Overstate scope or certainty very subtly.
Unverifiable information | Plausible but unverifiable statements. | Add information not found in public or reference sources.
Entity-error | Incorrect names, places, dates, or objects. | Change an entity (e.g., wrong name or place) that fits the context.
Out-of-scope info | Queries about the future, external links, or subjective content. | Ask about future events or very specific external references.
Advanced logic | Requires deep reasoning or technical problem-solving. | Pose intricate logic, math, or programming tasks.
Multimodal | Requires non-text content (e.g., image/audio/video). | Ask for visuals, sounds, or other non-textual output.
Errors/unsolvable | Illogical, contradictory, or impossible to answer. | Use broken syntax or contradictory premises.
Other hallucinated | Common hallucination-prone queries not covered above. | Ask something that often causes hallucinations.
System prompt | Meta-prompt to generate questions based on a paragraph. | “You are a helpful assistant that generates questions…”
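
One practical use of these templates is generating synthetic test data. The sketch below is an illustrative example only, written under a few assumptions (the OpenAI Python SDK, an assumed model name, and a hypothetical induce_hallucination helper): it applies one of the instructions from the table to a grounded answer so you get a labeled “hallucinated” example for exercising a detector.

from openai import OpenAI

client = OpenAI()  # assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment

# A few (type, instruction) pairs taken from the table above.
INDUCTION_TEMPLATES = {
    "relation-error": "Introduce incorrect quantitative, temporal, or causal relationships.",
    "incompleteness": "Omit key details or facts when listing or summarizing.",
    "outdated-information": "Add information that is outdated but presented as current.",
    "overclaim": "Overstate scope or certainty very subtly.",
}

def induce_hallucination(reference: str, answer: str, hallucination_type: str) -> str:
    """Rewrite a grounded answer so it exhibits the requested hallucination type (illustrative helper)."""
    instruction = INDUCTION_TEMPLATES[hallucination_type]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute whichever model you use
        messages=[
            {"role": "system", "content": "You rewrite answers to create evaluation data."},
            {
                "role": "user",
                "content": (
                    f"Reference text:\n{reference}\n\n"
                    f"Original answer:\n{answer}\n\n"
                    f"Rewrite the answer so that it does the following: {instruction}"
                ),
            },
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

Pairing each rewritten answer with its original reference text yields known-bad rows you can run through the eval shown later in this article.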

How Can You Reduce Hallucination?

Here are practical, evidence-backed ways engineers mitigate hallucinations in LLMs:

  • Run hallucination evals: Evaluate outputs using trusted context. Use prompt-based templates that compare the answer against the reference and return “factual” or “hallucinated”.
  • Target diverse error types: Don’t just check correctness—include logical, outdated, and entity-based errors. Cover edge cases with synthetic inputs where needed.
  • Fine-tune on grounded examples: Train on data with clear context-answer pairs to reduce guessing and boost answer grounding.
  • Use multi-judge evaluation: Combine outputs from multiple judge models or humans to reduce false positives or missed hallucinations (see the sketch after this list).
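
To make the multi-judge idea concrete, here is a minimal, library-agnostic sketch that aggregates independent judge verdicts by majority vote; the Judge signature and the tie-breaking rule are assumptions chosen for illustration, not a prescribed API.

from collections import Counter
from typing import Callable, List

# Each judge maps (query, reference, answer) -> "factual" or "hallucinated".
Judge = Callable[[str, str, str], str]

def majority_verdict(judges: List[Judge], query: str, reference: str, answer: str) -> str:
    """Aggregate independent judge labels; ties count as 'hallucinated' to stay conservative."""
    votes = Counter(judge(query, reference, answer) for judge in judges)
    return "factual" if votes["factual"] > votes["hallucinated"] else "hallucinated"

Each judge could be a different eval model or a human label, and disagreement across judges is a useful signal for which examples deserve manual review.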

What Is the Leading Research on LLM Hallucination?

Here are several notable papers advancing the field.


Paper | Year | Why It Matters
HaluEval | 2023 | Released large-scale benchmark with human-labeled hallucinations.
HaluEval 2.0 | 2024 | Added fine-grained hallucination types and causal analysis.
Survey on Hallucination in LLMs | 2024 | Comprehensive review of definitions, taxonomies, and challenges.
Semantic Entropy for Hallucination Detection | 2024 | Task-agnostic method using output uncertainty to flag hallucinations.
LibreEval (Phoenix) | 2025 | Introduced a 75k-sample dataset, eval platform, and tuned judge models for hallucination detection.

In Practice

🔧 Example: Running a hallucination eval in Phoenix

Below are a basic prompt template and a Python example that use Arize Phoenix to check for hallucinations on private data, specifically data fed into the context window from retrieval.

Prompt template:

In this task, you will be presented with a query, a reference text and an answer. The answer is
generated to the question based on the reference text. The answer may contain false information. You
must use the reference text to determine if the answer to the question contains false information,
if the answer is a hallucination of facts. Your objective is to determine whether the answer text
contains factual information and is not a hallucination. A 'hallucination' refers to
an answer that is not based on the reference text or assumes information that is not available in
the reference text. Your response should be a single word: either "factual" or "hallucinated", and
it should not include any other text or characters. "hallucinated" indicates that the answer
provides factually inaccurate information to the query based on the reference text. "factual"
indicates that the answer to the question is correct relative to the reference text, and does not
contain made up information. Please read the query and reference text carefully before determining
your response.

    # Query: {query}
    # Reference text: {reference}
    # Answer: {response}
    Is the answer above factual or hallucinated based on the query and reference text?

How to run the eval:

from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    download_benchmark_dataset,
    llm_classify,
)

model = OpenAIModel(
    model_name="gpt-4",
    temperature=0.0,
)

# The rails constrain the eval output to the label set defined by the template
# ("factual" or "hallucinated"). They strip stray text such as ",,," or "..."
# and ensure the binary value expected from the template is returned.
rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())
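# `df` is assumed to be a pandas DataFrame with one row per example whose columns
# supply the variables used in the prompt template above (query, reference, response).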
hallucination_classifications = llm_classify(
    dataframe=df, 
    template=HALLUCINATION_PROMPT_TEMPLATE, 
    model=model, 
    rails=rails,
    provide_explanation=True,  # optional: have the eval LLM explain each label it produces
)
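
The returned DataFrame lines up row-for-row with df and carries the predicted label (plus an explanation when provide_explanation=True), so you can join it back to your traces to see which retrieved contexts led to hallucinated answers.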