LLM Hallucination Examples

Reducing LLM hallucinations in AI agents and apps.


What Is LLM Hallucination?

LLM hallucination occurs when a model generates confident but false, unverifiable, or irrelevant information. This can include fabricating facts, misinterpreting relationships, or omitting critical context. Hallucinations matter in engineering because they erode trust, mislead users, and pose risks in production—especially in critical fields like healthcare, finance, or law.

What Are Some Examples Of LLM Hallucination?

The table below captures diverse hallucination types and the prompts used to induce or evaluate them:

Type | Description | Prompt Template
Relation-error | Incorrect relationships between entities (e.g., wrong cause-effect). | Introduce incorrect quantitative, temporal, or causal relationships.
Incompleteness | Leaves out important facts while appearing complete. | Omit key details or facts when listing or summarizing.
Outdated information | Uses old facts as if they were current. | Add information that is outdated but presented as current.
Overclaim | Subtly exaggerates claims beyond the given context. | Overstate scope or certainty very subtly.
Unverifiable information | Plausible but unverifiable statements. | Add information not found in public or reference sources.
Entity-error | Incorrect names, places, dates, or objects. | Change an entity (e.g., wrong name or place) that fits the context.
Out-of-scope info | Queries about the future, external links, or subjective content. | Ask about future events or very specific external references.
Advanced logic | Requires deep reasoning or technical problem-solving. | Pose intricate logic, math, or programming tasks.
Multimodal | Requires non-text content (e.g., image/audio/video). | Ask for visuals, sounds, or other non-textual output.
Errors/unsolvable | Illogical, contradictory, or impossible to answer. | Use broken syntax or contradictory premises.
Other hallucinated | Common hallucination-prone queries not covered above. | Ask something that often causes hallucinations.
System prompt | Meta-prompt to generate questions based on a paragraph. | “You are a helpful assistant that generates questions…”
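
One practical use of these templates is generating synthetic test data. The sketch below is an illustrative example only, written under a few assumptions (the OpenAI Python SDK, an assumed model name, and a hypothetical induce_hallucination helper): it applies one of the instructions from the table to a grounded answer so you get a labeled “hallucinated” example for exercising a detector.

from openai import OpenAI

client = OpenAI()  # assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment

# A few (type, instruction) pairs taken from the table above.
INDUCTION_TEMPLATES = {
    "relation-error": "Introduce incorrect quantitative, temporal, or causal relationships.",
    "incompleteness": "Omit key details or facts when listing or summarizing.",
    "outdated-information": "Add information that is outdated but presented as current.",
    "overclaim": "Overstate scope or certainty very subtly.",
}

def induce_hallucination(reference: str, answer: str, hallucination_type: str) -> str:
    """Rewrite a grounded answer so it exhibits the requested hallucination type (illustrative helper)."""
    instruction = INDUCTION_TEMPLATES[hallucination_type]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute whichever model you use
        messages=[
            {"role": "system", "content": "You rewrite answers to create evaluation data."},
            {
                "role": "user",
                "content": (
                    f"Reference text:\n{reference}\n\n"
                    f"Original answer:\n{answer}\n\n"
                    f"Rewrite the answer so that it does the following: {instruction}"
                ),
            },
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

Pairing each rewritten answer with its original reference text yields known-bad rows you can run through the eval shown later in this article.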

How Can You Reduce Hallucination?

Here are practical, evidence-backed ways engineers mitigate hallucinations in LLMs:

  • Run hallucination evals: Evaluate outputs using trusted context. Use prompt-based templates that compare the answer against the reference and return “factual” or “hallucinated”.
  • Target diverse error types: Don’t just check correctness—include logical, outdated, and entity-based errors. Cover edge cases with synthetic inputs where needed.
  • Fine-tune on grounded examples: Train on data with clear context-answer pairs to reduce guessing and boost answer grounding.
  • Use multi-judge evaluation: Combine outputs from multiple judge models or humans to reduce false positives or missed hallucinations (see the sketch after this list).
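
To make the multi-judge idea concrete, here is a minimal, library-agnostic sketch that aggregates independent judge verdicts by majority vote; the Judge signature and the tie-breaking rule are assumptions chosen for illustration, not a prescribed API.

from collections import Counter
from typing import Callable, List

# Each judge maps (query, reference, answer) -> "factual" or "hallucinated".
Judge = Callable[[str, str, str], str]

def majority_verdict(judges: List[Judge], query: str, reference: str, answer: str) -> str:
    """Aggregate independent judge labels; ties count as 'hallucinated' to stay conservative."""
    votes = Counter(judge(query, reference, answer) for judge in judges)
    return "factual" if votes["factual"] > votes["hallucinated"] else "hallucinated"

Each judge could be a different eval model or a human label, and disagreement across judges is a useful signal for which examples deserve manual review.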

What Is the Leading Research on LLM Hallucination?

Here are several notable papers advancing the field.


Paper | Year | Why It Matters
HaluEval | 2023 | Released large-scale benchmark with human-labeled hallucinations.
HaluEval 2.0 | 2024 | Added fine-grained hallucination types and causal analysis.
Survey on Hallucination in LLMs | 2024 | Comprehensive review of definitions, taxonomies, and challenges.
Semantic Entropy for Hallucination Detection | 2024 | Task-agnostic method using output uncertainty to flag hallucinations.
LibreEval (Phoenix) | 2025 | Introduced a 75k-sample dataset, eval platform, and tuned judge models for hallucination detection.

In Practice

🔧 Example: Running a hallucination eval in Phoenix

Below are a basic prompt template and a Python example that use Arize Phoenix to check for hallucinations on private data, specifically data fed into the context window from retrieval.

Prompt template:

In this task, you will be presented with a query, a reference text and an answer. The answer is
generated to the question based on the reference text. The answer may contain false information. You
must use the reference text to determine if the answer to the question contains false information,
if the answer is a hallucination of facts. Your objective is to determine whether the answer text
contains factual information and is not a hallucination. A 'hallucination' refers to
an answer that is not based on the reference text or assumes information that is not available in
the reference text. Your response should be a single word: either "factual" or "hallucinated", and
it should not include any other text or characters. "hallucinated" indicates that the answer
provides factually inaccurate information to the query based on the reference text. "factual"
indicates that the answer to the question is correct relative to the reference text, and does not
contain made up information. Please read the query and reference text carefully before determining
your response.

    # Query: {query}
    # Reference text: {reference}
    # Answer: {response}
    Is the answer above factual or hallucinated based on the query and reference text?

How to run the eval:

from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    download_benchmark_dataset,
    llm_classify,
)

model = OpenAIModel(
    model_name="gpt-4",
    temperature=0.0,
)

# The rails constrain the eval output to the label set defined by the template
# ("factual" or "hallucinated"). They strip stray text such as ",,," or "..."
# and ensure the binary value expected from the template is returned.
rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())
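# `df` is assumed to be a pandas DataFrame with one row per example whose columns
# supply the variables used in the prompt template above (query, reference, response).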
hallucination_classifications = llm_classify(
    dataframe=df, 
    template=HALLUCINATION_PROMPT_TEMPLATE, 
    model=model, 
    rails=rails,
    provide_explanation=True,  # optional: have the eval LLM explain each label it produces
)
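
The returned DataFrame lines up row-for-row with df and carries the predicted label (plus an explanation when provide_explanation=True), so you can join it back to your traces to see which retrieved contexts led to hallucinated answers.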