Concept · 2026-05-16
LLM hallucination — what it is, why it happens, what reduces it
Hallucination is the LLM-era word for confidently asserting facts that aren't real. It has measurable rates, predictable causes, and a layered set of mitigations. Here's the full picture.
Definition
Hallucination in the LLM context is a factual error generated by a large language model, typically presented with the same fluency and confidence as a correct statement. The model produces text that looks right — proper grammar, plausible structure, real-sounding citations — but contains assertions that don't correspond to verifiable reality.
The word is borrowed from psychiatry but the mechanism is different. LLMs aren't having sensory experiences. They're producing plausible next tokens. When the most plausible continuation happens to be false, you get a hallucination.
Five categories of hallucination
- Fabricated facts. The model invents a fact that doesn't exist. "The library libfoo 3.4 introduced the bar() function" — when libfoo never had a bar() function.
- Misattributed quotes. Real quote, wrong person. "As Einstein said: be the change you want to see in the world" — a line usually attributed to Gandhi, and one Einstein never said.
- Fabricated citations. The model produces a URL, paper, or book that doesn't exist. Especially common with academic citation tasks.
- Stitched-together claims. Real fragments recombined incorrectly. "GPT-4 was released by Anthropic in 2023" — GPT-4 is real, 2023 is real, but Anthropic isn't the publisher.
- Temporal hallucination. Facts true at training time, presented as currently true. "The latest version of Python is 3.11" — true when the model was trained, no longer true now.
Measured rates (2026 data)
Hallucination rate depends heavily on domain, model, and prompt. Published research on frontier models in 2026 generally shows:
- ~1-5% on well-trodden questions. Capitals of countries, basic biology, common math. The training set covered these heavily; the model has redundant evidence.
- ~5-15% on moderately specialized queries. History of a tech company, mid-list scientist's publications, named software features. The training set has some coverage; the model interpolates.
- ~15-40% on long-tail technical questions. Specific library version features, niche academic results, recent events past the training cutoff. The training set is thin or absent; the model fabricates plausibly.
- ~30-60% on citation tasks. "List 5 papers that introduced technique X." The model is statistically rewarded for producing 5 entries even when fewer real ones exist, leading to invented citations.
These rates have dropped roughly 2x year-over-year since 2022 but haven't reached zero. They likely won't — the training objective (next-token plausibility) and the factuality objective (assertion correctness) aren't the same thing.
Why hallucination happens
Six root causes, ordered from least to most addressable:
- Statistical sampling. The model generates the next token by sampling from a probability distribution. Sometimes the most-likely token is false. This is mathematically unavoidable in pure generation (a toy sketch of one decoding step follows this list).
- Compression artifacts. A model with 70B parameters can't store every fact verbatim. Facts are compressed; compression is lossy. Edges blur.
- Training-set noise. The model trained on text that included wrong facts. It learned the wrong fact with the same fluency as the right one.
- Temporal cutoff. The model has no knowledge of events past its training cutoff but doesn't know it doesn't know. It fabricates.
- RLHF reward hacking. Models are fine-tuned with reinforcement learning to produce helpful responses. "I don't know" is often rated lower than a confident guess. The model learns to guess.
- Prompt ambiguity. The user asks a question with multiple correct interpretations; the model picks one; it doesn't match the user's intended scope.
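To make the first cause concrete, here is a toy decoding step. The prompt, vocabulary, and logit values are invented for illustration, not taken from any real model; the point is only that softmax sampling rewards whatever the training data repeated most, true or not.

```python
import numpy as np

# Invented numbers: a hypothetical next-token distribution after the prompt
# "The capital of Australia is". Web text mentions "Sydney" in this context
# more often than "Canberra", so the false continuation can out-score the true one.
vocab = ["Sydney", "Canberra", "Melbourne"]
logits = np.array([2.3, 2.0, 0.5])

def sample_next(logits: np.ndarray, temperature: float = 1.0, seed: int = 0) -> int:
    """One decoding step: softmax over the logits, then sample from the result."""
    probs = np.exp(logits / temperature)
    probs = probs / probs.sum()
    return int(np.random.default_rng(seed).choice(len(probs), p=probs))

print(vocab[int(np.argmax(logits))])   # greedy decoding picks "Sydney": fluent, wrong
print(vocab[sample_next(logits)])      # plain sampling usually picks it too
```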
What reduces hallucination (ordered by impact)
- Signed-claim verification (see LLM grounding). Post-process every assertion against a verified-claim catalog. Unverified assertions get flagged or stripped. Reduces hallucination from ~15-40% to <1% on covered domains. Cost: per-call latency + catalog curation.
- Retrieval-augmented generation. Insert retrieved context into the prompt. Cuts hallucination roughly in half on covered domains. Cost: vector DB + retrieval latency.
- Confidence calibration. Train the model to say "I don't know" when uncertain. Helps but doesn't eliminate — the model has to know what it doesn't know, which is hard.
- Tool-call verification. Give the model a verify_claim() tool. Let it self-invoke when uncertain. A hybrid of signed-claim verification and human-in-the-loop checking.
- Prompt engineering for honesty. "Answer only with information you can cite a specific source for." Modest impact; instruction-following on factuality is unreliable.
- Self-consistency sampling. Generate N answers, take the majority vote. Reduces variance but not bias — wrong-but-consistent answers stay wrong (a minimal sketch follows this list).
- Temperature = 0. Removes sampling randomness. Helps marginally; doesn't fix the underlying cause.
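As a concrete example of the self-consistency item above, here is a minimal sketch. The ask_model() helper is a placeholder for whatever LLM client you actually use, and the 0.6 agreement threshold is an arbitrary illustration rather than a recommended value.

```python
from collections import Counter

def ask_model(prompt: str) -> str:
    """Placeholder: call your LLM client of choice and return its text answer."""
    raise NotImplementedError

def self_consistent_answer(prompt: str, n: int = 5) -> tuple[str, float]:
    """Ask the same question n times and return the majority answer plus its
    agreement rate. Low agreement is a useful uncertainty signal; unanimous
    agreement on a wrong answer is exactly the bias this technique cannot fix."""
    answers = [ask_model(prompt).strip().lower() for _ in range(n)]
    top_answer, votes = Counter(answers).most_common(1)[0]
    return top_answer, votes / n

# Illustrative usage: defer instead of asserting when agreement is low.
# answer, agreement = self_consistent_answer("Which Python version added the match statement?")
# if agreement < 0.6:
#     answer = "I'm not sure."
```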
Detecting hallucination at runtime
Three techniques for catching hallucination after generation but before display:
- Cite-or-strip. Require every assertion to have a citation. Strip uncited lines. This is the strongest pattern but requires a curated claim catalog.
- Multi-model cross-check. Ask a second model "is this true?" about the first model's output. Crude but cheap. Catches obvious fabrications.
- External verification API. POST the claim to a service like /api/v1/verify and check the response. Five lines of Python gets you a working version (sketched below).
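Here is roughly what that five-line version looks like. The base URL and the "claim"/"verified" field names are assumptions about the service's request and response schema; substitute whatever your verification API actually expects.

```python
import requests

API_URL = "https://api.example.com/api/v1/verify"  # replace with your verification service

def is_verified(claim: str) -> bool:
    """POST one claim and read back a boolean verdict. The JSON field names
    ("claim", "verified") are assumed here; check your provider's schema."""
    resp = requests.post(API_URL, json={"claim": claim}, timeout=10)
    resp.raise_for_status()
    return bool(resp.json().get("verified", False))

# Cite-or-strip style usage: keep only the sentences that verify.
# safe_sentences = [s for s in generated_sentences if is_verified(s)]
```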
Where hallucination cost is highest
- Medical / legal / financial. User acts on wrong information; real-world harm. Mitigation: signed-claim verification + UX badges marking unverified assertions.
- Code generation. Model invents a function that doesn't exist; user's code crashes. Mitigation: type-aware retrieval over the actual library AST.
- Academic / research. Fabricated citations propagate. Mitigation: every cited source has a clickable URL that resolves.
- Customer support. Wrong product information costs trust. Mitigation: ground on the company's own documentation; don't answer out of scope.
- Public-facing chat. Wrong answer ends up in screenshots, social media. Mitigation: confidence thresholds; defer to humans when uncertain.
Will it ever be solved?
Probably not in the "model never hallucinates" sense. The training objective rewards plausibility, not truth, and plausible-but-false will always have non-zero probability.
What will improve over time is the infrastructure around the model: better retrieval, better signed-claim catalogs, better verification APIs, better UX patterns for surfacing confidence. The model stays imperfect; the surrounding system makes the imperfection safe.
The right mental model is not "LLMs are calculators that sometimes give wrong answers." It's "LLMs are fluent first-drafters that need a fact-checking pipeline downstream." The pipeline is the product.
Further reading
- LLM grounding — the broader concept hallucination control falls under
- Verifying AI-generated facts in 5 lines of Python — hands-on hallucination filter
- Why VERITAS doesn't ship performance-comparison claims
- Quickstart — first API call in 5 minutes
- Browse the catalog — 136 hand-verified AI/ML claims