⚡ Quick Summary
This report by Manuel Cossio offers a highly structured, formal, and comprehensive framework for understanding hallucinations in Large Language Models (LLMs). It argues that hallucinations are not merely a bug but a theoretically inevitable property of any computable LLM. The taxonomy categorizes hallucinations along core dimensions (intrinsic vs. extrinsic; factuality vs. faithfulness) and maps their manifestations across specific tasks (e.g., summarization, code generation, multimodal outputs). The document also breaks down hallucination causes (data, model, prompt), examines the human cognitive biases that shape how hallucinations are perceived, surveys benchmarks (e.g., TruthfulQA, HalluLens), and proposes mitigation strategies, both architectural (e.g., RAG, Toolformer) and systemic (e.g., human-in-the-loop evaluation, uncertainty displays). It is a cornerstone resource for researchers and practitioners aiming to design safer, more trustworthy LLM systems.
🧩 What’s Covered
1. Formal Framework and Inevitability
The report builds on computability theory to prove that hallucination is unavoidable in LLMs. It introduces a formal definition under which any computable LLM h diverges from a ground-truth function f on some input, regardless of training corpus or architecture. The core theorems (T1–T3) argue that hallucinations are universal, frequent, and inescapable across all LLM states.
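In symbols (my paraphrase of the result as summarized above, not the report's exact notation), the inevitability claim can be sketched as:

```latex
% Sketch: hallucination as unavoidable divergence from the ground-truth function f
\forall\, h \in \mathcal{H}_{\mathrm{computable}} \;\; \exists\, s \in \mathcal{S} : \quad h(s) \neq f(s)
```

That is, no choice of training corpus or architecture yields a computable h that agrees with f on every input.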
2. Core Taxonomy: Intrinsic vs Extrinsic; Factuality vs Faithfulness
Using a layered model (see the page 1 diagram), the report categorizes hallucinations along two axes (a short classification sketch follows the list):
- Intrinsic: internally inconsistent with input (e.g., temporal or logical contradiction).
- Extrinsic: fabricated content not in input or reality.
- Factuality: contradicts real-world facts.
- Faithfulness: deviates from input context or instruction.
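As a rough illustration of how the two axes combine when annotating outputs, here is a minimal Python sketch; the class and field names are my own and not taken from the report:

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """Where the hallucination sits relative to the input."""
    INTRINSIC = "internally inconsistent with the provided input"
    EXTRINSIC = "fabricated content absent from both input and reality"


class Violation(Enum):
    """What the hallucinated content violates."""
    FACTUALITY = "contradicts real-world facts"
    FAITHFULNESS = "deviates from the input context or instruction"


@dataclass
class HallucinationLabel:
    """One annotated hallucination, classified along both taxonomy axes."""
    output_span: str
    scope: Scope
    violation: Violation


# Example: the report's "Parisian Tiger" fabrication is extrinsic (not grounded
# in any input) and a factuality violation (no such animal exists).
label = HallucinationLabel(
    output_span="The Parisian Tiger was hunted to extinction",
    scope=Scope.EXTRINSIC,
    violation=Violation.FACTUALITY,
)
print(label.scope.name, label.violation.name)  # EXTRINSIC FACTUALITY
```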
3. Specific Manifestations
The report lists and defines 14+ hallucination types:
- Factual errors (e.g., incorrect biographical claims).
- Instruction deviation (ignoring directives).
- Temporal disorientation (outdated claims).
- Amalgamated errors, nonsensical outputs, multimodal inconsistencies.
Each type is illustrated with examples (e.g., “The Parisian Tiger was hunted to extinction” as an extrinsic fabrication).
4. Root Causes
Hallucinations are attributed to:
- Data issues: outdated, biased, or noisy sources.
- Model design: autoregressive generation, overconfidence, lack of reasoning.
- Prompting: adversarial input, ambiguity, confirmatory bias.
5. Human Cognitive Factors
The report identifies how biases (e.g., automation bias, fluency heuristics) and user overtrust amplify hallucination risks. These are compounded by LLM overconfidence and stylistic polish, which obscure factual inaccuracy.
6. Mitigation Strategies
- Architectural: Toolformer, Retrieval-Augmented Generation (RAG), adversarial fine-tuning (a minimal RAG sketch follows this list).
- Systemic: UI-level mitigations (uncertainty displays, source-grounding), symbolic guardrails, fallback policies.
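To make the architectural side concrete, here is a minimal retrieval-augmented generation sketch in Python. The toy word-overlap retriever, the prompt template, and the generate_fn stand-in are illustrative assumptions, not the report's or any specific library's implementation:

```python
from typing import Callable, List


def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Toy retriever: rank passages by naive word overlap with the query.
    A production RAG system would use a sparse (BM25) or dense (embedding) index."""
    q_terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q_terms & set(p.lower().split())), reverse=True)
    return ranked[:k]


def rag_answer(query: str, corpus: List[str], generate_fn: Callable[[str], str]) -> str:
    """Ground generation in retrieved passages to limit extrinsic fabrication."""
    passages = retrieve(query, corpus)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using ONLY the sources below. "
        "If the sources are insufficient, say you do not know.\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return generate_fn(prompt)  # generate_fn wraps whatever LLM is actually in use


# Usage with a stand-in generator (replace the lambda with a real LLM call):
corpus = [
    "The Eiffel Tower was completed in 1889.",
    "Paris is the capital of France.",
]
print(rag_answer("When was the Eiffel Tower completed?", corpus, lambda p: p[-160:]))
```

The intent is the grounding effect the report attributes to RAG: the model is pushed to answer from retrieved sources rather than from parametric memory alone.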
7. Benchmarks and Metrics
The report surveys key datasets:
- TruthfulQA, FActScore, HalluLens (taxonomy-aware).
- Domain-specific benchmarks (e.g., MedHallu, CodeHaluEval).
And metrics:
- SummaC, FactCC, QuestEval, RAE, KILT.
Notably, human evaluation remains the gold standard (a toy factuality-metric sketch follows the list).
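As a sketch of how an atomic-fact factuality metric of this kind works (loosely modeled on the FActScore idea; the verifier and the fact list below are placeholder assumptions):

```python
from typing import Callable, List


def atomic_fact_precision(facts: List[str], is_supported: Callable[[str], bool]) -> float:
    """Fraction of atomic facts in a generation that a verifier judges supported
    by a trusted knowledge source (the core idea behind FActScore-style scoring)."""
    if not facts:
        return 1.0  # convention: nothing claimed, nothing unsupported
    return sum(1 for f in facts if is_supported(f)) / len(facts)


# Toy usage: in practice the atomic facts come from an automatic decomposition of
# the generated text, and is_supported() checks each one against Wikipedia or a
# domain knowledge base.
facts = [
    "Marie Curie won two Nobel Prizes.",
    "Marie Curie was born in Paris.",  # false: she was born in Warsaw
]
knowledge = {"Marie Curie won two Nobel Prizes."}
print(atomic_fact_precision(facts, lambda f: f in knowledge))  # 0.5
```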
8. Monitoring Tools
The report reviews real-time hallucination tracking via the following platforms (a toy aggregation sketch follows the list):
- Vectara Hallucination Leaderboard,
- Epoch AI Dashboard (links hallucination reduction with training compute),
- LM Arena (user-driven head-to-head evaluations, real-world trust signals).
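A toy sketch of the leaderboard mechanics these platforms share: aggregate per-response hallucination judgments into a per-model rate and rank by it (the data and detector here are invented placeholders):

```python
from typing import Dict, List, Tuple


def hallucination_leaderboard(judgments: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """Rank models by hallucination rate: the fraction of responses that a detector
    model or human annotator flagged as ungrounded. Lower is better."""
    rates = {model: sum(flags) / len(flags) for model, flags in judgments.items() if flags}
    return sorted(rates.items(), key=lambda kv: kv[1])


# Toy example: True marks a response judged hallucinated.
judgments = {
    "model-a": [False, False, True, False],
    "model-b": [True, True, False, False],
}
for model, rate in hallucination_leaderboard(judgments):
    print(f"{model}: {rate:.0%} hallucination rate")
```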
💡 Why it matters?
This report reframes hallucination not as an error to be fixed but as a design constraint inherent to current LLM paradigms. It equips AI developers, safety researchers, and regulators with the conceptual rigor and empirical tools needed to move beyond cosmetic solutions. By anchoring hallucination within computability theory, it sets clear boundaries for mitigation, grounding the conversation in what is feasible, not just desirable. Its layered taxonomy and mapping of hallucination types to causes and metrics enable more effective task-specific and domain-specific responses. In a landscape where LLMs are deployed in high-stakes domains (law, medicine, finance), this report helps operationalize trustworthy AI through layered safeguards and continuous monitoring.
❓ What’s Missing
- Unified standard: While it discusses multiple benchmarks and metrics, the field still lacks a single evaluation framework harmonized across tasks and hallucination types.
- Mitigation efficacy: The report outlines mitigation strategies but stops short of deeply analyzing their relative performance or long-term viability.
- Case studies: Real-world deployment examples or user impact studies would bolster its applied relevance.
- Causal reasoning mechanisms: It notes the lack of LLM causal reasoning but doesn’t explore architectural proposals to fill that gap.
👥 Best For
- AI safety researchers designing hallucination detection and mitigation protocols.
- LLM developers and architects exploring ways to reduce risk and increase reliability.
- Product teams deploying LLMs in regulated or high-stakes environments.
- Policymakers drafting LLM oversight frameworks.
- Academics and PhD students in NLP, HCI, and cognitive science studying trust and interpretability.
📄 Source Details
- Title: A Comprehensive Taxonomy of Hallucinations in Large Language Models
- Author: Manuel Cossio, MMed, MEng
- Institution: Universitat de Barcelona
- Date: August 2025
- arXiv ID: arXiv:2508.01781v1 [cs.CL]
- Length: 56 pages
- Primary References: Xu et al. (2024) on inevitability, HalluLens (2025), FActScore (2023), RAG architectures, Vectara leaderboard, Toolformer (2023), and Med-PaLM 2 design patterns.
📝 Thanks to
- Ziwei Xu et al. for the formal proof framework
- Yejin Bang and team for HalluLens benchmark
- Epoch AI, Vectara, LM Arena for real-time hallucination tracking platforms
- Timo Schick et al. for Toolformer
- Stephanie Lin et al. for TruthfulQA and uncertainty displays research