On February 24, 2025, a federal district court sanctioned three lawyers from the law firm Morgan & Morgan for citing fake cases. As it turned out, these cases were generated by an AI tool. Mistakes made by large language models, such as GPT, have real-world consequences.
We often call such mistakes hallucinations. A hallucination occurs when the language model behind a modern AI tool generates a factually incorrect, logically incoherent, or otherwise defective answer. When language models hallucinate, they often sound confident, and that confidence is what makes hallucinations dangerous.
To deepen my knowledge, I read over 20 academic papers on hallucinations. Now, I want to share what I learned. In this article, I’ll focus on defining and categorizing hallucinations, their underlying causes, and answering a crucial question: Are hallucinations inevitable? In a follow-up, I’ll talk about hallucination detection and mitigation.
What are hallucinations?
A hallucination is an output produced by a language model that sounds plausible and confident, but is wrong in some way. The most intuitive form of hallucination is generating a factually incorrect statement like “Thomas Edison built the Eiffel Tower.”
But there are also more subtle hallucinations. Imagine that you ask an AI tool to summarize a document about Napoleon Bonaparte. The document contains his year of birth, but not the exact date. Yet, the AI tool uses its general knowledge and adds the exact date to the generated summary. The information is relevant, but it isn’t present in the document you want to summarize. Many would call this a hallucination.
Categories of hallucination
Since hallucinations can take many different forms, researchers found it worthwhile to group hallucinations into separate categories. This work is important because detecting or mitigating different types of hallucination might require different approaches.
The categorization I find most useful was presented in the article A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. At the highest level, we can distinguish between:
- Factuality hallucination, which includes factually incorrect statements, made-up statements like the fake cases from the introduction, but also more subtle issues like: “the most important cause of declining birth rates is unaffordable housing.” It is possible that expensive housing contributes to declining birth rates, but it’s likely just one of many contributing factors.
- Faithfulness hallucination, which occurs when the output generated by the language model is inconsistent with what came before. This includes:
- Instruction inconsistency: The language model doesn’t do what you ask it to do. For example, you ask it to translate a question into a different language, but instead, the model answers it.
- Context inconsistency: The answer doesn’t follow from the provided context, like an uploaded source document. The Napoleon Bonaparte example from above is an example of this type of hallucination.
- Logical inconsistency: A typical example is an incorrect result of a mathematical operation.
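Logical inconsistencies of the arithmetic kind can sometimes be caught by simply re-computing what the model claims. As a minimal illustration (a toy sketch, not a production detector), the snippet below scans a model's output for simple arithmetic claims and verifies them:

```python
import re

def check_arithmetic(text: str) -> list[str]:
    """Find simple 'a + b = c' claims in model output and flag wrong ones."""
    issues = []
    for a, op, b, c in re.findall(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)", text):
        # Safe to eval: the operands and operator were matched as digits and +-*.
        expected = eval(f"{a}{op}{b}")
        if expected != int(c):
            issues.append(f"{a} {op} {b} = {c} (should be {expected})")
    return issues

print(check_arithmetic("The sum is 17 + 25 = 43, and 6 * 7 = 42."))
# → ['17 + 25 = 43 (should be 42)']
```

Real hallucination detectors are far more involved, but the principle is the same: verify the model's claims against an independent source of truth.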
Many researchers distinguish between intrinsic and extrinsic hallucinations. Intrinsic hallucinations roughly correspond to faithfulness hallucinations, but extrinsic hallucinations often have conflicting definitions. Some define extrinsic hallucinations as statements that contradict our knowledge about the world, others as statements that contradict the knowledge contained in the data used to train the model.
We can see the difference in a famous blunder made by Google's AI Overviews search feature. Somebody asked what to do when cheese doesn't stick to pizza, and the bot recommended adding glue. Judging by our general knowledge about the world, we would never give such advice. But as it turned out, this recommendation did appear in the model's training data.
Is “hallucination” even the right word?
Some researchers argue that the term “hallucination” is misleading. We say that humans hallucinate if they see or hear something that isn’t there. This isn’t what language models do when they hallucinate. So maybe we should use other phenomena from psychology to describe the mistakes language models make. For example:
- Source amnesia: We have a certain belief or information, but we don’t remember where we got it from. A language model can cite incorrect sources or mix up credible sources like scientific papers with non-credible ones like fictional stories.
- Availability bias: Humans are more likely to use information that’s easy to recall. Language models can give more importance to information that’s more frequent in the dataset, even if incorrect.
- Suggestibility: Our beliefs are often influenced by external factors like advertising. If a prompt given to a language model is strongly biased or contains leading questions, the model might follow up on these suggestions in its response.
- Cognitive dissonance: This phenomenon occurs when we hold two conflicting beliefs at the same time. The data used to train language models is full of such conflicting information.
- Confabulation: This term describes false memories — memories of things that never happened. It’s not that the person is consciously making stuff up. They believe what they are saying. Similarly, a language model often sounds confident while being wrong.
What causes hallucinations?
Now that we have defined hallucinations and explored different categories, let’s dive into the underlying causes. The article A Survey on Hallucination in Large Language Models from earlier conveniently splits possible causes according to the phases of the language model life cycle: training data, training (which we can split into pre-training and post-training), and use.
Data
Bad data fed into a language model during training is just the latest incarnation of the famous GIGO acronym: garbage in, garbage out. If the data we use to train a language model contains factual inaccuracies, the model will learn them. Nowadays, big language models like GPT or Gemini are trained on virtually every publicly available piece of information. And even though the engineers are trying to clean the data (often using an older version of some language model), it’s impossible to get rid of every issue.
Besides containing inaccuracies, the data used to train the model is often incomplete. First, each model has a knowledge cut-off at a certain date, so it cannot provide accurate information about recent events. Second, we often use AI tools to perform novel and unique tasks, which might not be covered by the training data. AI chatbots can get around the first issue, for example, by using web search. The second issue is trickier: after all, we train machine learning models precisely so they can help us solve new tasks. I'll return to this issue later, when I discuss the inevitability of hallucinations.
Pre-training
Pre-training is the first phase of training a language model. We train the model to predict the next word in the provided text. After pre-training, our model understands how our human languages work and has some general knowledge about the world. However, it’s usually still terrible at following instructions.
In their paper Why Language Models Hallucinate, researchers from OpenAI and Georgia Tech show that even a language model trained on a perfect dataset would still hallucinate. Sparing you the underlying math, they identify several possible sources of hallucination:
- Computational hardness: We have known for a long time that certain problems provably cannot be solved in finite time, no matter what we do. These problems might be rare in practice, but they are our first hint for answering the question of whether hallucinations are inevitable.
- Arbitrary fact hallucination: A modern language model is just a special type of machine learning model. When we train a machine learning model, the model attempts to learn some pattern from the data we provide, so that it can later make predictions about previously unseen inputs. But certain types of data, like dates of birth, have no meaningful pattern.
- Wrong type of model: Even if our data exhibits some pattern, the type and the architecture of the model we are training might not be able to capture it. A simplified example is using a linear model that can only draw straight lines to describe data forming a circle.
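The arbitrary-fact problem above can be made concrete with a toy simulation. Below, a "model" memorizes half of a set of uniformly random birthdays (a stand-in for patternless facts); because there is no pattern to generalize from, it can only guess on the unseen half. This is an illustration of the statistical argument, not how language models are actually trained:

```python
import random

random.seed(0)

# Patternless "facts": each person gets a uniformly random day of the year.
people = [f"person_{i}" for i in range(2000)]
birthday = {p: random.randrange(365) for p in people}

# "Training" = memorizing half of the facts.
memorized = {p: birthday[p] for p in people[:1000]}

def predict(person: str) -> int:
    # Recall the fact if memorized; otherwise guess, which is all a model
    # can do when a fact effectively never appeared in its training data.
    return memorized.get(person, random.randrange(365))

train_acc = sum(predict(p) == birthday[p] for p in people[:1000]) / 1000
test_acc = sum(predict(p) == birthday[p] for p in people[1000:]) / 1000
print(train_acc, test_acc)  # perfect recall on seen facts, roughly 1/365 on unseen ones
```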
Post-training
In the second phase of training, often called post-training, we teach the model to follow our instructions, but also to generate chains of thought, or to use tools like web search. We can do this, for example, by showing the model lots of carefully crafted input-output examples labeled and evaluated by humans.
The authors of the paper introduced in the pre-training section say that, in principle, post-training can actually reduce the frequency of hallucination. But in practice, we often see the opposite. The core of the issue is that we punish the language model if it expresses uncertainty, for example, by answering “I don’t know.” We consider this answer just as wrong as outputting something actually incorrect.
During post-training, our model is like a high-school student taking an exam where they are awarded 1 point for a correct answer and 0 points for both a wrong answer and no answer. What will such a student do if they don’t know the answer? They guess. Random choice gives them at least some chance of getting a point. Language models might be doing the same: if they don’t know, they produce something, to have at least some chance of success.
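The incentive problem in the exam analogy is easy to make concrete. Under the 1-point/0-point scheme, a quick expected-value calculation shows why guessing dominates honesty, and how penalizing wrong answers (a hypothetical scoring change, used here only for illustration) would flip the incentive:

```python
# Exam scoring: 1 point for a correct answer, 0 for a wrong answer
# AND 0 for abstaining ("I don't know").
def expected_score(p_correct: float, abstain: bool) -> float:
    return 0.0 if abstain else p_correct

# A clueless student (or model) facing a 4-option question:
print(expected_score(0.25, abstain=True))   # 0.0: honesty earns nothing
print(expected_score(0.25, abstain=False))  # 0.25: guessing strictly dominates

# A hypothetical scheme that penalizes wrong answers changes the incentive:
def expected_score_penalized(p_correct: float, abstain: bool,
                             penalty: float = -0.5) -> float:
    return 0.0 if abstain else p_correct * 1.0 + (1 - p_correct) * penalty

print(expected_score_penalized(0.25, abstain=False))  # -0.125: abstaining now wins
```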
Use
We can also cause hallucinations by how we use a language model. We already covered some examples. First, if you ask a language model to perform a task that’s not well covered by its training data, it might hallucinate (this cause combines both the data and use aspect of a language model). Second, I mentioned the term “suggestibility”: certain prompts, like those containing leading questions, can lead to incorrect outputs.
A language model predicts how likely each token (or word) in its vocabulary is to appear after the text at the input. You might guess that during generation, we simply select the most probable word. But this is not the case. Many commercial AI tools, like ChatGPT, use random sampling — they select the next word randomly while respecting the word probabilities calculated by the language model. The authors of the paper LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations analyzed what happens when you use random sampling to generate multiple responses to the same prompt. One situation that can occur is that the language model is right most of the time, but it sometimes hallucinates. This suggests that the model knows the correct answer, but random sampling occasionally leads it to an incorrect one.
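Here is a small sketch of the sampling effect, using a made-up next-token distribution (the tokens and probabilities are illustrative, not taken from any real model):

```python
import random

random.seed(42)

# Hypothetical next-token distribution for the prompt
# "The capital of Australia is". The correct answer dominates,
# but the wrong answers still carry probability mass.
tokens = ["Canberra", "Sydney", "Melbourne"]
probs = [0.80, 0.15, 0.05]

greedy = tokens[probs.index(max(probs))]            # always picks "Canberra"
samples = random.choices(tokens, weights=probs, k=1000)

wrong = sum(t != "Canberra" for t in samples) / len(samples)
print(greedy, f"{wrong:.1%}")  # sampling is wrong roughly 20% of the time
```

Greedy decoding would answer correctly every time here, yet sampling trades some of that reliability for more varied, natural-sounding text.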
Language models can also hallucinate when the prompt you provide to the model is too long. They are biased towards focusing on the more recent parts of the previous text and might ignore what was said at the beginning. For example, during summarization, the model might ignore source documents in favor of staying consistent with the partial summary generated so far.
Are hallucinations inevitable?
Philosophical view
Let’s start answering this question by looking at a more philosophical paper by Richard Ackermann and Simeon Emanuilov.
According to Heidegger, the meaning or purpose of anything in the world stems from its relationships with other things. To understand the world as a whole, we need to understand the networks of such relationships. Language is an imperfect reflection of this network — it captures a lot, but lacks grounding in actual lived experience.
The authors believe that current language models simply cannot represent many important relationships. We are limited to relationships between words, but cannot model relationships of words with real-world phenomena and lived experience. This means that language models generate coherent text outputs that are not always factually grounded.
Practical view
Two papers whose titles start with "Hallucination is inevitable" (spoiler alert) offer a more practical take. The first paper starts its argument by distinguishing between a closed world and an open world. A closed world is static. It doesn't change over time. In a closed world, we always solve the same problems. An open world is the opposite: it changes all the time, and we get a constant stream of previously unseen problems. Any learner, be it a human or an AI model, sometimes struggles in such an environment.
We generally want an AI system to solve problems in an open world. This is the holy grail of generalization. However, our experience is that AI systems often struggle once the task becomes different from what the AI system has seen during training.
Language models we use these days should be better at generalizing to new tasks than the simpler machine learning models that came before them. Yet, they still suffer from a crucial flaw: they don't learn over time. Sure, ChatGPT can "memorize" some information from past conversations, but that's not proper learning. When invoking memories, ChatGPT just extends the prompt with a few short pieces of text it stores in a database. True learning requires the model to change its underlying parameters (weights). That happens only when a new model version is released, and even then, what the new version has learned is not based on how you used the old one.
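To make the distinction concrete, here is a rough sketch of how a memory feature might work under the hood. This is an assumption about the general pattern, not any vendor's actual implementation:

```python
# "Memories" are just short strings stored outside the model.
stored_memories = ["User prefers metric units.", "User is learning Rust."]

def build_prompt(user_message: str) -> str:
    # The memories are prepended to the prompt as plain text.
    # The model's billions of weights never change between releases;
    # all the apparent "learning" lives in this string.
    memory_block = "\n".join(f"- {m}" for m in stored_memories)
    return f"Known facts about the user:\n{memory_block}\n\nUser: {user_message}"

print(build_prompt("How hot is it in Lisbon today?"))
```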
Mathematical proof
The second paper on the inevitability of hallucinations proves mathematically that for each language model, there exists a task that the model cannot solve perfectly; it will hallucinate for certain inputs. This doesn’t mean that the model will hallucinate for all tasks. However, some tasks will make a language model sweat. The authors list some candidates, but they define the tasks mathematically, and I found it difficult to compare their definitions with something practical.
So there you have it. I think we have pretty strong evidence for the inevitability of hallucinations. There are tasks where a language model will hallucinate at least some percentage of the time, no matter what we do.
Does this mean we should stop using them? Of course not. But we need to be careful. It’s worth asking questions like: How (un)common is the task? How costly is a mistake? Can we tolerate some level of errors?
As users, we shouldn’t blindly trust the output generated by AI tools like ChatGPT. Recently I asked a chatbot: “What is ‘softmax bottleneck’ in the context of language models?” It gave me a confident, plausible output. But after reading some papers on the topic, I discovered that the chatbot was completely wrong.
As researchers and engineers, we have some tricks up our sleeves. We can try to detect hallucinations as they are happening and then react, for example, by delegating the given prompt to a human. We can also use some techniques to decrease the hallucination rate. These two avenues will be the topic of a follow-up article. Stay tuned.