Three Papers to Name the Monster. Zero to Name Ours.
NEJM AI published a perspective, a letter, and a response about what to call AI errors. Nobody mentioned that doctors kill 250,000 people a year with errors we still call “adverse events.”
Amos Grünebaum, MD | ObGyn Intelligence
A neurologist, a neuropsychologist, and a telestroke physician walk into NEJM AI. Their debate: when a large language model generates a wrong answer, should we call it a “hallucination” or a “confabulation”? Or should we abandon human metaphors entirely and invent a mechanism-based term?
Three papers. Seven pages. Multiple references to Goya, Hamlet, and the philosophy of consciousness.
Not a single mention of the fact that physician errors are the third leading cause of death in the United States.
The vocabulary we choose for mistakes tells us who we are willing to hold accountable.
The Debate
The exchange starts with a perspective by Wiest and Turnbull, a neurologist and neuropsychologist, published in NEJM AI in October 2025. Their argument is straightforward. A hallucination is a sensory perception without a real stimulus. It requires conscious experience. AI has no consciousness. Therefore, calling AI errors “hallucinations” borrows a medical term and applies it where it does not belong.
They propose “confabulation” instead. In neurology, confabulation means the active generation of false information without the intention to deceive. Patients with Korsakoff syndrome or certain dementias confabulate. They produce confident, fluent, completely wrong answers. Sound familiar? Wiest and Turnbull think so.
Then Ro, a telestroke physician at Sutter Health, responds. He agrees that “hallucination” is wrong. But he argues that “confabulation” is still too human. Confabulation in people is shaped by emotion and defense of the self. AI has neither. Ro wants mechanism-based terminology that describes what the technology actually does: probabilistic next-token prediction optimized for fluency, not accuracy. He does not, however, propose an actual term.
Wiest and Turnbull fire back. They offer three criteria for the ideal term: it should avoid implying consciousness, avoid implying emotion-based error, and be simple enough for wide adoption. They note that we anthropomorphize things all the time. Ships have names. We still know they are ships. They close with Hamlet: whether you see a camel, a weasel, or a whale in the cloud, you know it is a cloud.
What Nobody Said
Here is what is missing from all three papers: any awareness of the irony.
Medical errors contribute to an estimated 250,000 deaths per year in the United States, and some analyses put the number higher. That estimate would make medical error the third leading cause of death, behind only heart disease and cancer. Doctors misremember findings. They anchor on the wrong diagnosis. They produce confident, fluent, completely wrong assessments. They do this every day, in every hospital, in every specialty.
Nobody has proposed calling those errors “hallucinations.” Nobody has published three papers in NEJM AI debating whether to call them “confabulations.”
Instead, the profession reaches for language designed to distribute responsibility as widely as possible. Physician mistakes become “adverse events.” Wrong diagnoses become “diagnostic errors” or “cognitive failures.” Deaths caused by doctors become “preventable mortality.” The vocabulary shifts from pathology to engineering, from individual to system, from blame to process.
This is not an accident. It is a framing choice.
The Double Standard
Daniel Kahneman spent a career showing how the framing of a question changes the answer. The framing of a mistake changes who gets blamed.
When AI produces a wrong answer, we borrow terms from psychiatry and neurology. We call it a hallucination, a confabulation, a disorder of an artificial mind. The language implies something broken, something pathological. It invites suspicion. It tells the user: do not trust this thing.
When a physician produces a wrong answer, we borrow terms from quality improvement and systems engineering. We call it a process failure, a communication breakdown, a systems issue. The language distributes blame across an institution. It tells the patient: this was unfortunate, but no one person is responsible.
Both framings serve someone’s interest. Neither serves the patient.
I am not saying Wiest, Turnbull, and Ro are wrong to care about precision in language.
They are right.
Words shape how clinicians think about technology, and the wrong word can lead a doctor to overtrust or undertrust an AI tool at the bedside. Getting the terminology right matters.
But the same profession that demands precise terminology for AI errors has spent decades softening the language around its own.
If we are going to hold AI to the standard of naming its failures accurately, we should hold ourselves to the same standard.
What Patients Should Know
AI makes mistakes. Doctors make mistakes. The difference is not the frequency or the severity.
The difference is the vocabulary.
When someone tells you that AI “hallucinates,” they are telling you to be skeptical. Good. Be skeptical. Ask for sources. Verify the output.
Apply the same skepticism to your doctor. Ask what the evidence shows. Ask for the numbers. Ask whether the recommendation is based on a guideline, a study, or a habit. Doctors do not hallucinate or confabulate. But they do make errors, and the errors are no less dangerous because we have given them gentler names.
Bottom Line
Three papers in NEJM AI to name the monster that lives inside a large language model.
Zero papers, in any journal, proposing that we rename the monster that lives inside a medical system that kills a quarter of a million people a year.
The vocabulary we choose for mistakes tells us who we protect.
It is time to protect the patient. Period.
References
1. Wiest G, Turnbull OH. Faulty artificial intelligence, or the sleep of reason. NEJM AI 2025;2(11). DOI: 10.1056/AIp2500785.
2. Ro DI. From psychological metaphors to mechanistic framing in describing errors in large language models. NEJM AI 2026;3(3). DOI: 10.1056/AIp2501328.
3. Wiest G, Turnbull OH. Response: Metaphors and errors in describing large language models. NEJM AI 2026;3(3). DOI: 10.1056/AIp2501416.
4. Makary MA, Daniel M. Medical error: the third leading cause of death in the US. BMJ 2016;353:i2139. DOI: 10.1136/bmj.i2139.


