“[L]anguage models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty”
It has both a potentially surprising conclusion and a recommendation for solutions that definitely surprised me, since I made a similar point a couple of months ago!
The tl;dr of the conclusion is that language models hallucinate because of the way they are trained and evaluated. No, not necessarily because of next-token prediction, but because of the incentives to provide an answer: language models are “optimized to be good test-takers,” and on those tests saying “I don’t know” is not rewarded.
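To make that incentive concrete, here is a minimal sketch (my own illustrative numbers, not the paper’s) of the expected score under standard binary grading, where a wrong answer and “I don’t know” both score zero:

```python
# Illustrative numbers only. Under 0/1 grading, a wrong answer and an
# abstention ("I don't know") both score 0, so any nonzero chance of being
# right makes guessing the better strategy.

def expected_score_binary(p_correct: float, abstain: bool) -> float:
    """Expected score under binary grading: 1 if correct, 0 otherwise."""
    return 0.0 if abstain else p_correct

for p in (0.1, 0.3, 0.5):
    guess = expected_score_binary(p, abstain=False)
    idk = expected_score_binary(p, abstain=True)
    print(f"confidence={p:.1f}  guess={guess:.2f}  abstain={idk:.2f}")
# Guessing always has expected score p > 0; abstaining always scores 0.
```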
The authors demonstrate that, largely independent of architecture, models are still incentivized to guess incorrectly even when only correct factual information is present in the training data.
The proposed solution is (naturally) socio-technical: change how models are scored so that guessing is disincentivized relative to expressing uncertainty.
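And here is a sketch of how a scoring rule along those lines flips the incentive (again, an illustrative penalty I picked, not the paper’s exact scheme):

```python
# Illustrative sketch: full credit for a correct answer, a penalty for a
# wrong one, and 0 for abstaining. With a penalty, guessing only pays off
# above a confidence threshold instead of always.

def expected_score_penalized(p_correct: float, abstain: bool,
                             wrong_penalty: float = 2.0) -> float:
    """Expected score: +1 if correct, -wrong_penalty if wrong, 0 if abstaining."""
    if abstain:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Guessing beats abstaining only when p > penalty / (1 + penalty),
# i.e. p > 2/3 for a penalty of 2.
for p in (0.3, 0.6, 0.7, 0.9):
    guess = expected_score_penalized(p, abstain=False)
    better = "guess" if guess > 0 else "abstain"
    print(f"confidence={p:.1f}  guess={guess:+.2f}  abstain=+0.00  -> {better}")
```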
See: “Solving LLM Hallucinations is (mostly) a UX Problem”