“[L]anguage models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty”
The paper that quote comes from has both a potentially surprising conclusion and a recommendation for solutions that definitely surprised me: surprising because I made a similar point a couple of months ago!
The tl;dr of the conclusion is that language models hallucinate because of the way they are trained and evaluated. No, not necessarily because of next-token prediction, but because of the incentives to provide an answer. Language models are “optimized to be good test-takers,” and saying “I don’t know” is not rewarded.
The authors demonstrate that, largely independent of architecture, models will still be incentivized to guess incorrectly even when only correct factual information is present in the training data.
The proposed solution is (naturally) socio-technical: change how models are scored so that guessing is penalized rather than rewarded relative to expressing uncertainty.
See: “Solving LLM Hallucinations is (mostly) a UX Problem”
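To make that incentive concrete, here’s a minimal sketch (my own toy numbers and function, not anything from the paper): under accuracy-only grading, any guess with a nonzero chance of being right has a higher expected score than saying “I don’t know,” while a grading scheme that penalizes wrong answers makes abstaining the better strategy below a confidence threshold.

```python
# Toy illustration: expected score of guessing vs. abstaining.
# A correct answer earns 1 point, abstaining ("I don't know") earns 0,
# and a wrong answer costs `wrong_penalty` points.

def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering when the answer is correct with probability p_correct."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

p = 0.3  # the model is only 30% confident in its answer

# Accuracy-only grading (wrong answers cost nothing): guessing has a positive
# expected score, abstaining scores 0, so guessing always wins.
print(expected_score(p, wrong_penalty=0.0))  # 0.3 -> guess

# Grading that penalizes confident errors: abstaining now beats guessing
# whenever p_correct is below the break-even point penalty / (1 + penalty).
print(expected_score(p, wrong_penalty=1.0))  # -0.4 -> say "I don't know"
```

Under the first scheme the optimal test-taker never abstains; under the second, abstention becomes the rational choice whenever confidence is low enough, which is the behavior the scoring change is meant to reward.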