“[L]anguage models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty”
It has both a potentially surprising conclusion and a recommendation for solutions that definitely surprised me, since I made a similar point a couple of months ago!
The tl;dr of the conclusion is that language models hallucinate because of the way they are trained and evaluated. No, not necessarily because of next-token prediction, but because of the incentives to provide an answer: language models are “optimized to be good test-takers,” and on those tests saying “I don’t know” is not rewarded.
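To make that incentive concrete, here is a minimal sketch (my own illustrative numbers, not the paper’s) of the expected score under standard binary grading, where a wrong answer and “I don’t know” both score zero:

```python
# Illustrative numbers only. Under 0/1 grading, a wrong answer and an
# abstention ("I don't know") both score 0, so any nonzero chance of being
# right makes guessing the better strategy.

def expected_score_binary(p_correct: float, abstain: bool) -> float:
    """Expected score under binary grading: 1 if correct, 0 otherwise."""
    return 0.0 if abstain else p_correct

for p in (0.1, 0.3, 0.5):
    guess = expected_score_binary(p, abstain=False)
    idk = expected_score_binary(p, abstain=True)
    print(f"confidence={p:.1f}  guess={guess:.2f}  abstain={idk:.2f}")
# Guessing always has expected score p > 0; abstaining always scores 0.
```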
The authors demonstrate that, largely independent of architecture, models are still incentivized to guess incorrectly even when only correct factual information is present in the training data.
The proposed solution is (naturally) socio-technical: change how models are scored so that guessing is disincentivized relative to expressing uncertainty.
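And here is a sketch of how a scoring rule along those lines flips the incentive (again, an illustrative penalty I picked, not the paper’s exact scheme):

```python
# Illustrative sketch: full credit for a correct answer, a penalty for a
# wrong one, and 0 for abstaining. With a penalty, guessing only pays off
# above a confidence threshold instead of always.

def expected_score_penalized(p_correct: float, abstain: bool,
                             wrong_penalty: float = 2.0) -> float:
    """Expected score: +1 if correct, -wrong_penalty if wrong, 0 if abstaining."""
    if abstain:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Guessing beats abstaining only when p > penalty / (1 + penalty),
# i.e. p > 2/3 for a penalty of 2.
for p in (0.3, 0.6, 0.7, 0.9):
    guess = expected_score_penalized(p, abstain=False)
    better = "guess" if guess > 0 else "abstain"
    print(f"confidence={p:.1f}  guess={guess:+.2f}  abstain=+0.00  -> {better}")
```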
See: “Solving LLM Hallucinations is (mostly) a UX Problem”