On Early Detection of Hallucinations in Factual Question Answering

arxiv.org

2 points by kig 2 years ago · 2 comments

kigOP 2 years ago

The researchers found that certain artifacts associated with LLM generations can indicate whether a model is hallucinating. Their results showed that the distributions of these artifacts differ between hallucinated and non-hallucinated generations. Using these artifacts, they trained binary classifiers to separate hallucinated generations from non-hallucinated ones. They also found that the tokens preceding a hallucination can predict it before it occurs.
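
A minimal sketch of that classifier step, assuming per-generation artifact features such as mean softmax entropy and a hidden-state summary statistic; the feature names and the synthetic labels here are placeholders, not the paper's actual features or data:

    # Sketch: binary classifier over placeholder generation artifacts.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Placeholder artifact features per generation (assumed, not from the paper):
    # e.g. mean softmax entropy, max token probability, hidden-state norm.
    n = 1000
    X = rng.normal(size=(n, 3))
    # Synthetic labels correlated with the first feature; 1 = hallucination.
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Fit a logistic regression and report how well the artifacts separate
    # hallucinated from non-hallucinated generations.
    clf = LogisticRegression().fit(X_train, y_train)
    print("AUROC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))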

  • jruohonen 2 years ago

    I didn't read the paper, but it seems they're trying to fix an ML model with another ML model. I'm not sure that's a good idea, but I digress. Besides, how do they know what is a hallucination and what is a non-hallucination (cf. the similar debate on disinformation)?
