How does one detect hallucinations?

5 points by sourabh03agr 2 years ago · 2 comments · 2 min read

Hallucinations are an interesting artifact of LLMs: the model makes up facts or generates outputs that are not factually correct.

There are two broad approaches for detecting hallucinations:

1. Verify the correctness of the response against world knowledge (via Google/Bing search)

2. Verify the groundedness of the response against the information present in the retrieved context

The second approach is more interesting and useful, as the majority of LLM applications have a RAG component and we ideally want the LLM to use only the retrieved knowledge to generate the response.
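To make the second approach concrete, here is a minimal sketch of a groundedness check: a single LLM-as-judge call that asks whether the response is supported by the retrieved context. The prompt wording, the model name and the use of the OpenAI Python client are illustrative assumptions on my part, not something prescribed by the papers discussed below.

    # Minimal groundedness check: ask an LLM judge whether the response is
    # supported by the retrieved context. Prompt and model are illustrative.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def is_grounded(response: str, context: str) -> bool:
        prompt = (
            "You are verifying whether a response is grounded in the given context.\n"
            f"Context:\n{context}\n\n"
            f"Response:\n{response}\n\n"
            "Answer with a single word: YES if every claim in the response is "
            "supported by the context, NO otherwise."
        )
        judgement = client.chat.completions.create(
            model="gpt-4o-mini",  # any capable chat model works as the judge
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return judgement.choices[0].message.content.strip().upper().startswith("YES")

A single yes/no verdict like this is the bluntest version of the idea; the papers below refine it in two different directions.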

While researching state-of-the-art techniques for verifying that a response is grounded with respect to the context, two papers stood out to us:

1. FactScore (https://arxiv.org/pdf/2305.14251.pdf): Developed by researchers at UW, UMass Amherst, Allen AI and Meta, it first breaks the response down into a series of independent atomic facts and then verifies each of them individually (a rough sketch of the decomposition step follows after this list).

2. Automatic Evaluation of Attribution by LLMs (https://arxiv.org/pdf/2305.06311.pdf): Developed by researchers at Ohio State University, it prompts the LLM judge to determine whether the response is attributable (can be verified), extrapolatory (unclear) or contradictory (can’t be verified).
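As a rough illustration of the decomposition step in FactScore, the sketch below asks an LLM to split a response into atomic facts. The prompt wording and the helper name are assumptions for illustration, not the paper's exact pipeline, and it reuses the client from the earlier sketch.

    # Rough sketch of a FactScore-style decomposition step: split a response
    # into atomic facts via an LLM. Prompt wording is an assumption, not the
    # paper's exact prompt. Reuses `client` from the earlier sketch.
    def split_into_facts(response: str) -> list[str]:
        prompt = (
            "Break the following response into a list of independent, atomic "
            "facts, one per line. Do not add or remove information.\n\n"
            f"Response:\n{response}"
        )
        out = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return [line.strip("- ").strip()
                for line in out.choices[0].message.content.splitlines()
                if line.strip()]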

While both papers are awesome reads, you can observe that they tackle complementary problems and hence can be combined for superior performance:

1. Responses in production systems typically consist of multiple assertions; hence, breaking them into facts, evaluating each individually, and taking the average is a more practical approach.

2. Many responses in production systems fall in a grey area, i.e. the context may not explicitly support (or disprove) them, but one can make a reasonable argument to infer them from it. Hence, having three options - Yes, No, Unclear - is the more practical approach (a combined sketch follows below).
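Putting the two together, a hedged sketch of the combined check could look like the following: split the response into facts, ask the judge for a three-way verdict per fact, and average the verdicts into a score. The verdict labels, the 0.5 weight for unclear cases and the prompts are my own illustrative choices, not a recipe prescribed by either paper; it reuses the client and split_into_facts from the sketches above.

    # Combined check: per-fact three-way judgement, averaged into one score.
    # Verdict labels and the 0.5 weight for UNCLEAR are illustrative choices.
    # Reuses `client` and `split_into_facts` from the sketches above.
    def factual_accuracy(response: str, context: str) -> float:
        facts = split_into_facts(response)
        if not facts:
            return 1.0  # nothing to verify
        scores = []
        for fact in facts:
            prompt = (
                f"Context:\n{context}\n\n"
                f"Claim: {fact}\n\n"
                "Is the claim supported by the context? Answer with exactly one "
                "of: YES (supported), UNCLEAR (cannot be verified either way), "
                "NO (contradicted)."
            )
            verdict = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            ).choices[0].message.content.strip().upper()
            scores.append(1.0 if verdict.startswith("YES")
                          else 0.5 if verdict.startswith("UNCLEAR")
                          else 0.0)
        return sum(scores) / len(scores)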

This is exactly what we do at UpTrain to evaluate factual accuracy. Learn more about it: https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy
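For completeness, here is a minimal usage sketch along the lines of the UpTrain docs linked above; the class and check names (EvalLLM, Evals.FACTUAL_ACCURACY) follow my reading of those docs and may differ between library versions, so treat this as a pointer rather than a guaranteed interface.

    # Minimal UpTrain usage sketch; names follow the linked docs but may
    # differ between versions of the library.
    from uptrain import EvalLLM, Evals

    data = [{
        "question": "When did the first moon landing happen?",
        "context": "Apollo 11 landed on the Moon on July 20, 1969.",
        "response": "The first moon landing took place on July 20, 1969.",
    }]

    eval_llm = EvalLLM(openai_api_key="sk-...")  # judge model configuration
    results = eval_llm.evaluate(data=data, checks=[Evals.FACTUAL_ACCURACY])
    print(results)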

navjack27 2 years ago

A fun uphill battle while LLMs aren't trained on entirely factual data from the beginning, and I mean the very start. They are not fact regurgitating programs. People will make a lot of money saying they have this solved but all they have are bandaids. I personally don't want a factual LLM I want one that helps me with sparking my own creativity. They do that right now. Hallucinations are a feature not a bug.

  • sourabh03agrOP 2 years ago

    That's fair, but a lot of use cases require strict information retrieval and don't want the LLM to get creative. I am of the opinion that having an LLM which is always factually correct is an almost impossible task, and we would always need monitoring to catch and fix such cases.
