Apple puts "do not hallucinate" into prompts, and it works (twitter.com)
I'd like a rational explanation of how the LLM interprets "don't hallucinate". Is it perhaps "translated" internally into the functional equivalent of a higher confidence check on the output?
Otherwise, I think it's baloney. I know there isn't a simple linear mapping from plain English to the ML, but the typed words are clearly capable of being parsed and processed; it's the "somehow" I'd like to understand better. What would this do to the interpretation of paths through the weights?
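To make that concrete: the instruction is just more tokens of context, so one way to probe its effect is to compare the model's next-token distribution with and without it. A minimal sketch using Hugging Face transformers (the model, question, and exact wording here are illustrative assumptions, not Apple's actual prompt or model):

```python
# Sketch: "do not hallucinate" is just extra conditioning text; it shifts the
# next-token distribution rather than triggering any special confidence check.
# Model, question, and prompt wording are illustrative, not Apple's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works for this comparison
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

question = "Q: Who wrote the 1998 novel 'The Glass Harbor'?\nA:"  # made-up title
plain = question
guarded = "Do not hallucinate. If you are unsure, say so.\n" + question

def next_token_probs(prompt, k=5):
    """Return the top-k next-token candidates and their probabilities."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # logits for the next token only
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tok.decode(i), p.item()) for i, p in zip(top.indices, top.values)]

print("plain:  ", next_token_probs(plain))
print("guarded:", next_token_probs(guarded))
```

If the instruction does anything, it shows up only as a shift in those probabilities; there is no separate "confidence check" path in the forward pass.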
Pretty much 'citation needed'
Everything about prompt engineering is just the voodoo chicken.
What's interesting about the anecdotes at that link, though, is that once the confusion settles, there is an explanation. The chicken may have made no sense, but the problem did get solved, and the chicken was necessary, just not for the reasons you thought.
Maybe prompt engineering will make sense someday, or maybe we need artificial general psychology first?
Interestingly, negative prompts for Stable Diffusion (like "deformed hands") have a similar effect. How does the LLM decide what counts as a hallucination? Mayhaps it double-checks itself? But probably it became self-aware.
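The Stable Diffusion case at least has a concrete mechanism: the negative prompt stands in for the unconditional embedding in classifier-free guidance, so each denoising step is steered away from it. A rough sketch with the diffusers library (the model id, prompts, and guidance scale are just example values):

```python
# Sketch: in classifier-free guidance the predicted noise is combined as
#   eps_guided = eps_neg + guidance_scale * (eps_pos - eps_neg)
# and the negative prompt provides eps_neg, so sampling moves away from it.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a portrait photo of a violinist",
    negative_prompt="deformed hands, extra fingers",
    guidance_scale=7.5,
).images[0]
image.save("portrait.png")
```

No self-awareness required: "deformed hands" only works because the sampler is explicitly pushed away from whatever that text conditions on.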
X doubt