The generation vs. verification delta explains why LLMs are useful


Simian Words

Ever heard the argument that because you still need to verify what an LLM says, LLMs are as good as useless? I always felt it was a lazy argument. I gave it some thought and came up with an explanation that goes beyond just LLMs.


I was recently looking for an English word: I had it on the tip of my tongue but could not find it. So I asked ChatGPT to help me.

This was my question:

[Screenshot: the question I asked ChatGPT]

The word I was looking for was "confers". I don't need to explain why I didn't have to verify what the LLM provided; that would be a pointless exercise. It should be clear to anyone that the LLM genuinely helped me and that there is close to zero chance of its answer being wrong. The moment I saw "confers", I just knew it was the word I was looking for.

I think the same process extends to anything an LLM helps me with: it generates and I verify. The effort needed for me to verify an answer is much lower than the effort needed for the LLM to generate it.
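To make this asymmetry concrete outside of LLMs, here is a toy sketch using subset sum, a classic example I chose for illustration rather than anything from the original argument. Finding a subset of numbers that adds up to a target can mean trying a large number of combinations, while checking a proposed subset is just a quick membership-and-sum check. The functions `generate` and `verify` are hypothetical names.

```python
from itertools import combinations


def generate(nums, target):
    """Brute-force search over subsets, smallest first.

    Returns (subset, attempts), or (None, attempts) if no subset matches.
    In the worst case this tries all 2**len(nums) subsets.
    """
    attempts = 0
    for size in range(len(nums) + 1):
        for combo in combinations(nums, size):
            attempts += 1
            if sum(combo) == target:
                return list(combo), attempts
    return None, attempts


def verify(nums, target, candidate):
    """Check a proposed subset: every element must come from nums
    (respecting how often it appears) and the elements must sum to target."""
    remaining = list(nums)
    for x in candidate:
        if x not in remaining:
            return False
        remaining.remove(x)
    return sum(candidate) == target


if __name__ == "__main__":
    nums = [3, 34, 4, 12, 5, 2]
    answer, tries = generate(nums, 9)
    print(answer, tries)              # [4, 5] 18
    print(verify(nums, 9, answer))    # True
    print(verify(nums, 9, [3, 5]))    # False
```

Even in this tiny case the search tries 18 subsets before it finds one, while the check only looks at the two numbers it was handed; the gap grows exponentially as the list gets longer.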

Another example to drive the point home: if I want a nice-looking logo for my brand, I can ask someone to make it and show me only the end product. I don't need to know their process or the options they tried and discarded. I only need to see the final product to make a judgement.

It's like finding the single missing piece of a puzzle: when it fits, I know for sure it is the one. I don't then need to take all the other pieces and verify by elimination.

My main point:

As long as LLMs are even slightly accurate at generating, and we personally have some intuition for whether their answers are correct, LLMs will be directionally useful.

AI can make mistakes, but if it is directionally accurate often enough, it is useful and will make you more productive. The question is whether it has crossed the threshold of being directionally accurate. I would say it has, for most domains.
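One way to put rough numbers on that threshold is a back-of-the-envelope model, which is my own illustration rather than anything measured. Assume each answer is independently correct with probability `p`, and assume verifying an answer reliably tells you whether it is right. Then the chance that at least one of `k` attempts is correct is 1 - (1 - p)^k, and the expected number of attempts until the first correct answer is 1/p. The function names below are hypothetical.

```python
import math


def success_within(p, k):
    """Probability that at least one of k independent attempts is correct,
    assuming each is correct with probability p and the verifier never
    misjudges an answer."""
    return 1 - (1 - p) ** k


def expected_attempts(p):
    """Mean number of attempts until the first correct answer
    (geometric distribution), under the same assumptions."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 / p


def attempts_needed(p, target):
    """Smallest k such that success_within(p, k) >= target."""
    if not 0 < p <= 1 or not 0 < target < 1:
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    if p == 1:
        return 1
    return math.ceil(math.log(1 - target) / math.log(1 - p))


if __name__ == "__main__":
    p = 0.3  # a generator that is right only 30% of the time
    print(round(success_within(p, 5), 3))   # 0.832
    print(round(expected_attempts(p), 2))   # 3.33
    print(attempts_needed(p, 0.95))         # 9
```

Under these assumptions, even a generator that is right only 30% of the time gets you a verified answer in about three tries on average. The model breaks down when verification itself is unreliable, which is where the examples below come in.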

Some examples of this mechanic working:

  1. Low generation accuracy, high verification accuracy: AlphaEvolve, https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms
  2. High generation accuracy, low verification accuracy: data engineering
  3. Low generation accuracy, low verification accuracy: I'm not sure; maybe relationship advice?
  4. High generation accuracy, high verification accuracy: searching for synonyms of a word