Ask HN: Why are LLMs made intentionally non-deterministic?
Just found out that the main factor that introduces non-determinism into LLMs' answers is the so-called temperature, which adds variation ("creativity") to selecting the winning token each time.
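Here's my understanding of the mechanism, as a minimal Python/NumPy sketch (the function name is mine, not from any real inference engine):

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=None):
        # Hypothetical helper illustrating temperature sampling.
        rng = rng or np.random.default_rng()
        if temperature == 0:
            # Greedy decoding: always take the highest-scoring token,
            # removing the sampling randomness entirely.
            return int(np.argmax(logits))
        # T < 1 sharpens the distribution, T > 1 flattens it ("creativity").
        scaled = np.asarray(logits, dtype=np.float64) / temperature
        scaled -= scaled.max()  # subtract max for numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return int(rng.choice(len(probs), p=probs))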
This might be good for creative tasks, but it can also become an obstacle to adopting LLMs for certain other tasks. Among other things, it makes LLM-based systems untestable.
Program code is deterministic for a reason (if you forget about physical defects and cosmic particles occasionally hitting microchips and flipping some bits). If your code is supposed to control a nuclear power plant, you'd better have a testable system with proven correctness.
So why aren't the users of LLMs given some sort of fader to control temperature, with an option for 100% determinism? Who decided that we shouldn't have control over this parameter, and why?

In fact, non-determinism goes all the way down to the training level, where every token and weight starts as a random number, a concept I wrote about in an intro article on how AI works [0]. The reason for this is that randomness is crucial to getting emergent data out of a system: unexpected, unpredictable, but often useful results. This is how an LLM can answer a question it's never been asked before. We have had deterministic databases forever, so there would be no AI advance if LLMs were just databases. The AI models of the 1960s tried that very approach, called rule-based AI, and it doesn't work: we can never come up with or write down all the rules. The failure of those methods led to the "AI winter" and no further real progress in AI until the invention of transformers at Google.

[0] https://levelup.gitconnected.com/something-from-nothing-d755...

Unfortunately, disabling temperature / switching to greedy sampling doesn't necessarily make most LLM inference engines _fully_ deterministic, since parallelism and batching can cause floating-point error to accumulate differently from run to run (see the small demo at the end of this thread). It's possible to make them deterministic, but it comes with a perf hit.

Some providers _do_ let you set the temperature, including to "zero", but most will not take the perf hit to offer true determinism.

Among other reasons: if you turn temperature down to 0, LLMs stop working. They don't give natural-language answers to natural-language questions any more; they just halt immediately. Temperature gives the model wiggle room to emit something plausible-sounding rather than clam up when presented with an input that wasn't verbatim in the training data (such as the system prompt).

Yes, but that doesn't explain why we aren't given a choice. Program code is boringly deterministic, but in many cases that's exactly what you need, while non-determinism becomes your dangerous enemy (as in the case of some Airbus jets being susceptible to bit flips from cosmic rays).

The current way to address this is through RAG applications, or Retrieval-Augmented Generation. This means using the LLM for the natural-language, non-deterministic portion and using traditional code, databases, and files for the deterministic part. A good example is bank software where you can ask what your balance is and get back the real number. A RAG app won't "make up" your balance or even consult its training data to find it. Instead, the traditional, deterministic code runs separately from the LLM calls.
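Roughly, the split looks like this (a toy Python sketch; get_balance, llm, and the account table are made-up stand-ins, not any real banking or model API):

    ACCOUNTS = {"alice": 1042.17}  # stand-in for the bank's real database

    def get_balance(user: str) -> float:
        # Deterministic part: plain code and data, no model involved.
        return ACCOUNTS[user]

    def llm(prompt: str) -> str:
        # Stub for a real chat-completion call; here it just echoes.
        return f"[model reply based on: {prompt!r}]"

    def answer_balance_question(user: str, question: str) -> str:
        balance = get_balance(user)  # the real number comes from code...
        # ...and the model is only asked to phrase it, never to recall it.
        prompt = (
            f"The user's verified balance is {balance:.2f}. "
            f"Answer their question using exactly that figure: {question}"
        )
        return llm(prompt)

    print(answer_balance_question("alice", "How much money do I have?"))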
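And the floating-point demo promised upthread: float addition isn't associative, so when parallelism or batching changes the order in which the same partial sums are reduced, logits can differ in their last bits from run to run, which can flip the argmax even under greedy sampling. A tiny Python illustration:

    a, b, c = 0.1, 0.2, 0.3

    # The same three numbers, summed in two different orders:
    left  = (a + b) + c   # 0.6000000000000001
    right = a + (b + c)   # 0.6

    print(left == right)  # False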