Richard Sutton – Father of Reinforcement Learning – thinks LLMs are a dead end
dwarkesh.com

Sutton's alternative to LLMs is RL, obviously, I mean duh. He says an alternative theory for the foundation of intelligence is "sensation, action, reward": animals do this throughout their lives, and intelligence is about figuring out what actions to take to increase the rewards.
Well, I have a problem with that, with all due respect to Richard Sutton, who is one of the AI gods. I don't think his Skinnerian behaviourist paradigm is realistic; I don't think "sensation, action, reward" works in physical reality, in the real world, because in the real world there are situations where pursuing your goals does not increase your reward.
Here's an example of what I mean. Imagine the "reward" that an animal will get from not falling down a cliff and dying. If the animal falls down the cliff and dies, reward is probably negative (maybe even infinitely negative: it's game over, man). But if the animal doesn't fall down the cliff and die, what is the reward?
There's no reward. If there was any reward for not falling down a cliff and dying, then all animals would ever do would be to sit around not falling down cliffs and dying, and just increasing their reward for free. That wouldn't lead to the development of intelligence very fast.
You can try to argue that an animal will obtain a positive reward from just not dying, but that doesn't work: for RL to reinforce some behaviour P, it is P that has to be rewarded, not merely staying alive in general. Deep RL systems don't learn to play chess by refusing to play.
For RL to work, agents must constantly maximise their reward, not just increase it or just keep it from going infinitely negative. And you just cannot do that in the physical world, because there are situations where doing the wrong thing kills you and doing the right thing does not increase your reward.
Digital RL agents can avoid this kind of zero-gains scenario because they can afford to act randomly until they hit a reward, so e.g. an RL chess player can afford to play at random until it figures out how to play. But that doesn't work in the real world, where acting at random has a very high chance of killing an animal. Imagine an animal that randomly jumps off cliffs: game over, man. In the real world if you chase reward without already knowing where it comes from, you better have a very large number of lives [1].
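The point about random exploration can be made concrete with a toy simulation (my own sketch, not anything from Sutton): an agent on a line with a cliff on one side and the only reward on the other, exploring purely at random, the way a digital RL agent can afford to before it knows where reward comes from.

```python
import random

# Hypothetical toy world: the agent starts at position 0, a cliff sits
# at -1 (death, episode over), and the only reward sits at +5.
def random_explorer(cliff=-1, reward_at=5, start=0, rng=random):
    pos = start
    while cliff < pos < reward_at:
        pos += rng.choice((-1, 1))  # pure random exploration, no policy
    return pos == reward_at  # True: found the reward; False: fell off the cliff

random.seed(0)
episodes = 10_000
survived = sum(random_explorer() for _ in range(episodes))
print(f"found reward in {survived / episodes:.0%} of episodes")
```

By gambler's-ruin arithmetic the agent reaches the reward in only about 1/6 of episodes; the other ~5/6 of the time it walks off the cliff before ever seeing a reward. A digital agent just respawns and keeps exploring; an animal gets one episode.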
So reward is not all you need. There may be cases where animals use a reward system to guide their behaviours, just as there are cases where humans learn by imitation, but in the general case they don't. It doesn't work. RL doesn't work in the real world, and it's not how animals developed intelligence.
__________________
[1] Support for the theory that all animals are descended from cats?