A Discord Community Beat Meta’s LLaMA. The Secret? An Architecture From 1960


RWKV-7 scores 72.8% vs LLaMA’s 69.7% with 3x fewer tokens. It runs in constant memory. Microsoft shipped it to 1.5B machines.

Delanoe Pirard


RWKV-7 “Goose” — the RNN architecture that outlasted Transformers, seven years after everyone declared them dead.

In 1932, a book published by Cambridge University Press described what happens when you ask humans to recall stories they read weeks or months earlier. The details they remembered were systematically distorted. Not randomly, but coherently. They had compressed the past into a working summary, and inference did the rest. The book’s author, Frederic Bartlett, called this reconstructive memory: the brain doesn’t store a transcript of experience. It maintains a compressed state and rebuilds when asked.

This is, in essence, what a recurrent neural network does. It reads a sequence token by token, maintains a fixed-size hidden state (a compressed representation of everything it has seen) and updates it at each step. The state is small. The inference cost is constant. The memory doesn’t grow.
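That loop can be sketched in a few lines. This is a minimal, illustrative recurrence, not RWKV's actual update rule; the weight matrices and dimensions here are made-up placeholders. The point is structural: however long the sequence, the state stays the same size.

```python
import numpy as np

def rnn_step(state, token_vec, W_state, W_in):
    # One recurrent update: fold the new token into the fixed-size state.
    return np.tanh(W_state @ state + W_in @ token_vec)

d_state, d_token = 8, 4  # illustrative sizes
rng = np.random.default_rng(0)
W_state = rng.normal(scale=0.1, size=(d_state, d_state))
W_in = rng.normal(scale=0.1, size=(d_state, d_token))

state = np.zeros(d_state)
for token_vec in rng.normal(size=(1000, d_token)):  # a 1,000-token sequence
    state = rnn_step(state, token_vec, W_state, W_in)

# After 1,000 tokens the memory footprint is unchanged: one (8,) vector.
print(state.shape)
```

Each step costs the same amount of compute and memory as the last; nothing accumulates.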

When the Transformer arrived in 2017 with “Attention Is All You Need”, it didn’t replace the RNN by being smarter about memory. It discarded memory entirely. Instead of compressing history, the Transformer stores all of it: every key, every value, for every token, in what is…