Atlas: Learning to Optimally Memorize the Context at Test Time (arxiv.org)

41 points by og_kalu 2 days ago


flysand7 - 5 hours ago

Damn, thought this was about school tests, which is something I did a lot: try to memorize everything just before the test. Guess I got baited into clicking an LLM article again

cgearhart - 2 days ago

It seems like there’s been a lot of progress here, but there’s also an elephant in the room: RNNs will _always_ have worse memory than self-attention, since the latter keeps complete access to the full context. We pay for that access in other ways, but the unstated hypothesis of RNNs seems to be that in the long run their memory will be “good enough” and their other performance benefits will prevail. I’m not convinced that humanity will ever sink resources into optimizing this family of models comparable to what has gone into making Transformers practical at the scale they run today.
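To make the trade-off concrete, here's a toy numpy sketch of the two memory models (illustrative shapes and update rules only, nothing from the paper):

    import numpy as np

    d = 64  # toy hidden size

    # Recurrent memory: the entire past is squeezed into one fixed-size
    # state, so memory cost stays O(d) no matter how long the context
    # grows -- old information must be compressed or overwritten, i.e.
    # recall is lossy.
    def rnn_step(state, x, W_h, W_x):
        return np.tanh(state @ W_h + x @ W_x)

    # Self-attention: every past token's (key, value) pair is kept, so
    # each new query can do an exact lookup over the full context --
    # lossless recall, paid for with O(n) memory and O(n) work per token.
    def attention_step(kv_cache, q, k, v):
        kv_cache.append((k, v))
        K = np.stack([k_ for k_, _ in kv_cache])
        V = np.stack([v_ for _, v_ in kv_cache])
        scores = K @ q / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V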

arjvik - 2 days ago

The Titans papers and the Test-Time Training papers (https://arxiv.org/abs/2407.04620) both have the same premise: models should "learn" from their context rather than memorize it. Very promising direction!
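If I understand the premise correctly, the "memory" is itself a small model updated by gradient descent while the context streams in. A toy version (my simplification with a linear memory; the actual papers use richer modules and update rules):

    import numpy as np

    d, lr = 32, 0.01
    rng = np.random.default_rng(0)
    M = rng.normal(scale=0.1, size=(d, d))  # linear "memory" for simplicity

    def write(M, k, v):
        # One SGD step on the association loss ||M @ k - v||^2: the memory
        # learns the key->value mapping instead of storing the pair verbatim.
        return M - lr * np.outer(M @ k - v, k)

    def read(M, q):
        return M @ q  # retrieval is just a forward pass

    k, v = rng.normal(size=d), rng.normal(size=d)
    for _ in range(100):  # repeated exposure strengthens the association
        M = write(M, k, v)
    print(np.allclose(read(M, k), v, atol=1e-2))  # True: v recalled from k

Atlas, as I read the title/abstract, is about making that memorization step optimal with respect to a window of context rather than purely one token at a time.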
