Reduce KV cache memory by 39x on average

twitter.com

2 points by jonathanehrlich 7 months ago · 1 comment

jonathanehrlich (OP) 7 months ago

What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory by 39x on average (enabling 26x higher tok/s and a lower time to first token, TTFT) while maintaining quality. These smaller KV caches, which we call cartridges, can be trained once and reused across different user requests!
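
For a rough sense of scale, here is a back-of-the-envelope sketch of what a 39x reduction could mean in bytes. It uses the standard KV cache size formula (keys plus values, per layer, per KV head, per token). The model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) and the 100,000-token document are my own illustrative assumptions, not numbers from the post, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_tokens, num_layers=32, num_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Size of a plain KV cache in bytes.

    The default model shape is an assumption for illustration only.
    The leading factor of 2 counts one key and one value vector per token.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens


# Assumed document length; not taken from the post.
doc_tokens = 100_000
full = kv_cache_bytes(doc_tokens)

# The 39x figure is the average reduction quoted in the post.
reduced = full / 39

print(f"full KV cache:  {full / 2**30:.1f} GiB")
print(f"at 39x smaller: {reduced / 2**20:.0f} MiB")
```

Under these assumptions the full cache is in the tens-of-gigabytes range per document, so a cache that is 39x smaller would let many more documents or concurrent requests fit on one accelerator, which is one plausible reading of the throughput gain the post mentions.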
