Published May 7, 2026 | Version v1
Preprint | Open Access
Description
Inspired by the role of sleep in biological continual learning, we introduce RVW, a transformer architecture for online continual adaptation of pretrained models. RVW maintains a small pool of per-layer experts that grow and prune in response to distribution shift, with no replay buffer and no explicit task identifier. Applied to TinyLlama-1.1B on a 15,000-chunk, six-domain stream, RVW reaches an average held-out perplexity (PPL) of 40, substantially better than EWC (158), fine-tuning (164), and LoRA (448) on the same parameter-matched base, while preserving prior-domain performance. Threshold sweeps suggest a combinatorial-encoding interpretation: domain knowledge appears to be carried by routing patterns across layers rather than by individual specialized experts.
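To make the grow-and-prune mechanism concrete, below is a minimal PyTorch sketch of a single per-layer expert pool. It is an illustration under stated assumptions, not the paper's implementation: the top-1 key routing, the loss-spike grow trigger, the usage-EMA prune rule, and all constants (grow_threshold, prune_threshold, max_experts) are hypothetical fillers for the rules the abstract only names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertPool(nn.Module):
    """One per-layer pool of residual MLP experts with grow/prune control.

    Hypothetical sketch of the mechanism described in the abstract; the
    routing scheme, triggers, and constants are assumptions, not RVW's rules.
    """

    def __init__(self, d_model: int, d_hidden: int,
                 grow_threshold: float = 2.0,    # loss spike factor (assumed)
                 prune_threshold: float = 0.02,  # min routing mass (assumed)
                 max_experts: int = 8):
        super().__init__()
        self.d_model, self.d_hidden = d_model, d_hidden
        self.grow_threshold = grow_threshold
        self.prune_threshold = prune_threshold
        self.max_experts = max_experts
        self.experts = nn.ModuleList([self._new_expert()])
        self.keys = nn.ParameterList([nn.Parameter(torch.randn(d_model))])
        self.register_buffer("ema_loss", torch.tensor(0.0))
        self.usage = [1.0]  # EMA of routing mass per expert

    def _new_expert(self) -> nn.Module:
        return nn.Sequential(nn.Linear(self.d_model, self.d_hidden),
                             nn.GELU(),
                             nn.Linear(self.d_hidden, self.d_model))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Route the whole chunk to one expert via key similarity (top-1).
        query = h.mean(dim=(0, 1))                      # [d_model]
        scores = torch.stack([query @ k for k in self.keys])
        weights = F.softmax(scores, dim=0)
        idx = int(weights.argmax())
        for i, w in enumerate(weights.tolist()):        # track routing mass
            self.usage[i] = 0.99 * self.usage[i] + 0.01 * w
        return h + self.experts[idx](h)                 # residual adapter

    @torch.no_grad()
    def maybe_grow_prune(self, chunk_loss: float) -> None:
        if self.ema_loss.item() == 0.0:      # initialise the EMA on chunk 1
            self.ema_loss.fill_(chunk_loss)
            return
        # Grow when loss spikes well above its running average (shift proxy).
        if (chunk_loss > self.grow_threshold * self.ema_loss.item()
                and len(self.experts) < self.max_experts):
            self.experts.append(self._new_expert())
            self.keys.append(nn.Parameter(
                torch.randn(self.d_model, device=self.keys[0].device)))
            self.usage.append(1.0 / len(self.experts))
        # Prune experts whose routing mass has decayed below threshold.
        keep = [i for i, u in enumerate(self.usage)
                if u >= self.prune_threshold]
        if not keep:                          # never prune the pool to zero
            keep = [max(range(len(self.usage)), key=self.usage.__getitem__)]
        if len(keep) < len(self.usage):
            self.experts = nn.ModuleList([self.experts[i] for i in keep])
            self.keys = nn.ParameterList([self.keys[i] for i in keep])
            self.usage = [self.usage[i] for i in keep]
        self.ema_loss.mul_(0.9).add_(0.1 * chunk_loss)
```

A driver loop would call pool.maybe_grow_prune(loss.item()) once per stream chunk after the forward and backward pass, so the pool reshapes itself online with no replay buffer and no task identifier, as the abstract describes.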
Files
rvw.pdf (531.1 kB)