Understanding Emergent Abilities of Language Models from the Loss Perspective

arxiv.org

6 points by maccaw 2 years ago · 1 comment

cosmojg 2 years ago

Does this mean that "overtraining" a midsize LLM for many more epochs on a small, representative subset of the dataset used by a larger, more performant LLM might be sufficient for matching the performance of the larger model?
