Show HN: Chinchilla Scaling Laws Are Not Universal

github.com

1 point by KhoomeiK 2 years ago · 0 comments

Hey HN! Chinchilla (DeepMind, 2022) tells us that when we scale up language model training, we should scale model parameters and training tokens in roughly equal proportion.
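
For concreteness, here's a minimal sketch (mine, not from the post or the repo) of that rule of thumb: using the common approximations that training compute is C ≈ 6 * N * D FLOPs and that the Chinchilla-optimal budget works out to roughly 20 training tokens per parameter, both N and D grow as about the square root of compute.

  # Minimal sketch of the Chinchilla rule of thumb. Assumes C ~= 6*N*D FLOPs
  # and the commonly cited ~20 training tokens per parameter.
  def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
      # With D = tokens_per_param * N and C = 6 * N * D:
      #   N = sqrt(C / (6 * tokens_per_param)),  D = tokens_per_param * N
      n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
      n_tokens = tokens_per_param * n_params
      return n_params, n_tokens

  # e.g. a 1e21 FLOP budget -> roughly 2.9e9 params and 5.8e10 tokens
  n, d = chinchilla_optimal(1e21)
  print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")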

Over the last several months I've been hacking on a research project to determine whether the optimal compute allocation (scaling law) for training an LLM is sensitive to training data complexity. I found that as data complexity increases, you need even more data than Chinchilla suggests!

I released the preprint just yesterday: https://arxiv.org/abs/2405.16684
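
To make the data-complexity question concrete, here's a hypothetical sketch of the kind of experiment it implies (illustrative only, not the preprint's actual code or data): fit the Chinchilla parametric loss L(N, D) = E + A/N^alpha + B/D^beta on a sweep of training runs for one dataset, then repeat for datasets of different complexity and compare the fitted exponents. In the Chinchilla analysis those exponents set the compute-optimal split, N ~ C^(beta/(alpha+beta)) and D ~ C^(alpha/(alpha+beta)), so a shift in alpha versus beta across datasets is exactly an allocation that departs from the "scale both equally" rule.

  # Hypothetical sketch (not the preprint's code): fit the parametric loss for
  # one dataset. Repeating per dataset and comparing alpha/beta shows whether
  # the compute-optimal split N ~ C**(beta/(alpha+beta)),
  # D ~ C**(alpha/(alpha+beta)) moves with data complexity.
  import numpy as np
  from scipy.optimize import curve_fit

  def parametric_loss(X, E, A, B, alpha, beta):
      N, D = X
      return E + A * N**-alpha + B * D**-beta

  # Toy observations generated from known parameters, standing in for the
  # final losses of a (params, tokens) training sweep on a single dataset.
  rng = np.random.default_rng(0)
  N = np.array([1e7, 1e7, 1e8, 1e8, 1e9, 1e9, 1e10, 1e10])
  D = np.array([1e9, 1e10, 1e9, 1e10, 1e10, 1e11, 1e10, 1e11])
  L = parametric_loss((N, D), 1.7, 400.0, 410.0, 0.34, 0.28)
  L = L + rng.normal(0.0, 0.01, L.size)

  popt, _ = curve_fit(parametric_loss, (N, D), L,
                      p0=[1.5, 300.0, 300.0, 0.3, 0.3], maxfev=20000)
  E_hat, A_hat, B_hat, alpha, beta = popt
  print(f"alpha={alpha:.3f}, beta={beta:.3f}, "
        f"data exponent ~ {alpha / (alpha + beta):.3f}")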
