📢 Excited to finally be releasing my NeurIPS 2024 submission! Is Chinchilla universal? No! We find that: 1. language model scaling laws depend on data complexity 2. gzip effectively predicts scaling properties from training data As compressibility 📉, data preference 📈. 🧵⬇️ https://t.co/ZYYZ4hxDN2



Say your training compute budget = ~1.5e13 FLOPs. If your dataset has a gzip compressibility ratio of 0.14, you should *max out your param count* and skimp on dataset size. But if your dataset is less compressible (gzip=0.61), *keep your model small* and train it on a ton of data.
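Want a rough read on where your own data falls? Here is a minimal sketch, assuming "gzip compressibility ratio" means compressed bytes divided by raw bytes (so lower = more compressible). That definition, the `gzip_ratio` helper, and the two toy strings are illustrative assumptions, not the paper's exact measurement pipeline.

```python
import gzip
import random
import string


def gzip_ratio(text: str) -> float:
    """Compressed size / raw size in bytes (assumed definition; lower = more compressible)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot compute a ratio for empty text")
    return len(gzip.compress(raw)) / len(raw)


# Toy data, purely illustrative: one highly repetitive string, one noisy string.
repetitive = "the cat sat on the mat. " * 2000
rng = random.Random(0)  # fixed seed so the noisy sample is reproducible
noisy = "".join(rng.choice(string.ascii_lowercase) for _ in range(len(repetitive)))

print(f"repetitive: {gzip_ratio(repetitive):.3f}")
print(f"noisy:      {gzip_ratio(noisy):.3f}")
```

The repetitive string lands at the low-ratio end and the noisy one at the high-ratio end. Real corpora sit somewhere in between, and the number you get will depend on how you sample and preprocess the text.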
