Next Grok model training with 10T parameter model

twitter.com

3 points by ramshanker 25 days ago · 4 comments

lifecodes 25 days ago

I guess we are reaching the point where “10T parameters” sounds more like a marketing number than a meaningful metric.

Between MoE, aggressive quantization, and synthetic data pipelines, it’s getting harder to tell whether bigger models are actually better, or just more expensive to train.

It would be more interesting to see capability per dollar or per watt, not parameter count.
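To make the suggestion concrete, here's a minimal sketch of what a "capability per dollar" metric could look like. All numbers, names, and the benchmark/cost figures below are entirely made up for illustration; nothing here reflects real model data.

```python
def capability_per_million_usd(score: float, training_cost_usd: float) -> float:
    """Benchmark points earned per million dollars of training spend.

    Both inputs are hypothetical: 'score' is any benchmark aggregate,
    'training_cost_usd' is the estimated total training cost.
    """
    return score / (training_cost_usd / 1_000_000)

# Made-up numbers purely for illustration:
big_dense = capability_per_million_usd(88.0, 2_000_000_000)  # huge dense run
small_moe = capability_per_million_usd(85.0, 150_000_000)    # smaller MoE run

# On these invented figures, the smaller model wins by a wide margin
# even though its raw score is slightly lower.
print(f"big dense: {big_dense:.4f} pts/$1M, small MoE: {small_moe:.4f} pts/$1M")
```

The same shape of metric works per watt by swapping the cost denominator for energy consumed.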

bfeynman 25 days ago

Aren't the leading labs currently chasing not pretraining and massive parameter counts, but enriched, deep fine-tuning and post-training for agentic tasks/coding? MoE combined with new post-training paradigms lets smaller models perform quite well, and is much more pragmatic to scale at inference time. Given that, this choice seems super odd: the frontier labs stay neck and neck, and I don't even see Grok showing up in benchmarks because of how poorly it performs.

ramshankerOP 25 days ago

This is the largest publicly posted model size since top AI labs started treating parameter counts as a trade secret. It should also guide the next generation of inference ASICs.

carolien 24 days ago

Sounds more like a marketing number.
