Train an LLM from Scratch

3 points by linhns 2 months ago · 1 comment

subtick 2 months ago

Curious — how did you handle training stability early on? Was convergence an issue without heavy tuning?