Train a LLM from Scratch (github.com)
3 points by linhns 2 days ago · 1 comment

subtick 2 days ago
Curious: how did you handle training stability early on? Was convergence an issue without heavy tuning?