Muon Is Scalable for LLM Training

5 points by renonce a year ago · 1 comment

yorwba a year ago

For people who want to know more about the Muon optimizer: https://kellerjordan.github.io/posts/muon/