Embarrassingly Simple Self-Distillation Improves Code Generation
arxiv.org
Ahhh! Section A uses the classic, good old Gumbel-max trick [1]
[1] - https://darshanmakwana412.github.io/2026/01/gumbel-max-trick...
Nicely done! I didn't catch if you were cited.
Haha, fair point! I was just sharing it in case someone wanted a quick read : )
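For anyone who hasn't seen the Gumbel-max trick mentioned above: add independent standard Gumbel noise g_i = -log(-log(u_i)), with u_i ~ Uniform(0, 1), to each logit and take the argmax. The winning index is then distributed exactly as softmax(logits). Below is a minimal stdlib-only Python sketch. The function names are illustrative, and this is not claimed to be how the paper's Section A implements it.

```python
import math
import random

def gumbel_max_sample(logits, rng=random):
    """Return index i with probability softmax(logits)[i] via the Gumbel-max trick."""
    best_i, best_val = 0, float("-inf")
    for i, logit in enumerate(logits):
        # Standard Gumbel noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        u = rng.random()
        while u == 0.0:  # random() can return 0.0; resample to avoid log(0)
            u = rng.random()
        g = -math.log(-math.log(u))
        # Keep the index whose perturbed logit is largest.
        if logit + g > best_val:
            best_i, best_val = i, logit + g
    return best_i

def softmax(logits):
    """Numerically stable softmax, for comparing against the sampler."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Usage: the empirical frequencies should approach softmax(logits).
rng = random.Random(0)
logits = [2.0, 1.0, 0.1]
n = 100_000
counts = [0] * len(logits)
for _ in range(n):
    counts[gumbel_max_sample(logits, rng)] += 1
print([c / n for c in counts])
print(softmax(logits))
```

The nice part is that sampling reduces to an argmax over noisy scores, so you never have to normalize the logits into probabilities first.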