No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-Based Language Models
arxiv.org

This recent paper highlights the difficulty of creating a new optimizer that works as a drop-in replacement for Adam: Sophia and Lion were both proposed as superior alternatives, but appeared worse in this independent evaluation.
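To make "drop-in replacement" concrete, here is a minimal sketch of the Lion update rule (from Chen et al., 2023, "Symbolic Discovery of Optimization Algorithms") written against PyTorch's standard Optimizer interface. The class name, hyperparameter defaults, and structure are illustrative assumptions, not code from either paper.

```python
import torch

class Lion(torch.optim.Optimizer):
    """Illustrative sketch of the Lion update rule, not a reference implementation."""

    def __init__(self, params, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
        # Defaults loosely follow the Lion paper's suggestions (assumption).
        super().__init__(params, dict(lr=lr, betas=betas, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Lazily initialize the per-parameter momentum buffer.
                m = self.state[p].setdefault("momentum", torch.zeros_like(p))
                # Update direction: the sign of an interpolation between the
                # momentum buffer and the current gradient.
                update = (beta1 * m + (1 - beta1) * p.grad).sign_()
                # Decoupled weight decay, as in AdamW.
                p.mul_(1 - group["lr"] * group["weight_decay"])
                p.add_(update, alpha=-group["lr"])
                # The momentum buffer tracks a slower EMA of the gradient.
                m.mul_(beta2).add_(p.grad, alpha=1 - beta2)
        return loss
```

Because this exposes the same interface as torch.optim.AdamW, swapping optimizers in a training script is a one-line change, which is exactly what makes independent head-to-head evaluations like the one above straightforward to run.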