Lost in Backpropagation: The LM Head Is a Gradient Bottleneck arxiv.org 4 points by famouswaffles 4 days ago · 0 comments