Show HN: Aion-Torch – Adaptive residual scaling for deep Transformers

github.com

2 points by Rioverde a month ago · 0 comments · 1 min read

Hello HN, I’ve turned my Master’s research on stabilizing very deep Transformers into an open-source PyTorch library called AION-Torch. Instead of a fixed residual connection, it uses an adaptive residual that compares how “energetic” a block’s input and output are and dials the residual strength up or down to keep activation and gradient magnitudes stable. On my small setup (a single RTX 4060) it seemed to help very deep Transformer stacks keep gradients under control and reach lower loss without special tuning.
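To make the idea concrete, here is a minimal sketch of what an energy-based adaptive residual could look like. This is my own illustration, not the actual AION-Torch implementation: the class name `AdaptiveResidual` and the RMS-ratio gain are assumptions about the general technique, chosen so the residual stream's magnitude stays roughly constant with depth.

```python
import torch
import torch.nn as nn


class AdaptiveResidual(nn.Module):
    """Sketch of an adaptive residual connection (illustrative only).

    Instead of a fixed `x + block_out`, the block output is scaled by a
    gain derived from the ratio of input "energy" to output "energy"
    (RMS over the feature dimension), so an unusually energetic block
    output cannot blow up the residual stream.
    """

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor, block_out: torch.Tensor) -> torch.Tensor:
        # RMS "energy" of the input and the block output, per position.
        in_energy = x.pow(2).mean(dim=-1, keepdim=True).sqrt()
        out_energy = block_out.pow(2).mean(dim=-1, keepdim=True).sqrt()
        # Dial the residual strength down when the block output is much
        # more energetic than its input; never amplify beyond gain 1.
        gain = (in_energy / (out_energy + self.eps)).clamp(max=1.0)
        return x + gain * block_out
```

With this gain, a block whose output is ten times as energetic as its input only contributes at one-tenth strength, which is the kind of self-limiting behavior that could keep very deep stacks stable.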

The repo has a drop-in AionResidual module, basic tooling for logging what’s happening inside the network, and small examples showing how to plug it into existing models. I’d love feedback on whether this idea makes sense beyond toy setups, how you would benchmark it against standard residuals or DeepNorm on real tasks, and whether the API feels natural to people who train larger models.
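For readers wondering what "drop-in" means in practice, here is a toy Transformer sub-block where the usual `x + sublayer(x)` is swapped for an energy-scaled residual. The names (`TinyBlock`, `adaptive_residual`) and the gain formula are hypothetical, standing in for whatever AionResidual actually does:

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """Toy pre-norm attention sub-block with an adaptive residual
    (illustrative sketch, not the repo's actual API)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, eps: float = 1e-6):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.eps = eps

    def adaptive_residual(self, x: torch.Tensor, out: torch.Tensor) -> torch.Tensor:
        # Scale the sublayer output by input/output RMS-energy ratio,
        # capped at 1, instead of adding it at full strength.
        gain = (x.pow(2).mean(-1, keepdim=True).sqrt()
                / (out.pow(2).mean(-1, keepdim=True).sqrt() + self.eps))
        return x + gain.clamp(max=1.0) * out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        # The only change from a standard block: the residual is adaptive.
        return self.adaptive_residual(x, attn_out)
```

The appeal of this shape of API is that only the one line performing the residual addition changes, so it should slot into existing training code without touching the rest of the block.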
