Knowing Enough About MoE to Explain Dropped Tokens in GPT-4

152334h.github.io

3 points by 152334H 2 years ago · 1 comment

turtleyacht 2 years ago

In AI/ML, MoE stands for Mixture of Experts.

"GPT-4 uses a simple top-2 Token Choice router for MLP MoE layers. It does not use MoE for attention."

This likely won't be "fixed," since "tokens being dropped are generally good for the performance of MoE models."

https://152334h.github.io/blog/knowing-enough-about-moe/#con...
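For anyone unfamiliar with how a top-2 token-choice router ends up dropping tokens, here is a minimal sketch under stated assumptions: the function name, the fixed per-expert capacity parameter, and the first-come-first-served overflow policy are illustrative, not details taken from the post or from GPT-4.

```python
import numpy as np

def top2_token_choice_route(logits: np.ndarray, capacity: int):
    """Illustrative top-2 token-choice routing with a fixed per-expert capacity.

    logits: [num_tokens, num_experts] router scores.
    capacity: max tokens each expert may accept; overflow assignments are dropped.
    Returns a list of (token_idx, expert_idx, gate_weight) assignments.
    """
    num_tokens, num_experts = logits.shape

    # Softmax over experts to get gating probabilities per token.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    # "Token choice": each token picks its own top-2 experts
    # (as opposed to "expert choice", where experts pick tokens).
    top2 = np.argsort(-probs, axis=-1)[:, :2]

    load = np.zeros(num_experts, dtype=int)
    assignments = []
    for t in range(num_tokens):
        for e in top2[t]:
            if load[e] < capacity:  # expert still has room
                load[e] += 1
                assignments.append((t, int(e), float(probs[t, e])))
            # else: this token's contribution to expert e is dropped
    return assignments

# Example: 8 tokens, 4 experts, capacity 3 per expert.
rng = np.random.default_rng(0)
print(top2_token_choice_route(rng.normal(size=(8, 4)), capacity=3))
```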
