Why Are Sinusoidal Functions Used for Position Encoding?

mfaizan.github.io

5 points by mfn 3 years ago · 1 comment

mfnOP 3 years ago

Sinusoidal positional embeddings have always seemed a bit mysterious, all the more so because papers tend not to delve into the intuition behind them. For example, from Vaswani et al., 2017:

> That is, each dimension of the positional encoding corresponds to a sinusoid. The wavelengths form a geometric progression from 2π to 10000 · 2π. We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset k, PE(pos+k) can be represented as a linear function of PE(pos).
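
To make the quoted property concrete, here is a minimal NumPy sketch (the function name and parameters are mine, purely for illustration): it builds the encoding from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), then checks that PE(pos+k) really is a fixed linear function of PE(pos), namely one 2x2 rotation per frequency pair.

    import numpy as np

    def positional_encoding(max_pos, d_model, base=10000.0):
        # PE(pos, 2i)   = sin(pos / base^(2i/d_model))
        # PE(pos, 2i+1) = cos(pos / base^(2i/d_model))
        pos = np.arange(max_pos)[:, None]            # (max_pos, 1)
        i = np.arange(d_model // 2)[None, :]         # (1, d_model/2)
        angles = pos / base ** (2 * i / d_model)     # (max_pos, d_model/2)
        pe = np.empty((max_pos, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    # Check the "linear function of PE(pos)" claim: each (sin, cos) pair
    # at position pos + k is a fixed rotation of the pair at pos, where
    # the rotation angle depends only on the offset k, not on pos.
    d, pos, k = 8, 17, 3
    pe = positional_encoding(100, d)
    for i in range(d // 2):
        theta = k / 10000.0 ** (2 * i / d)
        R = np.array([[ np.cos(theta), np.sin(theta)],
                      [-np.sin(theta), np.cos(theta)]])
        assert np.allclose(R @ pe[pos, 2*i:2*i+2], pe[pos + k, 2*i:2*i+2])

Since R depends only on the offset k and the frequency, a model can in principle attend to "k tokens back" with a single learned linear map, independent of absolute position, which is the intuition the paper is gesturing at.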

Inspired largely by the RoFormer paper (https://arxiv.org/abs/2104.09864), I thought I'd write a post that dives a bit into how intuitive considerations around linearity and relative positions can lead to the idea of using sinusoidal functions to encode positions.

Would appreciate any thoughts or feedback!
