Settings

Theme

Generating one token at a time is a blessing in disguise

kachkach.com

3 points by halflings a month ago · 1 comment

Reader

halflingsOP a month ago

LLMs generate their output one token at a time. The first thought when you learn this is that this is a huge performance bottleneck, as we are used to highly parallelized systems.

However, a large part of what makes LLMs feel so magical comes from this bottleneck.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection