Why do output tokens cost 5x more than input tokens?

anirudhsathiya.com

3 points by ani17 2 months ago · 2 comments

ani17 (OP) 2 months ago

Author here. I wanted to understand what vLLM and llama.cpp are actually doing under the hood, but the codebases are massive, so I wrote a stripped-down version from scratch to see the core ideas without the production complexity.

Code: https://github.com/Anirudh171202/WhiteLotus

lazyMonkey69 2 months ago

I think the paged attention part is a bit oversimplified. Nice read otherwise!
