Fastgen – SOTA LLM inference in 3k lines of Python

github.com

3 points by mpu 7 months ago · 1 comment

mpuOP 7 months ago

We just released a tiny (~3kloc) Python library that implements state-of-the-art inference algorithms on GPU and provides performance similar to vLLM. We believe it's a great learning vehicle for inference techniques and the code is quite easy to hack on!
