Serving LLM 24x Faster on the Cloud with VLLM and SkyPilot

blog.skypilot.co

12 points by zhwu 3 years ago · 1 comment

brucethemoose2 3 years ago

Another vLLM post... Its cool, but I still can't tell if its SOTA? Vanilla transformers LLaMA is not optimal at all, especially in the presence of quantized backends like exLlama, GPTQ, Llama.cpp, TVM Llama, and (I think) JAX Llama and Torch-MLIR Llama.
