Show HN: Llama.cpp Tutorial 2026: Run GGUF Models Locally on CPU and GPU

13 points by anju-kushwaha 15 days ago · 5 comments · 1 min read

Complete llama.cpp tutorial for 2026. Install, compile with CUDA/Metal, run GGUF models, tune all inference flags, use the API server, speculative decoding, and benchmark your hardware.

https://vucense.com/dev-corner/llama-cpp-tutorial-run-gguf-m...

CableNinja 15 days ago

I've been trying to run models locally, effectively following this guide (before the guide existed), and have not had any success. Llama builds fine, but when I start it up, it just spins its progress bar indefinitely. I let it sit for 3 days and nada.

Running on an 8-core, 12 GB RAM VM, which has an AMD RX 5500 XT (8 GB) passed through. ROCm is built, and llama is built with the correct flags.

What am I missing?

washadjeffmad 15 days ago

Logs to troubleshoot, for starters.

ksato1234 10 days ago

I was just looking for a comprehensive website like this. Thank you.

goran-j 12 days ago

great tutorial
