Hardware LLM at 16K Tokens/s

taalas.com

2 points by gcollard- 4 months ago · 1 comment

gcollard-OP 4 months ago

Testing this hardware LLM (Llama 3.1 8B on a chip), I get ~16k tokens per second.

With frontier models plateauing, I’ve become convinced that AI will end up like Bitcoin mining, and that NVIDIA’s general-purpose GPUs will be replaced by model-specific chips.

Glad to see someone innovating in this space.
