Insanely Fast Local AI: 775 tokens per second! 🤯 I was able to run the new DiffusionGemma (full BF16 model) by @googlegemma on vLLM (fork by Red Hat) on an Nvidia RTX 6000 Pro. It's blazing fast at short contexts, but slows down quickly as the context grows. At 100k context, TTFT (time to first token) is 22s! ■ Leave a comment if you want to know the setup and command to run the model.
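The post doesn't say exactly how the 775 tokens/s and 22s figures were measured. Here is a rough sketch that assumes the usual definitions. TTFT is the time from sending a request to receiving the first output, and decode throughput is output tokens divided by the time spent generating after that first output. For a diffusion model that streams text in blocks, "first token" would mean the first streamed chunk. The `RequestTiming` class and the helper functions are hypothetical illustrations and are not part of vLLM. The sketch also reads "100k" as 100,000 prompt tokens and assumes TTFT is dominated by prefill. Under those assumptions, a 22s TTFT implies roughly 4,500 prompt tokens processed per second.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical wall-clock record (in seconds) for one streamed request."""
    start: float        # time the request was sent
    first_token: float  # time the first output token (or chunk) arrived
    end: float          # time the last output token arrived
    prompt_tokens: int  # context length fed to the model
    output_tokens: int  # number of tokens generated


def ttft(t: RequestTiming) -> float:
    """Time to first token: delay before any output appears."""
    return t.first_token - t.start


def prefill_tps(t: RequestTiming) -> float:
    """Implied prompt-processing rate, assuming TTFT is mostly prefill."""
    return t.prompt_tokens / ttft(t)


def decode_tps(t: RequestTiming) -> float:
    """Output tokens per second, measured from first to last output token."""
    duration = t.end - t.first_token
    if duration <= 0:
        raise ValueError("need end > first_token to measure decode speed")
    return t.output_tokens / duration


if __name__ == "__main__":
    # Only prompt_tokens and the 22 s TTFT come from the post;
    # end and output_tokens are made-up illustrative values.
    t = RequestTiming(start=0.0, first_token=22.0, end=23.0,
                      prompt_tokens=100_000, output_tokens=775)
    print(f"TTFT: {ttft(t):.1f} s")                      # TTFT: 22.0 s
    print(f"prefill: {prefill_tps(t):.0f} tok/s")        # prefill: 4545 tok/s
    print(f"decode: {decode_tps(t):.0f} tok/s")          # decode: 775 tok/s
```

Because prefill cost grows with prompt length, TTFT is the number to watch for long-context use. A high decode rate at short contexts says little about how long you wait before the first output at 100k.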