Inanely Fast Local AI: 775 token per second! 🤯 I was able to run the new DiffusionGemma (full BF16 model) by @googlegemma on vLLM (fork by Red Hat) on Nvidia RTX 6000 Pro. It's blazing fast at short contexts, but gets slow very quickly. At 100k, TTFT is 22s! ■ Leave a comment https://t.co/48SWaVTDK4

Inanely Fast Local AI: 775 token per second! 🤯 I was able to run the new DiffusionGemma (full BF16 model) by

on vLLM (fork by Red Hat) on Nvidia RTX 6000 Pro. It's blazing fast at short contexts, but gets slow very quickly. At 100k, TTFT is 22s! ■ Leave a comment if you want to know the setup and command to run the model.