Insanely Fast Local AI: 775 tokens per second! 🤯 I was able to run the new DiffusionGemma (full BF16 model) by
@googlegemma on vLLM (fork by Red Hat) on an Nvidia RTX 6000 Pro. It's blazing fast at short contexts, but slows down quickly as the context grows. At 100k context, TTFT is 22s! Leave a comment if you want to know the setup and command to run the model.