Seven Hours, Zero Internet, and Local AI Coding at 40k Feet
betweentheprompts.com

I don't know what backend Ollama uses behind the scenes, but on Macs there's also MLX, which should generally be faster. There's also a top_k quirk on gpt-oss that might need tweaking: I saw reports that setting it to 100 instead of the default of 0 brings an extra ~20 tokens/s in generation speed.
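As a rough sketch of how that top_k tweak could be applied (assuming you're running the model through Ollama; the model and tag names here are illustrative, not verified), a Modelfile override might look like:

```
# Modelfile — override the sampling parameter for a local gpt-oss model
# (model name is an assumption; substitute whatever `ollama list` shows)
FROM gpt-oss
PARAMETER top_k 100
```

Then build and run a variant with `ollama create gpt-oss-fast -f Modelfile` followed by `ollama run gpt-oss-fast`, and compare the reported tokens/s against the default to see whether the ~20 t/s gain holds on your hardware.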