The AI community is buzzing again.
Ollama just added Anthropic-compatible API support. GLM-4.7 Flash dropped with impressive benchmarks. My Twitter/LinkedIn feed is full of takes like “subscriptions are dead” and “local models for the win.”
I get it. The dream of running frontier-quality AI on your own hardware, free from monthly fees and rate limits, is compelling. I want it to be true.
But I’ve been skeptical.
My Reality Check
Last week I tried running GLM-4.7 Flash on my MacBook Pro (M4, 48 GB of memory) — a machine that costs over $3,000.
The results?
- It hung my machine during loading
- Once running, it managed ~16 tokens/second on a single task
- It consumed half my RAM
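If you want to check throughput on your own machine, Ollama reports generation stats in the final response of `/api/generate`: `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). A minimal sketch of the arithmetic — the sample values below are illustrative, not my actual run:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Throughput from Ollama's generation stats (eval_duration is in nanoseconds)."""
    return eval_count / (eval_duration_ns / 1e9)

# Illustrative response fragment, not measured data:
resp = {"eval_count": 480, "eval_duration": 30_000_000_000}
print(round(tokens_per_second(resp["eval_count"], resp["eval_duration"]), 1))  # → 16.0
```

Dividing by `1e9` converts nanoseconds to seconds; 480 tokens over 30 seconds works out to the ~16 tokens/second I saw.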