“Local LLMs Are Finally Beating the Cloud!” — But Are They?


An attempt to bring objectivity to the endless local vs API vs subscription debate

Eduard Ruzga


The AI community is buzzing again.

Ollama just added Anthropic-compatible API support. GLM-4.7 Flash dropped with impressive benchmarks. My Twitter/LinkedIn feed is full of takes like “subscriptions are dead” and “local models for the win.”
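In practice, "Anthropic-compatible" means you can point the official Anthropic SDK at a local Ollama server instead of the cloud. A minimal sketch in Python, assuming Ollama is serving on its default port 11434, that its compatibility layer is mounted at the server root (check the Ollama docs for your version), and that a model under the placeholder tag `glm-4.7-flash` has been pulled locally:

```python
# Sketch: talking to a local Ollama server through the Anthropic SDK.
# Assumes Ollama runs on its default port (11434); the exact base URL
# for the Anthropic-compatible endpoint may differ by Ollama version.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:11434",  # local Ollama, not api.anthropic.com
    api_key="ollama",                   # placeholder; local servers ignore it
)

response = client.messages.create(
    model="glm-4.7-flash",              # placeholder tag for your local model
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize this repo's README."}],
)
print(response.content[0].text)
```

The appeal is obvious: any tool built against Anthropic's API can, in principle, be redirected to local hardware with two lines of configuration.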


I get it. The dream of running frontier-quality AI on your own hardware, free from monthly fees and rate limits, is compelling. I want it to be true.

But I’ve been skeptical.


My Reality Check

Last week I tried running GLM-4.7 Flash on my MacBook Pro (M4, 48 GB of unified memory), a machine that costs over $3,000.

The results?

  • It hung my machine during loading
  • Once running, it managed ~16 tokens/second on a single task (a sketch for measuring this yourself follows below)
  • It consumed half my RAM
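For anyone who wants to reproduce that throughput number: Ollama's native /api/generate endpoint reports eval_count (tokens generated) and eval_duration (in nanoseconds) in its response, which is enough to compute tokens per second. A rough sketch, again using the placeholder model tag and the default port:

```python
# Sketch: measure generation throughput against a local Ollama server.
# Ollama's /api/generate response includes eval_count (tokens generated)
# and eval_duration (nanoseconds), so tokens/second falls out directly.
# The model tag "glm-4.7-flash" is a placeholder for whatever you pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "glm-4.7-flash",
        "prompt": "Explain the tradeoffs of local vs cloud LLM inference.",
        "stream": False,
    },
    timeout=600,  # the first call may include model load time
)
resp.raise_for_status()
data = resp.json()

tokens = data["eval_count"]
seconds = data["eval_duration"] / 1e9  # eval_duration is in nanoseconds
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tok/s")
```

The response also includes load_duration, which is handy for separating "the model took forever to load" from "the model generates slowly" when diagnosing numbers like mine.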