I run a team of AI agents on a Mac I bought in 2022. They handle my Slack, run research, draft content, monitor infrastructure, and spawn sub-agents for compound tasks. The whole operation costs me $1.50 a month in electricity. Zero API fees, zero cloud provider. Just OpenClaw in a Docker container and a 30-billion parameter model running locally.
This is the full setup, including every config that matters, every error I hit, and the performance tuning that took me from 12 tokens per second to 49. (Full cost breakdown: $1.50 electricity vs. the $330/month I was paying before. Details at the end.)
What OpenClaw Actually Is
I first installed OpenClaw the week it launched in January 2026. By that point it had already crossed 100,000 GitHub stars. Peter Steinberger’s interview with Lex Fridman is worth the watch if you want the full backstory.
OpenClaw is an open-source AI agent framework. You give it access to tools (shell, browser, file system, messaging) and a model, and it acts autonomously. You can talk to it through Slack, a web UI, or the CLI. It supports sub-agents for compound tasks, custom skills, and model routing across multiple providers.
The key difference from Claude Code: OpenClaw runs in a loop. It has a heartbeat system that keeps agents alive and working even when you’re not at the keyboard. Claude Code (in its default interactive mode) waits for you to type something. OpenClaw doesn’t. That’s what drew me in. I wanted agents running overnight, picking up tasks from a queue, monitoring infrastructure, drafting reports while I slept. Right now my agents handle Slack triage (summarizing threads, flagging action items), run multi-source research with parallel sub-agents, draft first passes of content like this article, and monitor my infrastructure for drift. The compound task pattern is the most useful: I describe what I want, and OpenClaw spawns three sub-agents to tackle different angles simultaneously.
You can run it in Docker, on a VPS, on a Mac Mini under your desk, or an old PC with a GPU. Pair it with a local model and your cloud costs go to zero.
The 30,000+ exposed instances found by security researchers tell you two things: a lot of people are running this, and most of them didn’t read the security docs. I nearly joined that list myself (see Error #9 and the network isolation section below).
OpenClaw Setup: Installation and First Config
I started with a fresh Ubuntu system and followed the install docs on GitHub. The first thing you hit is missing build tools: install python3, gcc, make, and the usual suspects with apt-get before the install script will complete. Node.js 18+ is also required.
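For a scriptable, non-interactive setup, the prerequisites look roughly like this. The package names are the ones above; the NodeSource line is just one common way to get Node 18, not necessarily what the official docs recommend:

```shell
# Build prerequisites for the OpenClaw install script (Ubuntu).
sudo apt-get update
sudo apt-get install -y python3 gcc make build-essential

# One common way to get Node.js 18+ (NodeSource); adjust to taste.
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs
node --version   # expect v18 or newer
```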
The first run drops you into a config wizard that asks about gateway mode, model selection, and channel setup. I skipped it and wrote the config files directly. The wizard is interactive, which doesn’t work if you’re setting up over SSH or scripting the deployment.
You need three things configured:
- A model provider. I started with OpenRouter for testing before moving everything to my on-prem server. Initially I pointed it at Opus, which worked perfectly and consumed credits at an alarming rate. Scaled down to Sonnet, then found MiniMax M2.1 and Kimi2.5 as workable cheap alternatives. I avoided the free models entirely because free endpoints on OpenRouter don’t reliably support tools, system prompts, and structured output, which OpenClaw requires. They also route through providers whose data retention policies vary. Defeats the purpose if you’re trying to own your stack.
- A messaging channel. I use Slack with Socket Mode. No public webhook endpoint needed, just a bot token and an app token. The non-obvious gotcha: under App Home, “Allow users to send messages from the messages tab” must be checked. It’s separate from OAuth scopes and nothing in the docs tells you this. Socket Mode connects fine without it. Messages just silently never arrive.
- Network isolation. I didn’t bother with the application-level security settings initially. Instead I firewalled everything off, set up Tailscale, restricted SSH to private keys only, and bound all services to localhost. If nothing is listening on a public port, most of the hardening guides are solving a problem you don’t have.
The config lives in three places:
| File | What it does |
|---|---|
| Environment file | API keys, tokens, gateway token |
| clawdbot.json | Model, plugins, channels |
| auth-profiles.json | Model provider authentication |
One warning: clawdbot config set says “Updated” but doesn’t always write to the file the gateway actually reads. Edit the JSON directly. I stopped using the CLI for config changes after the third time it silently did nothing.
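My replacement workflow for the flaky CLI is mundane: back up, hand-edit, validate. A minimal sketch, assuming python3 is available (json.tool ships with it) and that your config file is named clawdbot.json:

```shell
# Back up and validate clawdbot.json around a hand edit, so a JSON typo
# can't take the gateway down. The config filename is an assumption.
backup_and_validate() {
  cfg="$1"
  cp "$cfg" "$cfg.bak"                      # keep a rollback copy
  python3 -m json.tool "$cfg" > /dev/null   # exits nonzero on invalid JSON
}

# Usage, after editing the file in your editor:
#   backup_and_validate clawdbot.json
```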
Connecting LM Studio as a Local LLM Provider
I run LM Studio on a 2022 Mac Studio (M1 Max, 32GB). The model is Qwen3-Coder-30B-A3B, a mixture-of-experts architecture where 3 billion parameters are active per token but all 30 billion live in memory. In GGUF Q4_K_S quantization it’s about 17.5GB on disk.
LM Studio exposes an OpenAI-compatible API on localhost. OpenClaw connects to it like any other model provider. If the machine running OpenClaw and the machine running LM Studio are the same box, you just point at localhost and you’re done.
My setup is split across two machines. OpenClaw runs on a server, the Mac Studio runs the model. An SSH reverse tunnel connects them over Tailscale, which means you’ve got an encrypted tunnel inside an encrypted VPN. Belt and suspenders.
Server (OpenClaw, localhost:port)
↓
Tailscale mesh (encrypted)
↓
SSH tunnel (encrypted)
↓
Mac Studio (LM Studio :port)
The tunnel adds less than one millisecond of latency. The bottleneck is always model inference, never the network.
I use a macOS LaunchAgent to keep the tunnel alive. It reconnects automatically after network drops, sleep/wake, router reboots. No third-party tools needed.
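Here’s the shape of that LaunchAgent. The label, user, host, and ports are placeholders for your own setup (1234 is LM Studio’s usual default port, but check yours), and the reverse forward assumes the Mac initiates the connection to the server:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>com.example.lmstudio-tunnel</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/ssh</string>
    <string>-N</string>
    <string>-o</string><string>ServerAliveInterval=30</string>
    <string>-o</string><string>ExitOnForwardFailure=yes</string>
    <string>-R</string><string>1234:127.0.0.1:1234</string>
    <string>user@server-tailscale-hostname</string>
  </array>
  <key>RunAtLoad</key><true/>
  <key>KeepAlive</key><true/>
</dict>
</plist>
```

KeepAlive makes launchd restart ssh whenever it exits, which is what covers sleep/wake and router reboots. Load it once with launchctl load ~/Library/LaunchAgents/com.example.lmstudio-tunnel.plist.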
Bonus: Residential IP
There’s an added bonus to running on-prem hardware at your house that I didn’t anticipate. Your home internet connection has a residential IP address. When your agents browse the web, fetch pages, or interact with APIs, they’re coming from an IP that looks like a normal person, not a datacenter.
Residential IPs don’t get automatically blocked or hit with CAPTCHAs the way cheap hosting providers do. I’ve had agents get blocked on a VPS and work fine through the Mac Studio on the same site, same request, same minute.
Config Gotchas for LM Studio + OpenClaw
This is where I lost the most time.
The two settings that burned me longest were both silent failures. The provider name in your config must be “openai,” not “lmstudio” or “local” or anything creative. OpenClaw’s auth resolution silently fails on unrecognized provider names. Nothing in the logs, nothing on screen. It just doesn’t connect. Same story with the API mode: it must be “openai-completions” (which calls /v1/chat/completions). The other option, “openai-responses,” calls /v1/responses, which hangs indefinitely on LM Studio. The naming suggests they’re interchangeable. They’re not. I read the OpenClaw source code to figure both of these out.
The auth profile format also isn’t documented. After guessing for two hours, I found it requires a v1 store format:
```json
{
  "version": 1,
  "profiles": {
    "openai:default": {
      "type": "api_key",
      "provider": "openai",
      "key": "lm-studio"
    }
  }
}
```
The key value doesn’t matter for LM Studio (it doesn’t authenticate) but it can’t be empty or OpenClaw skips the provider.
Two more that bit me later: the context window defaults to 4,096 tokens, but OpenClaw’s system prompt alone is 17,000 tokens. You’ll get “Cannot truncate prompt” errors until you bump the context in LM Studio’s UI to at least 32,768. Set it in the UI, not the CLI. The CLI setting doesn’t survive a crash.
And the Jinja template bug: Qwen3-Coder’s GGUF template includes a | tojson | safe filter. LM Studio’s Jinja engine doesn’t support | safe. It only triggers with complex tool schemas (nested JSON in parameters), so you might run fine for days before hitting it. Fix: edit the template in LM Studio’s UI (My Models > Prompt Template), find the two occurrences of | tojson | safe, and change them to | tojson.
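The edit itself is tiny. The snippet below is illustrative — the variable names around the filter differ in the real Qwen3-Coder template, but the filter chain is the part that matters:

```jinja
{# Before: LM Studio's Jinja engine throws "Unknown filter: safe" #}
{{ tool.parameters | tojson | safe }}

{# After: tojson alone is enough; its output is already a string #}
{{ tool.parameters | tojson }}
```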
13 OpenClaw Errors and How I Fixed Them
Thirteen errors across the full setup. None of them had useful documentation when I searched. That’s why they’re all here.
| # | Error | Root Cause |
|---|---|---|
| 1 | “Unknown model” on OpenRouter | Model not in internal allowlist |
| 2 | Free models fail | Missing tool/structured output support |
| 3 | “Cannot truncate prompt” | 4K default context, 17K system prompt |
| 4 | Provider auth silent fail | Provider name must be “openai” |
| 5 | “openai-responses” hangs | Wrong API mode for LM Studio |
| 6 | “Unknown filter: safe” | Jinja template incompatibility |
| 7 | MLX crashes under load | Memory instability, switch to GGUF |
| 8 | Session history bloat | 1,836-line session fills context |
| 9 | Sub-agent token mismatch | Env/config tokens diverge on upgrade |
| 10 | “openclaw: command not found” | Binary not in PATH after source install |
| 11 | GATEWAY_BIND=lan breaks auth | Known issue #916 |
| 12 | bootstrapMaxChars crash | Config expects number, not object |
| 13 | Speculative decoding freeze | OOM at 120K+ context on 32GB |
1. “Unknown model” on OpenRouter
OpenClaw has an internal model registry. Models not in it get rejected before the request leaves the box. MiniMax M2.1 wasn’t listed. Fix: add it to agents.defaults.models (plural, not model singular) as an allowlist entry. The naming matters too. OpenRouter uses lowercase IDs (minimax/minimax-m2.1, not MiniMax-M2.1). Check openrouter.ai/models for exact strings.
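Going by the dotted path above, the allowlist entry in clawdbot.json presumably nests like this — exact schema unverified against any particular OpenClaw version:

```json
{
  "agents": {
    "defaults": {
      "models": ["minimax/minimax-m2.1"]
    }
  }
}
```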
2. Free models can’t handle the agent protocol
Llama 3.3 70B returned 404. Gemma 3 27B threw “Upstream error from OpenInference.” Free endpoints on OpenRouter don’t reliably support tools, system prompts, and structured output, which is everything OpenClaw needs. Save yourself the debugging: use paid models or go local.
3. “Cannot truncate prompt with n_keep >= n_ctx”
LM Studio defaults to 4,096 tokens of context. OpenClaw’s system prompt is 17,000 tokens. The math doesn’t work. Set context to at least 32,768 in LM Studio’s UI settings, not the CLI. The CLI setting doesn’t survive a crash.
4. Provider auth silently fails
If you name your provider “lmstudio” instead of “openai” in the config, OpenClaw’s auth resolution doesn’t error. It just doesn’t connect. Nothing in the logs, nothing on screen. I read the source code to find this.
5. “openai-responses” hangs indefinitely
openai-responses calls /v1/responses, which LM Studio doesn’t serve. openai-completions calls /v1/chat/completions, which it does. The naming suggests they’re interchangeable. They’re not.
6. Jinja template: “Unknown StringValue filter: safe”
Qwen3-Coder’s GGUF template uses | tojson | safe. LM Studio doesn’t support the safe filter. Only triggers with complex nested tool schemas, so you might run fine for days before hitting it. Fix: edit the template in LM Studio UI, remove | safe from both occurrences.
7. 30B MLX model crashes under sustained load
I tried the MLX 4-bit version of Qwen3-Coder first. It loaded fine, ran fine for short conversations, then started producing hallucinated gibberish followed by “Exit code: null.” No useful error in logs. Switched to GGUF Q4_K_S with sysctl tuning for the GPU memory cap and it’s been stable since.
8. Session history bloat
OpenClaw persists conversation history per session. After extended debugging, one session had 1,836 lines. New requests were failing because the history plus the system prompt exceeded the context window. Fix: delete stale session files from ~/.openclaw/agents/main/sessions/. Not obvious that this is a thing you need to do.
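Since OpenClaw won’t prune these for you, a small cron-able sketch helps. The default path is the one above; the 7-day threshold and the function name are my own choices:

```shell
# prune_sessions DIR DAYS - delete session files not modified in DAYS days.
# The default path matches my install; the 7-day threshold is arbitrary.
prune_sessions() {
  dir="${1:-$HOME/.openclaw/agents/main/sessions}"
  days="${2:-7}"
  [ -d "$dir" ] || return 0                    # nothing to do if dir is missing
  find "$dir" -type f -mtime "+$days" -print -delete
}

# Example: prune_sessions    # prune the default sessions directory
```

Dropping this into a daily cron job (or a LaunchAgent) keeps a long debugging session from quietly eating your context window weeks later.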
9. Sub-agent auth: the three-token problem
When sub-agents spawn, they connect back to the gateway via WebSocket. Three auth layers must agree:
- Device identity token
- Gateway paired device record
- Gateway auth token (stored in the config AND the env file AND gateway-token.txt)
After upgrading OpenClaw, the config auto-migrated with a new gateway token, but the env file kept the old one. Sub-agents read from env, the gateway validates against config. Every spawn failed with “device_token_mismatch.” Zero useful error messages.
Post-upgrade checklist I wish I’d had:
- Verify the gateway token in the env file matches the one in the config and gateway-token.txt (the config auto-migrates; the env file doesn’t)
- Spawn a test sub-agent before walking away, so a device_token_mismatch shows up immediately instead of overnight
- chown -R clawdbot:clawdbot .git/ dist/ if git ops ran as root
10. “openclaw: command not found” from sub-agents
Source installs don’t add the binary to PATH. Sub-agents need it to announce back to the gateway. Fix: ln -sf /opt/clawdbot/openclaw.mjs /usr/local/bin/openclaw
11. GATEWAY_BIND=lan breaks sub-agents
Known issue (#916 on GitHub). Internal gateway calls don’t pass auth correctly when bound to the LAN interface. Change to loopback and sub-agents start working.
12. bootstrapMaxChars as an object crashes the gateway
The config expects a plain number (e.g., 8000). I passed it as an object with per-file settings. The gateway crashed on startup with no indication of which config key was wrong.
13. Speculative decoding froze the Mac twice
I tried enabling speculative decoding with a 0.75B draft model to speed up generation. At 120k context, the Mac Studio hard-froze. Power cycled, tried again at 140k. Froze again. On 32GB Apple Silicon with a 30B model near the context ceiling, there’s no memory headroom for a draft model. The display server gets killed first, which means no graceful shutdown. Just a black screen and a hard reboot.
Apple Silicon Performance Tuning: 12 to 49 Tokens Per Second
Out of the box, I was getting 12 tokens per second with frequent crashes. After tuning, 49 tokens per second at 140,000 tokens of context, stable for days.
KV Cache Quantization: The Single Biggest Win
The key-value cache stores attention state for every token in your context window. By default, LM Studio keeps it in F16 (16-bit floating point). Switching both K and V to Q8_0 (8-bit) nearly doubles your usable context and increases generation speed.
| Setting | Max Context | Gen Speed | Notes |
|---|---|---|---|
| F16 KV | 75,000 | 12-35 t/s | Default |
| Q8_0 KV | 140,000 | 49 t/s | Production config |
I tested every increment:
32k stable → 49k tight → 65k comfortable → 75k ceiling (F16) → 80k fails to load → 120k works (Q8_0) → 140k production ceiling (Q8_0) → 150k fails → 200k OOM kills the display server.
Set Flash Attention to explicit “On” in LM Studio, not “Auto.” Auto doesn’t always activate, and it’s required for KV cache quantization to work.
sysctl: Raise the GPU Memory Cap
macOS caps GPU memory at about 66% of unified RAM by default. On a 32GB machine, that’s roughly 21GB. A 17.5GB model plus KV cache at any reasonable context length blows right past that.
sudo sysctl iogpu.wired_limit_mb=24576
This raises the cap to 24GB. Persist it in /etc/sysctl.conf so it survives reboots. This was the difference between “crashes under load” and “stable at 140k context.”
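One way to persist it, assuming your macOS version still reads /etc/sysctl.conf at boot:

```shell
# Raise the GPU wired-memory cap now, and persist it across reboots.
sudo sysctl iogpu.wired_limit_mb=24576
echo 'iogpu.wired_limit_mb=24576' | sudo tee -a /etc/sysctl.conf

# Confirm the running value
sysctl iogpu.wired_limit_mb
```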
CPU Threads: Less is More
Apple Silicon has performance cores and efficiency cores. The M1 Max has 8 P-cores and 2 E-cores. I assumed 10 threads would be faster than 8. It’s not. The E-cores are slower and create a bottleneck. Use P-cores only.
What Didn’t Help
I also tried batch size 1536 (vs 768), speculative decoding with a 0.75B draft model, and sub-4-bit quantization (Q3_K_M, IQ3_XS). Batch size made zero practical difference (48.5 vs 49.2 t/s). Speculative decoding froze the Mac twice at 120K+ context because there’s no memory headroom for a draft model on 32GB. And sub-4-bit quants are actually slower on Apple Silicon because of dequantization overhead. Q4_K_S is the sweet spot.
OpenClaw Config Optimization
The model is only half the story. OpenClaw’s defaults are designed for 200k-context cloud models. On local hardware, you need to trim.
The first thing I changed was bootstrapMaxChars, from 20,000 down to 8,000. OpenClaw loads project files into context at the start of every request. At 20k per file, a single bootstrap was consuming half my context window before the conversation even started.
Next was contextPruning (cache-ttl, 5 minutes). Old context that hasn’t been referenced gets dropped automatically. Before this, I was running out of context mid-conversation because stale tool outputs were taking up space.
Finally, historyLimit: 3. This caps how much Slack conversation history gets loaded. I had a busy channel filling the context with old messages, crowding out the actual work.
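Together, the three trims look something like this in clawdbot.json. The key names come from above, but the nesting and the contextPruning value shape are my best guess from the behavior described, not a documented schema:

```json
{
  "bootstrapMaxChars": 8000,
  "historyLimit": 3,
  "contextPruning": { "mode": "cache-ttl", "ttl": "5m" }
}
```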
Production Config Summary
For anyone running a 30B MoE model on 32GB Apple Silicon:
| Setting | Value |
|---|---|
| Model | Qwen3-Coder-30B-A3B Q4_K_S |
| Context | 140,000 tokens |
| KV Cache (K and V) | Q8_0 |
| Flash Attention | On (explicit) |
| CPU Threads | 8 (P-cores only) |
| GPU Layers | All (49/49) |
| Batch Size | 768 |
| bootstrapMaxChars | 8000 |
| historyLimit | 3 |
| contextPruning | cache-ttl, 5m |
Generation speed: 49 tokens/second. Prompt eval: ~470 tokens/second. Three sub-agents spawning concurrently at 120k context, all completing successfully.
10 Things I’d Change on a Fresh Install
If I were starting over tomorrow, in order:
- Run sysctl iogpu.wired_limit_mb=24576 before loading any model. I was convinced the model was unstable when the real problem was macOS starving the GPU of memory. Persist it in /etc/sysctl.conf immediately.
- Start with GGUF, not MLX. MLX has better integration with some tools but its KV cache quantization is buggy (fails above 1k tokens on 8-bit). GGUF’s Q8_0 KV cache just works and it’s the single biggest performance unlock.
- Set LM Studio’s default context in the UI on first launch. When the model crashes (and it will, while you’re testing limits), LM Studio reloads it with the default from settings.json. If that default is 4,096 and your system prompt is 17,000 tokens, you’re in a crash loop that looks like the model is broken when it’s actually a config problem.
- Read the OpenClaw source for auth-profiles.json. The format isn’t documented anywhere. I burned two hours guessing before reading the source.
- Symlink the openclaw binary to /usr/local/bin immediately after a source install. You won’t know sub-agents need it in PATH until they silently fail, and “silently” means a 60-second timeout with no error message.
- After every OpenClaw upgrade, verify the env file tokens match the config tokens. The config auto-migrates. The env file doesn’t.
- Set both K and V cache to Q8_0 from day one. I ran F16 initially because I didn’t know better. The switch doubled my context and increased speed. There’s no reason not to do it.
- Don’t try 200k context on 32GB. I know the model card says it supports it. It will OOM kill your display server and you’ll be reaching for the power button. 140k is the ceiling on 32GB with this model. Respect it.
- Clear session history periodically. OpenClaw doesn’t do this for you. Sessions accumulate conversation history that counts against your context window. I didn’t notice until a session had 1,836 lines and new requests were failing for no apparent reason.
- Test with complex tool schemas early. The Jinja template bug only triggers with nested JSON parameters. Simple requests work fine. You’ll think everything is stable, deploy to production, and then it breaks on the first real compound task.
Cost Breakdown
My Setup
| Item | Monthly Cost |
|---|---|
| Electricity (BC Hydro) | ~$1.50 |
| VPS (optional) | $0 – $24 |
| Cloud API | $0 |
| Total | $1.50 – $25.50 |
The Mac Studio draws about 11W idle, 60W under load. In practice it averages around 20W because agents run in bursts, not continuously. BC Hydro’s residential rate ($0.0996/kWh) on 20W average works out to about $1.50 a month. Your mileage varies by jurisdiction, but even in expensive markets you’re looking at $3-5.
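The arithmetic checks out as a one-liner. The 20 W average and $0.0996/kWh rate are the figures above; a 30-day month is the simplifying assumption:

```shell
# Monthly electricity cost for an always-on box at average draw.
awk 'BEGIN {
  watts = 20; rate = 0.0996
  kwh   = watts * 24 * 30 / 1000        # 14.4 kWh per month
  printf "about $%.2f/month\n", kwh * rate
}'
```

Swap in your own rate and average draw; even at triple the rate you stay inside the $3-5 range mentioned above.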
The VPS is optional. If you run OpenClaw directly on the same machine as your model, it’s electricity only. I use a VPS for remote access and 24/7 uptime independent of my home network, but it’s not required.
What I Was Paying Before
| Service | Monthly Cost |
|---|---|
| Claude Code (Max) | $100 |
| ChatGPT Pro | $200 |
| OpenRouter credits | $20-50 |
| Misc API calls | $10-30 |
| Total | $330-380 |
Not all of that is replaced. I still use Claude Code for interactive work and ChatGPT for specific tasks. But the 24/7 autonomous agents, the batch jobs, the research tasks, the monitoring, the content drafts, all of that moved to OpenClaw on local hardware. The subscriptions that were funding autonomous work went to zero.
Cloud API Cost Comparison
These costs assume heavy autonomous agent usage (millions of tokens per month). Light usage would be significantly cheaper on cloud APIs:
| Setup | Monthly Cost |
|---|---|
| VPS + Anthropic Sonnet 4 | $124+ |
| VPS + Anthropic Sonnet 4.5 | $44-74 |
| VPS + OpenRouter MiniMax M2.1 | $29-39 |
| VPS + local model | $25.50 |
| Local only + local model | $1.50 |
The bottom line is the one nobody writes about because it requires owning hardware. But if you already have a Mac with 32GB sitting on a desk, you already own the most expensive part.
About the Author
Ian L. Paterson is CEO of Plurilock, a publicly traded cybersecurity company. This is part of a series documenting what it looks like to build AI-powered infrastructure for real work. Other posts cover persistent memory for Claude Code, session lifecycle management, and the daily automation layer that ties it all together.