v0.8.2: blind-first unified pipeline · compact MCP
Give your AI agent
eyes and hands
on your desktop
Any tool-calling model. Any OS. 74 granular tools or 6 compact compound tools (Anthropic Computer-Use style). Blind-first pipeline tries accessibility trees before pixels — ~12× cheaper per turn than vision-only agents.
Use it your way
No app integrations. No API keys per service.
If it's on your screen, your AI can use it.
Tell it what to do
Describe what you want. clawdcursor executes it.
clawdcursor doctor
clawdcursor start
Connect your AI directly
Add clawdcursor to get desktop control as a native tool.
Claude Code Cursor Windsurf Zed
Setup
Three ways to connect
One server. Three modes. Same desktop access.
MCP (Claude Code / Cursor / Windsurf)
- Run clawdcursor consent
- Add JSON block to your AI client's config
- Desktop tools appear natively
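The JSON block in step 2 typically follows the standard MCP stdio-server shape. The command and args below are an illustrative assumption — `clawdcursor consent` prints the canonical snippet for your client:

```json
{
  "mcpServers": {
    "clawdcursor": {
      "command": "clawdcursor",
      "args": ["mcp"]
    }
  }
}
```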
CLI Agent
- Run clawdcursor doctor
- Run clawdcursor start
- Type tasks in plain English
REST API
- Run clawdcursor serve
- 74 tools on localhost:3847, or 6 via ?mode=compact
- OpenAI function-calling format
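"OpenAI function-calling format" means each tool in the catalog is described as a standard function-tool entry. A sketch of what one entry might look like — the tool name and parameter schema here are illustrative assumptions, not the server's actual schema:

```typescript
// One entry in an OpenAI-style function-calling tool catalog.
// `mouse_click` is a hypothetical granular tool name.
type FunctionTool = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>; // JSON Schema
  };
};

const clickTool: FunctionTool = {
  type: "function",
  function: {
    name: "mouse_click",
    description: "Click at screen coordinates",
    parameters: {
      type: "object",
      properties: {
        x: { type: "number" },
        y: { type: "number" },
      },
      required: ["x", "y"],
    },
  },
};

console.log(clickTool.function.name); // → "mouse_click"
```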
Unified Pipeline · blind-first by default
Tries a11y first. Vision is the fallback.
One loop, three modes. The pipeline picks the cheapest path that works: router → blind agent → hybrid → vision. Most tasks never touch a screenshot.
1 Router
Free · zero LLM
Regex shortcuts + knowledge guides for trivial tasks. "Open Safari" and "navigate to github.com" resolve in under a second. Everything else falls through.
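A zero-LLM router of this kind can be sketched as a pattern table. The patterns and action names below are illustrative assumptions, not clawdcursor's actual routing table:

```typescript
// Minimal router sketch: resolve trivial tasks without an LLM call.
// Patterns and action names are illustrative assumptions.
type Route = { pattern: RegExp; action: (m: RegExpMatchArray) => string };

const routes: Route[] = [
  { pattern: /^open (\w+)$/i, action: (m) => `launch_app:${m[1]}` },
  { pattern: /^navigate to (\S+)$/i, action: (m) => `browser_goto:${m[1]}` },
];

function route(task: string): string | null {
  for (const { pattern, action } of routes) {
    const m = task.match(pattern);
    if (m) return action(m); // resolved for free, no LLM turn
  }
  return null; // falls through to the blind/hybrid/vision agent
}

console.log(route("Open Safari"));        // → "launch_app:Safari"
console.log(route("summarize my inbox")); // → null
```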
2 Blind / Hybrid / Vision Agent
One loop · 3 strategy modes
Blind tries the a11y tree first (cheapest). Hybrid allows on-demand screenshots. Vision is the fallback when the UI is custom-canvas. Native tool_use for Anthropic / tool_calls for OpenAI / prose-JSON for everyone else.
3 Safety + Verification
Single chokepoint
Every tool call routes through the SafetyLayer. Destructive verbs (send, delete, close_window) escalate to confirm. Stagnation detection forces the agent to try a different approach or give_up.
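The chokepoint pattern can be sketched like this. The destructive verbs are the ones named in the text; everything else about the real SafetyLayer is an assumption:

```typescript
// Every tool call funnels through one gate; destructive verbs escalate
// to a user confirmation. Verb set mirrors the examples in the text.
const DESTRUCTIVE = new Set(["send", "delete", "close_window"]);

type Verdict = "allow" | "confirm";

function check(toolName: string): Verdict {
  return DESTRUCTIVE.has(toolName) ? "confirm" : "allow";
}

console.log(check("delete"));     // → "confirm"
console.log(check("mouse_move")); // → "allow"
```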
🎯
Compact MCP surface
6 compound tools — computer, accessibility, window, system, browser, task — that collapse the 74 granular primitives into Anthropic-Computer-Use shape. ~12× smaller tool catalog. clawdcursor mcp --compact or ?mode=compact.
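A compound tool in Anthropic Computer-Use shape takes one `action` field instead of exposing each primitive separately. The action names and fields below are illustrative assumptions:

```typescript
// Hypothetical compound-tool payloads: one `computer` tool, many actions,
// instead of dozens of granular tools. Field names are assumptions.
type ComputerAction =
  | { action: "screenshot" }
  | { action: "left_click"; coordinate: [number, number] }
  | { action: "type"; text: string };

function describe(call: ComputerAction) {
  switch (call.action) {
    case "screenshot": return "capture screen";
    case "left_click": return `click at ${call.coordinate.join(",")}`;
    case "type": return `type "${call.text}"`;
  }
}

console.log(describe({ action: "left_click", coordinate: [640, 400] }));
// → "click at 640,400"
```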
🧩
PlatformAdapter
One interface, one file per OS: macos.ts, windows.ts, linux.ts. Replaces 142+ scattered if (IS_MAC) branches. Linux covers both X11 (nut-js) and Wayland (ydotool/wtype) with AT-SPI a11y.
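The adapter boundary might look roughly like this — only the one-file-per-OS split comes from the text; the interface methods are illustrative assumptions:

```typescript
// One interface, one implementation per OS. Business logic imports the
// interface and never checks process.platform. Method names are assumptions.
interface PlatformAdapter {
  click(x: number, y: number): Promise<void>;
  typeText(text: string): Promise<void>;
  readAccessibilityTree(): Promise<unknown>;
}

// Picks the implementation module; mirrors macos.ts / windows.ts / linux.ts.
function adapterModule(platform: string): string {
  switch (platform) {
    case "darwin": return "macos.ts";
    case "win32": return "windows.ts";
    default: return "linux.ts";
  }
}

console.log(adapterModule("darwin")); // → "macos.ts"
```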
One unified pipeline.
No more --v2 vs legacy split. The 0.8.0 V2 agent and legacy cascade merged into a single blind-first loop. Existing clawdcursor start users get the unified pipeline automatically; the internal decision-maker is now the same whether you use REST, MCP, or the built-in autonomous agent.
Features
OS-agnostic. Model-agnostic.
Business logic never sees process.platform. Any tool-calling LLM, any desktop.
🍎
macOS — TCC-safe
System Events keyboard, nut-js mouse. clawdcursor grant walks you through Accessibility + Screen Recording.
🪟
Windows — UIA native
PowerShell + .NET UI Automation, Windows.Media.Ocr. x64 and ARM64.
🐧
Linux — AT-SPI + Tesseract
Same 74 tools (or 6 compact), same MCP interface. X11 + Wayland supported. sudo apt install tesseract-ocr python3-gi gir1.2-atspi-2.0 for full a11y; ydotool or wtype on Wayland for input.
🖱️
Smart Tools
Click by name, type by label, read text. Accessibility + OCR fallback — no screenshots needed.
⌨️
Shortcuts Engine
shortcuts_list and shortcuts_execute. Platform-aware: Cmd on macOS, Ctrl elsewhere. Zero LLM cost.
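Platform-aware chord expansion can be sketched as below; the `Mod+` placeholder convention is an illustrative assumption, not clawdcursor's actual syntax:

```typescript
// Expand a platform-neutral chord: "Mod" becomes Cmd on macOS, Ctrl elsewhere.
// The "Mod+" placeholder convention is an illustrative assumption.
function expandChord(chord: string, platform: string): string {
  const mod = platform === "darwin" ? "Cmd" : "Ctrl";
  return chord.replace(/\bMod\b/g, mod);
}

console.log(expandChord("Mod+S", "darwin")); // → "Cmd+S"
console.log(expandChord("Mod+S", "linux"));  // → "Ctrl+S"
```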
📖
App Guides (86+ apps)
Community JSON guides with shortcuts and tips. clawdcursor guides install excel
Get Started
Two commands.
Install, start. Providers auto-detected.
powershell -c "irm https://clawdcursor.com/install.ps1 | iex"
clawdcursor start
curl -fsSL https://clawdcursor.com/install.sh | bash
clawdcursor grant # Accessibility + Screen Recording
clawdcursor start
curl -fsSL https://clawdcursor.com/install.sh | bash
clawdcursor start
Node.js 20+. Localhost only. The blind-first unified pipeline is the default; no flags needed.