screencommander is a macOS 14+ CLI that drives a terminal-first observe -> decide -> act automation loop for agents to operate the desktop through computer use:
- Observe with Retina-aware screenshot capture
- Decide with deterministic coordinate mapping from metadata
- Act with global mouse and keyboard event synthesis
This is also compatible with non-vision model workflows (for example codex-5.3-codex-spark) by relying on coordinate + metadata control rather than in-model screenshot understanding.
Requirements
- macOS 14.0+
- Xcode Command Line Tools (for
swift)
Build
Fast Capability Guide
For a concise, operations-first guide to screencommander capabilities and reliable command patterns, see:
SKILL.md
This is the same skill/runbook AGENTS use to quickly understand and operate the CLI.
Install
Use the reusable installer script:
scripts/install.sh --prefix /usr/local
If you prefer a user-local install without sudo:
scripts/install.sh --prefix "$HOME/.local"The binary is installed to <prefix>/bin/screencommander.
For user-local installs, ensure ~/.local/bin is on your PATH.
Permissions
screencommander requires macOS privacy permissions:
- Screen Recording permission for
screenshot
- System Settings path: Privacy & Security > Screen Recording
- Deeplink:
x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture
- Accessibility permission for
click,type, andkey
- System Settings path: Privacy & Security > Accessibility
- Deeplink:
x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility
On denial, commands fail fast with explicit remediation text and stable exit codes.
Commands
Doctor
screencommander doctor screencommander doctor --json
Behavior:
- Reports permission status with traffic lights (
🟢granted,🔴denied). - Reports active displays with IDs and bounds in points.
- Exits
0even if permissions are missing; use output for remediation.
Screenshot
screencommander screenshot \ --display main \ --out ~/Library/Caches/screencommander/captures/desk.png \ --format png \ --meta ~/Library/Caches/screencommander/captures/desk.json \ --cursor
JSON output variant:
screencommander screenshot --json
Behavior:
- Captures with ScreenCaptureKit
SCScreenshotManager. - Writes image and JSON metadata.
- Default image path:
~/Library/Caches/screencommander/captures/<timestamp>.png - Default metadata path:
<image>.json(for example~/Library/Caches/screencommander/captures/<timestamp>.json) - Also updates managed
~/Library/Caches/screencommander/last-screenshot.jsonby default.
Click
screencommander click 640 320 --space pixels
Using explicit metadata and double-right-click:
screencommander click 0.25 0.25 \ --space normalized \ --meta ./captures/desk.json \ --button right \ --double
Behavior:
- Defaults to metadata path
~/Library/Caches/screencommander/last-screenshot.json. - Maps screenshot coordinates into global Quartz coordinates deterministically.
- Captures pre-action and post-action screenshots by default and prints both paths.
- Disable before/after capture with
--no-postshot.
Type
screencommander type "hello world"
With per-character delay and JSON output:
screencommander type "delayed text" --mode unicode --delay-ms 50 --json
Behavior:
- Defaults to paste mode (
cmd+v) for reliable full-text input. - Captures pre-action and post-action screenshots by default (
--no-postshotto disable).
Key
screencommander key "enter" screencommander key "cmd+shift+4" screencommander key "option+tab" --json screencommander key "spotlight" screencommander key "missioncontrol" screencommander key "launchpad"
Supported modifier aliases include cmd|command, opt|option|alt, and ctrl|control.
System/media keys are also supported (for example volumeup, volumedown, brightnessup, mute, launchpad, play, next, prev).
spotlight and raycast map to cmd+space; missioncontrol maps to f3.
Behavior:
- Captures pre-action and post-action screenshots by default (
--no-postshotto disable).
Keys
screencommander keys "press:cmd+tab" "press:cmd+tab" screencommander keys "press:next" "sleep:100" "press:prev"
keys executes down/up/press/sleep steps in strict order.
For repeated modifier-based shortcuts, include modifiers explicitly in each press step (for example press:cmd+tab). Standalone keys such as press:next and press:prev do not require modifiers.
Cleanup
screencommander cleanup --older-than-hours 24
Prunes managed capture artifacts (png, jpg, jpeg, json) in ~/Library/Caches/screencommander/captures older than the configured age.
Sequence
Run an ordered bundle of actions from JSON:
screencommander sequence --file ./sequence.json
Example sequence.json:
{
"steps": [
{ "click": { "x": 935, "y": 1074, "meta": "./last-screenshot.json" } },
{ "type": { "text": "hello from sequence", "mode": "paste" } },
{ "key": { "chord": "enter" } }
]
}Behavior:
- Executes steps in order.
- Captures pre-action and post-action screenshots around each step by default.
- Disable per-step before/after capture with
--no-postshot.
Scripting (JSON output)
For automation and scripts, the CLI can emit exactly one JSON object to stdout (success or error). Use this to parse results without scraping human output.
- Per-command: Add
--jsonto any command (e.g.screencommander doctor --json). - Global: Use
--output jsonbefore the subcommand, or setSCREENCOMMANDER_OUTPUT=jsonso every command defaults to JSON. - One-line output: Use
--compact(orSCREENCOMMANDER_JSON_COMPACT=1) when output is JSON for smaller, faster-to-parse output. - Precedence: Per-command
--jsonoverrides root--outputover env over default (human).
Example:
# Single command screencommander doctor --json # Global JSON for the run screencommander --output json screenshot --out /tmp/cap.png # Env var for whole script export SCREENCOMMANDER_OUTPUT=json screencommander doctor screencommander cleanup --json --compact
With JSON mode, stdout is exactly one JSON object: either a success envelope ("status": "ok", result, optional exitCode) or an error envelope ("status": "error", error.code, error.message, exitCode). Scripts can read stdout once and branch on status. For the full contract (envelope fields and per-command result shapes), see docs/json-output-schema.md.
For maximum speed in scripts, combine --json --compact --no-postshot (and optionally --output json or the env var) so action commands skip before/after screenshots and emit one-line JSON.
Metadata Schema
{
"capturedAtISO8601": "2026-02-20T12:34:56.789Z",
"displayID": 69733248,
"displayBoundsPoints": { "x": 0, "y": 0, "w": 1512, "h": 982 },
"imageSizePixels": { "w": 3024, "h": 1964 },
"pointPixelScale": 2,
"imagePath": "/absolute/path/to/Screenshot-20260220-123456.png"
}Exit Codes
10: screen recording permission denied11: accessibility permission denied20: capture failed21: image write failed30: metadata read/write failed40: invalid coordinate41: mapping failed50: input synthesis failed60: invalid arguments or chord parse