Commands | agent-browser

Core#

agent-browser open                    # Launch browser (no nav); stays on about:blank
agent-browser open <url>              # Launch + navigate (aliases: goto, navigate)
agent-browser click <sel>             # Click element (--new-tab to open in new tab)
agent-browser dblclick <sel>          # Double-click
agent-browser fill <sel> <text>       # Clear and fill
agent-browser type <sel> <text>       # Type into element
agent-browser press <key>             # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keyboard type <text>    # Type at current focus (no selector needed)
agent-browser keyboard inserttext <text>  # Insert text without key events
agent-browser keydown <key>           # Hold key down
agent-browser keyup <key>             # Release key
agent-browser hover <sel>             # Hover element
agent-browser focus <sel>             # Focus element
agent-browser select <sel> <val>      # Select dropdown option
agent-browser check <sel>             # Check checkbox
agent-browser uncheck <sel>           # Uncheck checkbox
agent-browser scroll <dir> [px]       # Scroll (up/down/left/right, --selector <sel>)
agent-browser scrollintoview <sel>    # Scroll element into view
agent-browser drag <src> <dst>        # Drag and drop
agent-browser upload <sel> <files>    # Upload files
agent-browser screenshot [path]       # Screenshot (--full for full page)
agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
agent-browser screenshot --screenshot-dir ./shots    # Save to custom directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf <path>              # Save page as PDF
agent-browser snapshot                # Accessibility tree with refs
agent-browser eval <js>               # Run JavaScript
agent-browser connect port|url      # Connect to browser via CDP
agent-browser stream enable [--port <port>]  # Start runtime WebSocket streaming
agent-browser stream status           # Show runtime streaming state and bound port
agent-browser stream disable          # Stop runtime WebSocket streaming
agent-browser close                   # Close browser (aliases: quit, exit)
agent-browser close --all             # Close all active sessions

Get info#

agent-browser get text <sel>          # Get text content
agent-browser get html <sel>          # Get innerHTML
agent-browser get value <sel>         # Get input value
agent-browser get attr <sel> <attr>   # Get attribute
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL
agent-browser get cdp-url             # Get CDP WebSocket URL
agent-browser get count <sel>         # Count matching elements
agent-browser get box <sel>           # Get bounding box
agent-browser get styles <sel>        # Get computed styles

Check state#

agent-browser is visible <sel>        # Check if visible
agent-browser is enabled <sel>        # Check if enabled
agent-browser is checked <sel>        # Check if checked

Find elements#

Semantic locators with actions (click, fill, type, hover, focus, check, uncheck, text):

agent-browser find role <role> <action> [value]
agent-browser find text <text> <action>
agent-browser find label <label> <action> [value]
agent-browser find placeholder <ph> <action> [value]
agent-browser find alt <text> <action>
agent-browser find title <text> <action>
agent-browser find testid <id> <action> [value]
agent-browser find first <sel> <action> [value]
agent-browser find last <sel> <action> [value]
agent-browser find nth <n> <sel> <action> [value]

Options:

--name <name> -- filter role by accessible name
--exact -- require exact text match

Examples:

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find alt "Logo" click
agent-browser find first ".item" click
agent-browser find last ".item" text
agent-browser find nth 2 ".card" hover

Wait#

agent-browser wait <selector>         # Wait for element
agent-browser wait <ms>               # Wait for time
agent-browser wait --text "Welcome"   # Wait for text (substring match)
agent-browser wait --url "**/dash"    # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "condition"   # Wait for JS condition
agent-browser wait --download [path]  # Wait for download
agent-browser wait --fn "!document.body.innerText.includes('Loading...')"  # Wait for text to disappear
agent-browser wait "#spinner" --state hidden           # Wait for element to disappear

Downloads#

agent-browser download <sel> <path>   # Click element to trigger download
agent-browser wait --download [path]  # Wait for any download to complete

Use --download-path <dir> (or AGENT_BROWSER_DOWNLOAD_PATH env) to set a default download directory. Without it, downloads go to a temporary directory that is deleted when the browser closes.

Mouse#

agent-browser mouse move <x> <y>      # Move mouse
agent-browser mouse down [button]     # Press button
agent-browser mouse up [button]       # Release button
agent-browser mouse wheel <dy> [dx]   # Scroll wheel

Clipboard#

agent-browser clipboard read                      # Read text from clipboard
agent-browser clipboard write "Hello, World!"     # Write text to clipboard
agent-browser clipboard copy                      # Copy current selection (Ctrl+C)
agent-browser clipboard paste                     # Paste from clipboard (Ctrl+V)

Settings#

agent-browser set viewport <w> <h> [scale]  # Set viewport size (scale for retina, e.g. 2)
agent-browser set device <name>       # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng>     # Set geolocation
agent-browser set offline [on|off]    # Toggle offline mode
agent-browser set headers <json>      # Extra HTTP headers
agent-browser set credentials <u> <p> # HTTP basic auth
agent-browser set media [dark|light]  # Emulate color scheme (persists for session)

Use --color-scheme for persistent dark/light mode across all commands:

agent-browser --color-scheme dark open https://example.com

Cookies & storage#

agent-browser cookies                 # Get all cookies
agent-browser cookies set <name> <val> # Set cookie
agent-browser cookies clear           # Clear cookies

agent-browser storage local           # Get all localStorage
agent-browser storage local <key>     # Get specific key
agent-browser storage local set <k> <v>  # Set value
agent-browser storage local clear     # Clear all

agent-browser storage session         # Same for sessionStorage

Network#

agent-browser network route <url>              # Intercept requests
agent-browser network route <url> --abort      # Block requests
agent-browser network route <url> --body <json>  # Mock response
agent-browser network route '*' --abort --resource-type script  # Block scripts only
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --clear         # Clear request log
agent-browser network requests --filter <pat>  # Filter by URL pattern
agent-browser network requests --type xhr,fetch  # Filter by resource type
agent-browser network requests --method POST   # Filter by HTTP method
agent-browser network requests --status 2xx    # Filter by status (200, 2xx, 400-499)
agent-browser network request <requestId>      # View full request/response detail
agent-browser network har start                # Start HAR recording
agent-browser network har stop [output.har]    # Stop and save HAR (temp path if omitted)

Tabs & frames#

agent-browser tab                              # List tabs (each row shows tabId and label)
agent-browser tab new [url]                    # New tab
agent-browser tab new --label docs [url]       # New tab with a user-assigned label
agent-browser tab <t<N>|label>                 # Switch to a tab by id or label
agent-browser tab close [t<N>|label]           # Close a tab (defaults to active)
agent-browser window new                       # Open new browser window
agent-browser frame <sel>                      # Switch to iframe by CSS selector
agent-browser frame @e3                        # Switch to iframe by element ref
agent-browser frame main                       # Back to main frame

Stable tab ids and labels#

Tab ids are stable strings of the form t1, t2, t3. They're never reused within a session, so t2 keeps pointing at the same tab even as other tabs are opened or closed. The t prefix mirrors the @e1 element-ref convention and is not interchangeable with positional integers — agent-browser tab 2 errors with a teaching message; use t2.

You can also assign a memorable label (docs, app, admin) at tab-creation time and use it anywhere an id is accepted:

agent-browser tab new --label docs https://docs.example.com
agent-browser tab docs          # switch to the docs tab
agent-browser snapshot          # populate refs for docs
agent-browser click @e3         # click uses docs's refs
agent-browser tab close docs    # close by label

Labels are never auto-generated and never rewritten on navigation — an agent that names a tab docs keeps that name until the tab is closed. Labels are unique within a session; creating a second tab with an existing label errors.

Refs (@e1, etc.) are scoped to the tab that was active when the snapshot ran, so switch tabs first, then snapshot and interact:

agent-browser tab docs          # switch first
agent-browser snapshot          # refs for docs
agent-browser click @e3         # uses docs's refs

Iframe support#

Iframes are detected automatically during snapshots. Iframe nodes are resolved and their content is inlined beneath the iframe element in the snapshot output. Refs assigned to elements inside iframes carry frame context, so click, fill, and other interactions work without manually switching frames.

agent-browser snapshot -i
# @e3 [Iframe] "payment-frame"
#   @e4 [input] "Card number"
#   @e5 [button] "Pay"

# Interact directly using refs — no frame switch needed
agent-browser fill @e4 "4111111111111111"
agent-browser click @e5

# Or switch frame context for scoped snapshots
agent-browser frame @e3
agent-browser snapshot -i             # Only elements inside that iframe
agent-browser frame main              # Return to main frame

The frame command accepts element refs (@e3), CSS selectors ("#my-iframe"), or frame name/URL.

Dialogs#

agent-browser dialog accept [text]    # Accept dialog (with optional prompt text)
agent-browser dialog dismiss          # Dismiss dialog
agent-browser dialog status           # Check if a dialog is currently open

By default, alert and beforeunload dialogs are automatically accepted so they never block the agent. confirm and prompt dialogs still require explicit handling. Use --no-auto-dialog (or AGENT_BROWSER_NO_AUTO_DIALOG=1) to disable automatic handling.

When a JavaScript dialog (alert, confirm, prompt) is pending, all command responses include a warning field with the dialog type and message.

Streaming#

agent-browser stream enable           # Start runtime WebSocket streaming on an auto-selected port
agent-browser stream enable --port 9223  # Bind a specific localhost port
agent-browser stream status           # Show enabled state, port, browser connection, screencasting
agent-browser stream disable          # Stop runtime streaming and remove the .stream metadata file

Streaming is enabled automatically for all sessions. Use these commands to check status, re-enable on a specific port, or disable streaming.

Debug#

agent-browser trace start [path]      # Start trace
agent-browser trace stop [path]       # Stop and save trace
agent-browser profiler start          # Start Chrome DevTools profiling
agent-browser profiler stop [path]    # Stop and save profile (.json)
agent-browser record start <path>     # Start video recording (WebM)
agent-browser record stop             # Stop and save video
agent-browser record restart <path>   # Stop current and start new recording
agent-browser console                 # View console messages
agent-browser console --json          # JSON output with raw CDP args
agent-browser console --clear         # Clear console log
agent-browser errors                  # View page errors
agent-browser errors --clear          # Clear error log
agent-browser highlight <sel>         # Highlight element
agent-browser inspect                 # Open Chrome DevTools for the active page

Auth vault#

agent-browser auth save <name> [opts]    # Save auth profile
agent-browser auth login <name>          # Login using saved credentials
agent-browser auth list                  # List saved profiles (names and URLs only)
agent-browser auth show <name>           # Show profile metadata (no passwords)
agent-browser auth delete <name>         # Delete a saved profile

Save options:

--url <url> -- login page URL (required)
--username <user> -- username (required)
--password <pass> -- password (required unless --password-stdin)
--password-stdin -- read password from stdin (recommended to avoid shell history exposure)
--username-selector <sel> -- custom CSS selector for username field
--password-selector <sel> -- custom CSS selector for password field
--submit-selector <sel> -- custom CSS selector for submit button

auth login navigates with load and then waits for the username/password/submit selectors to appear before interacting. This improves reliability on SPA login pages where fields render after initial page load.

echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
agent-browser auth login github
agent-browser auth list

Confirmation#

When --confirm-actions is set, certain action categories return a confirmation_required response instead of executing immediately. Use confirm or deny to approve or reject the action.

agent-browser confirm <confirmation-id>  # Approve a pending action
agent-browser deny <confirmation-id>     # Deny a pending action

Pending confirmations auto-deny after 60 seconds.

agent-browser --confirm-actions eval,download eval "document.title"
# Returns confirmation_required with ID
agent-browser confirm c_8f3a1234

State management#

agent-browser state save <path>       # Save auth state to file
agent-browser state load <path>       # Load auth state from file
agent-browser state list              # List saved state files
agent-browser state show <file>       # Show state summary
agent-browser state rename <old> <new> # Rename state file
agent-browser state clear [name]      # Clear states for session name
agent-browser state clear --all       # Clear all saved states
agent-browser state clean --older-than <days>  # Delete old states

Sessions#

agent-browser session                 # Show current session name
agent-browser session list            # List active sessions

Chrome profiles#

agent-browser profiles               # List available Chrome profiles
agent-browser profiles --json        # List profiles as JSON
agent-browser --profile Default open https://gmail.com  # Reuse a profile's login state

Dashboard#

agent-browser dashboard [start]       # Start the dashboard server (default port: 4848)
agent-browser dashboard start --port <n>  # Start on a specific port
agent-browser dashboard stop          # Stop the dashboard server

Doctor#

Diagnose your install, auto-clean stale daemon files, and optionally repair common problems.

agent-browser doctor                     # Full diagnosis (env, Chrome, daemons, config, providers, network, launch test)
agent-browser doctor --offline --quick   # Local-only, fastest
agent-browser doctor --fix               # Also run destructive repairs (reinstall Chrome, purge old state, ...)
agent-browser doctor --json              # Structured JSON output for agents

Exit code is 0 if all checks pass (warnings are fine), 1 if any fail. See the Installation page for the full check catalog.

Chat#

Use natural language to control the browser via AI. The chat command translates instructions into agent-browser commands, executes them, and streams the AI response. Requires AI_GATEWAY_API_KEY to be set.

agent-browser chat "open google.com and search for cats"     # Single-shot instruction
agent-browser chat                                           # Interactive REPL (type quit to exit)
echo "summarize this page" | agent-browser chat              # Piped input
agent-browser -q chat "summarize this page"                  # Quiet: text only, no tool calls shown
agent-browser -v chat "fill in the login form"               # Verbose: show commands and their output
agent-browser --model openai/gpt-4o chat "take a screenshot" # Override the default AI model
agent-browser --json chat "open example.com"                 # Structured JSON output

Chat-specific options:

--model <name>           # AI model (or AI_GATEWAY_MODEL env, default: anthropic/claude-sonnet-4.6)
-v, --verbose            # Show tool commands and their raw output
-q, --quiet              # Show only the AI text response (hide tool calls)

agent-browser back                    # Go back
agent-browser forward                 # Go forward
agent-browser reload                  # Reload page
agent-browser pushstate <url>         # SPA client-side nav; auto-detects window.next.router.push,
                                      # falls back to history.pushState + popstate

Some flows need routes, cookies, or init scripts configured before the first navigation (SSR debug, auth on protected origins, etc.). open without a URL launches the browser but stays on about:blank, leaving room to stage state. batch makes it one CLI invocation:

agent-browser batch \
  '["open"]' \
  '["network","route","*","--abort","--resource-type","script"]' \
  '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
  '["navigate","http://localhost:3000/target"]'

React / Web Vitals#

React commands require --enable react-devtools at launch (installs the React DevTools hook before any page JS runs). vitals and pushstate work on any site.

agent-browser open --enable react-devtools <url>   # Launch with React hook installed
agent-browser react tree                           # Full component tree
agent-browser react inspect <fiberId>              # Inspect one component
agent-browser react renders start                  # Begin fiber render recording
agent-browser react renders stop [--json]          # Stop + print profile
agent-browser react suspense [--only-dynamic] [--json]  # Suspense boundaries + classifier
                                                         # --only-dynamic hides the "static" list
agent-browser vitals [url] [--json]                # LCP/CLS/TTFB/FCP/INP + hydration

Works on any React app (Next.js, Remix, Vite+React, CRA, TanStack Start, React Native Web, etc.). vitals and pushstate are framework-agnostic.

Init scripts#

agent-browser open --init-script <path>           # Register before first navigation (repeatable)
agent-browser addinitscript <js>                  # Register at runtime (returns identifier)
agent-browser removeinitscript <identifier>       # Remove a previously registered init script

Global options#

--session <name>         # Isolated browser session
--session-name <name>    # Auto-save/restore session state (cookies, localStorage)
--profile <path>         # Persistent browser profile directory
--state <path>           # Load storage state from JSON file
--headers <json>         # HTTP headers scoped to URL's origin
--executable-path <path> # Custom browser executable
--extension <path>       # Load browser extension (repeatable)
--init-script <path>     # Register a page init script before first navigation (repeatable)
--enable <feature>       # Built-in init scripts: react-devtools (repeatable or comma-list)
--args <args>            # Browser launch args (comma separated)
--user-agent <ua>        # Custom User-Agent string
--proxy <url>            # Proxy server URL
--proxy-bypass <hosts>   # Hosts to bypass proxy
--ignore-https-errors    # Ignore HTTPS certificate errors
--allow-file-access      # Allow file:// URLs to access local files (Chromium only)
-p, --provider <name>    # Browser provider (ios, browserbase, kernel, browseruse, browserless)
--device <name>          # iOS device name (e.g., "iPhone 15 Pro")
--json                   # JSON output (for scripts)
--annotate               # Annotated screenshot with numbered element labels
--screenshot-dir <path>   # Default screenshot output directory (or AGENT_BROWSER_SCREENSHOT_DIR)
--screenshot-quality <n>  # JPEG quality 0-100 (or AGENT_BROWSER_SCREENSHOT_QUALITY)
--screenshot-format <fmt> # Format: png (default), jpeg (or AGENT_BROWSER_SCREENSHOT_FORMAT)
--headed                 # Show browser window (not headless)
--cdp port|url         # Connect via Chrome DevTools Protocol (port or WebSocket URL)
--auto-connect           # Auto-discover and connect to running Chrome
--color-scheme <scheme>  # Color scheme: dark, light, no-preference
--download-path <path>   # Default download directory
--content-boundaries     # Wrap page output in boundary markers for LLM safety
--max-output <chars>     # Truncate page output to N characters
--allowed-domains <list> # Comma-separated allowed domain patterns
--action-policy <path>   # Path to action policy JSON file
--confirm-actions <list> # Action categories requiring confirmation
--confirm-interactive    # Interactive confirmation prompts (auto-denies if stdin is not a TTY)
--model <name>           # AI model for chat (or AI_GATEWAY_MODEL env)
-v, --verbose            # Show tool commands and their raw output (chat)
-q, --quiet              # Show only AI text responses (chat)
--config <path>          # Use a custom config file
--debug                  # Debug output

Batch execution#

Execute multiple commands in a single invocation. Commands can be passed as quoted arguments or piped as JSON via stdin.

# Argument mode: each quoted argument is a full command
agent-browser batch "open https://example.com" "snapshot -i" "screenshot"

# With --bail to stop on first error
agent-browser batch --bail "open https://example.com" "click @e1" "screenshot"

# Stdin mode: pipe commands as JSON
echo '[
  ["open", "https://example.com"],
  ["snapshot", "-i"],
  ["click", "@e1"],
  ["screenshot", "result.png"]
]' | agent-browser batch --json

Option	Description
`--bail`	Stop on first error (default: continue all commands)
`--json`	Output results as a JSON array

Command chaining#

Chain commands with && in a single shell invocation. The browser persists via a background daemon, so chaining works naturally and is more efficient than separate calls:

agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png

Use && when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact with those refs).

Local files#

Open local files (PDFs, HTML) using file:// URLs:

agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png

The --allow-file-access flag enables JavaScript to access other local files. Chromium only.