GitHub - rinvii/buse: Control your browser from the terminal.



Control your browser from your terminal.

buse is a stateless CLI designed for AI agents and automation scripts. It turns complex browser interaction into simple, structured command-line primitives.

Key Features

  • Stateless Control: Just point the CLI at a browser and go.
  • Persistent Sessions: Multiple browser instances can run simultaneously.
  • Universal Primitives: Click, type, scroll, and execute JS with one-liners.
  • Vision-Ready: observe captures semantic state plus optional screenshots and Set-of-Mark (SoM) labels.
  • Session Migration: Export cookies/storage via save-state to maintain persistent logins.

Why 'buse'?

Automating a browser usually means writing long, complex scripts or paying for expensive cloud services. buse changes that by letting you control a browser as easily as any other file or folder on your computer, using simple one-word commands in your terminal.

For example, start two browser instances, navigate the first to a website, and run a web search in the second:

uvx --python 3.12 buse browser-1
uvx --python 3.12 buse browser-1 navigate "https://example.com"
uvx --python 3.12 buse browser-2 # open a second browser
uvx --python 3.12 buse browser-2 search "latest tech news"

Installation

With uv:

uvx --python 3.12 buse --help

With pip:

pip install buse

From source:

git clone https://github.com/rinvii/buse.git
cd buse
uv pip install -e .

Requirements

  • Python 3.12
  • Google Chrome (local install)

Usage Pattern

buse <instance_id> <command> [args]


Command List

1. Lifecycle & State

Command | Description | Example
--- | --- | ---
<id> | Initialize/start a new browser instance | buse b1
list | Show all active browser instances | buse list
stop | Stop and kill a browser instance | buse b1 stop
save-state | Export cookies/storage to a file | buse b1 save-state cookies.json

2. Analysis & Extraction

Command | Description | Example
--- | --- | ---
observe | Snapshot page state (visual + text modes) | buse b1 observe --visual som
extract | LLM-based extraction (requires OPENAI_API_KEY; model set by BUSE_EXTRACT_MODEL) | buse b1 extract "get product info"

observe notes

  • DOM indices are ephemeral; refresh with buse <id> observe after page changes, or use --id/--class for stability.
  • Preferred flags are --visual (som, omni, none), --text (ai, dom, none), and --mode (efficient, full, raw).
  • --human prints a human-friendly layout; JSON output is better for agents.
  • Legacy flags (--screenshot, --omniparser, --som, --semantic, --no-dom, --diagnostics) are still supported for compatibility.
  • observe --visual omni always captures a screenshot, saving image.jpg (the input) and image_som.jpg (the server output) to the screenshots directory, or to --path if given.
  • When available, screenshot_path points to image_som.jpg. OmniParser bbox values are in CSS pixels (not normalized).
  • Use --text none to skip DOM processing and return an empty dom_minified.
  • --max-chars 0 disables semantic truncation entirely.
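
Since OmniParser bbox values are in CSS pixels rather than normalized, a detected box can be turned into a coordinate click without rescaling. A Python sketch, assuming each bbox is [x1, y1, x2, y2] (left, top, right, bottom) and that click --x/--y also takes CSS pixels; verify both against a real observe response. bbox_center and click_at_bbox_argv are hypothetical helpers. Coordinates are presumably as ephemeral as DOM indices, so re-run observe after the page changes.

```python
from collections.abc import Sequence


def bbox_center(bbox: Sequence[float]) -> tuple[int, int]:
    """Return the rounded center of a box, assuming [x1, y1, x2, y2] layout."""
    x1, y1, x2, y2 = bbox
    return round((x1 + x2) / 2), round((y1 + y2) / 2)


def click_at_bbox_argv(instance_id: str, bbox: Sequence[float]) -> list[str]:
    """Build a `buse <id> click --x X --y Y` call aimed at the box center."""
    x, y = bbox_center(bbox)
    return ["buse", instance_id, "click", "--x", str(x), "--y", str(y)]
```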

3. Navigation & Interaction

Command | Description | Example
--- | --- | ---
navigate | Load a specific URL (supports --new-tab) | buse b1 navigate "https://google.com"
new-tab | Open a URL in a new tab (alias for navigate --new-tab) | buse b1 new-tab "https://example.com"
search | Search the web (engines: google, bing, duckduckgo) | buse b1 search "query" --engine google
click | Click by index/ref (eN), selector, id/class, or coordinates (with modifiers) | buse b1 click e3 --double
input | Type text into a field by index/ref (eN) or --id/--class (supports --slowly, --append, --submit) | buse b1 input e3 "Hello"
fill | Fill multiple fields in one command (JSON payload) | buse b1 fill '[{"ref":"e1","value":"a"}]'
drag | Drag from one element to another (by ref or index) | buse b1 drag e1 e2
upload-file | Upload a file to an element by index | buse b1 upload-file 5 "./img.png"
send-keys | Send special keys or text (list key names with --list-keys; optionally focus an element first with --index/--id/--class) | buse b1 send-keys "Enter"
find-text | Scroll to specific text on the page | buse b1 find-text "Contact"
dropdown-options | List the options of a select element by index or --id/--class | buse b1 dropdown-options 12
select-dropdown | Select a dropdown option by its visible text, targeting the element by index or --id/--class (pass the text with --text when no index is given) | buse b1 select-dropdown 12 "Option"
hover | Hover over an element by index or --id/--class | buse b1 hover 5
scroll | Scroll the page or a specific element (use --up or --down) | buse b1 scroll --up --pages 2
refresh | Reload the current page | buse b1 refresh
go-back | Go back in browser history | buse b1 go-back
wait | Wait by time, selector, text, or network idle | buse b1 wait 2
evaluate | Execute custom JavaScript code | buse b1 evaluate "alert('Hi')"
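
evaluate takes the JavaScript source as a single argument, so any value interpolated into a script needs escaping. One way to do that from Python, shown as a sketch: json.dumps produces a valid JavaScript string literal, and passing an argv list to subprocess avoids a second layer of shell quoting. text_of_selector_js and evaluate_argv are hypothetical helpers.

```python
import json


def text_of_selector_js(selector: str) -> str:
    """Return a JS expression reading the text of the first element matching selector.

    json.dumps escapes quotes and backslashes (and, with the default
    ensure_ascii=True, non-ASCII characters), so the literal is safe to embed.
    """
    return f"document.querySelector({json.dumps(selector)})?.textContent ?? null"


def evaluate_argv(instance_id: str, script: str) -> list[str]:
    """Build `buse <id> evaluate <script>` as an argv list (no shell quoting needed)."""
    return ["buse", instance_id, "evaluate", script]
```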

4. Advanced

Command | Description | Example
--- | --- | ---
switch-tab | Switch to a tab by its 4-character tab ID | buse b1 switch-tab "4D39"
close-tab | Close a tab by its 4-character tab ID | buse b1 close-tab "4D39"

Examples

Flag Matrix

Global (all commands):

  • --format (json|toon, default: json), -f alias
  • --profile (default: false), -p alias

Selected command flags:

  • observe: --visual, --text, --mode, --max-chars, --max-labels, --selector, --frame, --human, --path (legacy: --screenshot, --omniparser, --som, --semantic, --no-dom, --diagnostics)
  • click: --selector, --id, --class, --x/--y, --right, --middle, --double, --ctrl/--shift/--alt/--meta, --force, --debug
  • input: --text, --id, --class, --slowly, --append, --submit
  • fill: JSON list payload (positional)
  • drag: --html5/--no-html5
  • send-keys: --index, --id, --class, --list-keys
  • scroll: --down/--up, --pages, --index
  • wait: --text, --selector, --network-idle, --timeout

Commands

# Start a session
buse b1

# Observe without screenshot (JSON)
buse b1 observe

# Observe with SoM labels and semantic text (JSON + image)
buse b1 observe --visual som --text ai

# Navigate and click by coordinates
buse b1 navigate "https://example.com"
buse b1 click --x 280 --y 220

# Click by ref, or fall back to id/class
buse b1 click e3
buse b1 click --id "submit-button"
buse b1 click --class "cta-primary"

# Input by id with explicit --text
buse b1 input --id "email" --text "test@example.com"

# Input slowly and submit
buse b1 input --id "email" --text "test@example.com" --slowly --submit

# Fill multiple fields in one command
buse b1 fill '[{"ref":"e1","value":"user"},{"ref":"e2","value":"pass","type":"text"}]'

# Drag and drop
buse b1 drag e1 e2

# Upload a file
buse b1 upload-file 5 "./image.png"

# Send special keys
buse b1 send-keys "Enter"

# Focus an element by id, then send keys to it
buse b1 send-keys --id "search" "Hello"

# List send-keys names
buse b1 send-keys --list-keys

# Find and scroll to text
buse b1 find-text "Contact Us"

# Get dropdown options and select by text
buse b1 dropdown-options --id "country"
buse b1 select-dropdown --id "country" --text "Canada"

# Scroll and wait
buse b1 scroll --down --pages 1.5
buse b1 scroll --up --pages 1
buse b1 wait 2

MCP Server

Expose the active browser instances via the Model Context Protocol.

buse mcp-server --host 0.0.0.0 --port 8000
  • --transport selects streamable-http (default), sse, or stdio.
  • --name changes the MCP server name, --stateless/--stateful controls HTTP mode, and --json-response/--no-json-response toggles JSON wrapping.
  • --allow-remote permits non-local clients (default: local-only). --auth-token requires every HTTP request to carry either an Authorization: Bearer <token> header or an X-Buse-Token header.
  • --format (json|toon, default: json), -f alias.
  • Resources:
    • buse://sessions returns a list of session metadata (instance_id, cdp_url, user_data_dir).
    • buse://session/{id} returns the metadata for a single session.
  • Tools:
    • Tools mirror the CLI actions: navigate, click, input_text, fill, drag, send_keys, scroll, switch_tab, close_tab, search, upload_file, find_text, dropdown_options, select_dropdown, go_back, hover, refresh, wait, save_state, extract, evaluate, stop_session, start_session, observe.

The mcp SDK ships with buse, so no extra installation is required.
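
With --auth-token set, each HTTP request must carry the token in one of the two headers named above. A minimal Python sketch of producing them, plus the documented per-session resource URI; mcp_auth_headers and session_resource_uri are hypothetical helpers, and it is assumed that X-Buse-Token carries the bare token. An MCP client library would normally attach such headers through its own transport options.

```python
def mcp_auth_headers(token: str, use_bearer: bool = True) -> dict[str, str]:
    """Return one of the two headers accepted when --auth-token is set.

    Assumption: X-Buse-Token takes the raw token value, with no "Bearer" prefix.
    """
    if not token:
        raise ValueError("token must be non-empty")
    if use_bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"X-Buse-Token": token}


def session_resource_uri(instance_id: str) -> str:
    """Build the buse://session/{id} resource URI for one session."""
    return f"buse://session/{instance_id}"
```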

Output & Profiling

  • --format json|toon to switch output format.
  • --profile (or -p) includes timing data in the JSON response.

Environment Variables

  • BUSE_EXTRACT_MODEL: model name for extract (default: gpt-4o-mini).
  • OPENAI_API_KEY: required for extract.
  • BUSE_KEEP_SESSION: set to 1 to keep the session open within a single process.
  • BUSE_SELECTOR_CACHE_TTL: selector-map cache TTL in seconds (default: 0, disabled).
  • BUSE_REMOTE_ALLOW_ORIGINS: override Chrome --remote-allow-origins (default: http://localhost:<port>,http://127.0.0.1:<port>).
  • BUSE_IMAGE_QUALITY: JPEG quality (1-100) for OmniParser images.
  • BUSE_MCP_ALLOW_REMOTE: set to 1 to allow non-local MCP clients.
  • BUSE_MCP_AUTH_TOKEN: require a Bearer or X-Buse-Token header for MCP HTTP access.

References & Inspiration

https://blog.google/innovation-and-ai/models-and-research/google-deepmind/gemini-computer-use-model/

https://www.anthropic.com/news/3-5-models-and-computer-use

https://docs.browser-use.com/introduction

Roadmap

  • Support all operating systems: Windows, macOS, Linux (currently works on my macOS 10.15 and Windows 11 machines)
  • Add automation scripting examples
  • Add e2e tests
  • Add optional daemon for persistent background sessions