GitHub - rinvii/buse: Control your browser from the terminal.

Control your browser from your terminal.

buse is a stateless CLI designed for AI agents and automation scripts. It turns complex browser interaction into simple, structured command-line primitives.

Key Features

Stateless Control: Just point the CLI at a browser and go.
Persistent Sessions: Multiple browser instances can run simultaneously.
Universal Primitives: Click, type, scroll, and execute JS with one-liners.
Vision-Ready: observe captures semantic state plus optional screenshots and Set-of-Mark (SoM) labels.
Session Migration: Export cookies/storage via save-state to maintain persistent logins.

Why 'buse'?

Automating a browser usually means writing long, complex scripts or paying for expensive cloud services. buse changes that by letting you control a browser the way you work with files and folders on your computer: with short, simple commands in your terminal.

For example, open a browser and navigate to a website:

uvx --python 3.12 buse browser-1
uvx --python 3.12 buse browser-1 navigate "https://example.com"
uvx --python 3.12 buse browser-2 # open a second browser
uvx --python 3.12 buse browser-2 search "latest tech news"

Installation

With uv:

uvx --python 3.12 buse --help

With pip:

From source:

cd buse
uv pip install -e .

Requirements

Python 3.12
Google Chrome (local install)

Usage Pattern

buse <instance_id> <command> [args]
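For scripts that drive buse from Python, the usage pattern above maps directly onto an argument list. The following is a minimal sketch and not part of buse itself: `buse_argv` is a hypothetical helper name, and it assumes the `buse` executable is on your PATH. It only assembles and prints arguments; it does not start a browser.

```python
import shlex
from typing import List, Optional


def buse_argv(instance_id: str, command: Optional[str] = None, *args: str) -> List[str]:
    """Build an argv list following `buse <instance_id> <command> [args]`.

    Hypothetical helper for illustration only. With no command, the result
    matches the bare `buse <id>` form used to start an instance.
    """
    if not instance_id:
        raise ValueError("instance_id must be a non-empty string")
    argv = ["buse", instance_id]
    if command is not None:
        argv.append(command)
        argv.extend(args)
    return argv


if __name__ == "__main__":
    # Print shell-safe command lines instead of executing them.
    print(shlex.join(buse_argv("b1")))
    print(shlex.join(buse_argv("b1", "navigate", "https://example.com")))
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues with URLs and JSON payloads.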

Command List

1. Lifecycle & State

| Command | Description | Example |
| --- | --- | --- |
| `<id>` | Initialize/Start a new browser instance | `buse b1` |
| `list` | Show all active browser instances | `buse list` |
| `stop` | Stop and kill a browser instance | `buse b1 stop` |
| `save-state` | Export cookies/storage to a file | `buse b1 save-state cookies.json` |

2. Analysis & Extraction

| Command | Description | Example |
| --- | --- | --- |
| `observe` | Snapshot page state (visual + text modes) | `buse b1 observe --visual som` |
| `extract` | LLM extraction (set `BUSE_EXTRACT_MODEL`) | `buse b1 extract "get product info"` |

observe notes

DOM indices are ephemeral; refresh with buse <id> observe after page changes, or use --id/--class for stability.
Preferred flags are --visual (som, omni, none), --text (ai, dom, none), and --mode (efficient, full, raw).
--human prints a human-friendly layout; JSON output is better for agents.
Legacy flags (--screenshot, --omniparser, --som, --semantic, --no-dom, --diagnostics) are still supported for compatibility.
observe --visual omni always captures a screenshot: saves image.jpg (input) and image_som.jpg (server output) in the screenshots dir or --path.
When available, screenshot_path points to image_som.jpg. OmniParser bbox values are in CSS pixels (not normalized).
Use --text none to skip DOM processing and return an empty dom_minified.
--max-chars 0 disables semantic truncation entirely.
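The preferred observe flags accept a fixed set of values, so a calling script can validate them before invoking the CLI. This is a minimal sketch under that assumption: `observe_args` is a hypothetical helper, and the allowed values are copied from the notes above rather than read from buse.

```python
from typing import List, Optional

# Allowed values as listed in the observe notes.
VISUAL_MODES = ("som", "omni", "none")
TEXT_MODES = ("ai", "dom", "none")
OBSERVE_MODES = ("efficient", "full", "raw")


def observe_args(
    visual: Optional[str] = None,
    text: Optional[str] = None,
    mode: Optional[str] = None,
    max_chars: Optional[int] = None,
) -> List[str]:
    """Return the argument list for `buse <id> observe` (hypothetical helper)."""
    args = ["observe"]
    checks = (
        ("--visual", visual, VISUAL_MODES),
        ("--text", text, TEXT_MODES),
        ("--mode", mode, OBSERVE_MODES),
    )
    for flag, value, allowed in checks:
        if value is None:
            continue  # flag omitted, buse applies its own default
        if value not in allowed:
            raise ValueError(f"{flag} must be one of {allowed}, got {value!r}")
        args += [flag, value]
    if max_chars is not None:
        # Per the notes, 0 disables semantic truncation.
        args += ["--max-chars", str(max_chars)]
    return args


if __name__ == "__main__":
    print(observe_args(visual="som", text="ai"))
```

Rejecting unknown values in the script gives an agent a clear error before any browser round trip.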

3. Navigation & Interaction

| Command | Description | Example |
| --- | --- | --- |
| `navigate` | Load a specific URL (supports `--new-tab`) | `buse b1 navigate "https://google.com"` |
| `new-tab` | Open a URL in a new tab (alias for `navigate --new-tab`) | `buse b1 new-tab "https://example.com"` |
| `search` | Search the web (engines: `google`, `bing`, `duckduckgo`) | `buse b1 search "query" --engine google` |
| `click` | Click by index/ref (`eN`), selector, id/class, or coordinates (with modifiers) | `buse b1 click e3 --double` |
| `input` | Type text into a field by index/ref (`eN`) or `--id`/`--class` (supports `--slowly`, `--append`, `--submit`) | `buse b1 input e3 "Hello"` |
| `fill` | Fill multiple fields in one command (JSON payload) | `buse b1 fill '[{"ref":"e1","value":"a"}]'` |
| `drag` | Drag from one element to another (ref/index) | `buse b1 drag e1 e2` |
| `upload-file` | Upload a file to an element by index | `buse b1 upload-file 5 "./img.png"` |
| `send-keys` | Send special keys or text (use `--list-keys` for key names; optionally focus an element first with `--index`/`--id`/`--class`) | `buse b1 send-keys "Enter"` |
| `find-text` | Scroll to specific text on the page | `buse b1 find-text "Contact"` |
| `dropdown-options` | List options for a select element by index or `--id`/`--class` | `buse b1 dropdown-options 12` |
| `select-dropdown` | Select a dropdown option by visible text, targeting the element by index or `--id`/`--class` (use `--text` when no index is given) | `buse b1 select-dropdown 12 "Option"` |
| `hover` | Hover over an element by index or `--id`/`--class` | `buse b1 hover 5` |
| `scroll` | Scroll the page or a specific element (use `--up` or `--down`) | `buse b1 scroll --up --pages 2` |
| `refresh` | Reload the current page | `buse b1 refresh` |
| `go-back` | Go back in browser history | `buse b1 go-back` |
| `wait` | Wait by time, selector, text, or network idle | `buse b1 wait 2` |
| `evaluate` | Execute custom JavaScript code | `buse b1 evaluate "alert('Hi')"` |
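The `fill` command takes its fields as a single JSON list, which is easier to generate than to type by hand. The sketch below builds that payload from a mapping of refs to values; `fill_payload` is a hypothetical helper, and the `ref`/`value` keys are taken from the `fill` example in the table above.

```python
import json
from typing import Mapping


def fill_payload(fields: Mapping[str, str]) -> str:
    """Serialize {ref: value} pairs into the JSON list shape `fill` expects.

    Hypothetical helper; only the "ref" and "value" keys shown in the
    README example are emitted.
    """
    entries = [{"ref": ref, "value": value} for ref, value in fields.items()]
    # Compact separators keep the payload short on the command line.
    return json.dumps(entries, separators=(",", ":"))


if __name__ == "__main__":
    payload = fill_payload({"e1": "user", "e2": "pass"})
    # Pass the payload as one argv element so no shell quoting is needed.
    print(["buse", "b1", "fill", payload])
```

Using `json.dumps` also takes care of escaping quotes and backslashes inside field values.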

4. Advanced

| Command | Description | Example |
| --- | --- | --- |
| `switch-tab` | Switch by 4-char tab ID | `buse b1 switch-tab "4D39"` |
| `close-tab` | Close by 4-char tab ID | `buse b1 close-tab "4D39"` |

Examples

Flag Matrix

Global (all commands):

--format (json|toon, default: json), -f alias
--profile (default: false), -p alias

Selected command flags:

observe: --visual, --text, --mode, --max-chars, --max-labels, --selector, --frame, --human, --path (legacy: --screenshot, --omniparser, --som, --semantic, --no-dom, --diagnostics)
click: --selector, --id, --class, --x/--y, --right, --middle, --double, --ctrl/--shift/--alt/--meta, --force, --debug
input: --text, --id, --class, --slowly, --append, --submit
fill: JSON list payload (positional)
drag: --html5/--no-html5
send-keys: --index, --id, --class, --list-keys
scroll: --down/--up, --pages, --index
wait: --text, --selector, --network-idle, --timeout

Commands

# Start a session
buse b1

# Observe without screenshot (JSON)
buse b1 observe

# Observe with SoM labels and semantic text (JSON + image)
buse b1 observe --visual som --text ai

# Navigate and click by coordinates
buse b1 navigate "https://example.com"
buse b1 click --x 280 --y 220

# Click by ref/id/class fallback
buse b1 click e3
buse b1 click --id "submit-button"
buse b1 click --class "cta-primary"

# Input by id with explicit --text
buse b1 input --id "email" --text "test@example.com"

# Input slowly and submit
buse b1 input --id "email" --text "test@example.com" --slowly --submit

# Fill multiple fields in one command
buse b1 fill '[{"ref":"e1","value":"user"},{"ref":"e2","value":"pass","type":"text"}]'

# Drag and drop
buse b1 drag e1 e2

# Upload a file
buse b1 upload-file 5 "./image.png"

# Send special keys
buse b1 send-keys "Enter"

# Focus an element by id, then send text to it
buse b1 send-keys --id "search" "Hello"

# List send-keys names
buse b1 send-keys --list-keys

# Find and scroll to text
buse b1 find-text "Contact Us"

# Get dropdown options and select by text
buse b1 dropdown-options --id "country"
buse b1 select-dropdown --id "country" --text "Canada"

# Scroll and wait
buse b1 scroll --down --pages 1.5
buse b1 scroll --up --pages 1
buse b1 wait 2
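Because each command is an independent process call, a sequence like the one above can be scripted as a list of steps. This is a rough sketch rather than an official buse workflow. It assumes the session `b1` was already started with `buse b1` and that buse exits non-zero on failure, which the README does not state. `plan` and `run` are hypothetical names.

```python
import shutil
import subprocess
from typing import List, Sequence

# Steps taken from the command examples above.
STEPS: List[List[str]] = [
    ["navigate", "https://example.com"],
    ["observe", "--visual", "som", "--text", "ai"],
    ["click", "e3"],
    ["wait", "2"],
]


def plan(instance_id: str, steps: Sequence[Sequence[str]]) -> List[List[str]]:
    """Expand each step into a full `buse <id> <command> [args]` argv."""
    return [["buse", instance_id, *step] for step in steps]


def run(commands: Sequence[Sequence[str]]) -> None:
    """Run commands in order and print each command's stdout."""
    for argv in commands:
        # check=True stops the sequence at the first non-zero exit code.
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        print(result.stdout)


if __name__ == "__main__":
    commands = plan("b1", STEPS)
    if shutil.which("buse") is None:
        # Dry run: buse is not installed, so only show what would be executed.
        for argv in commands:
            print(" ".join(argv))
    else:
        run(commands)
```

Since DOM indices are ephemeral, a real script would re-run `observe` and re-read refs such as `e3` after any step that changes the page.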

MCP Server

Expose the active browser instances via the Model Context Protocol.

buse mcp-server --host 0.0.0.0 --port 8000

--transport selects streamable-http (default), sse, or stdio.
--name changes the MCP server name.
--stateless/--stateful controls HTTP mode.
--json-response/--no-json-response toggles JSON wrapping.
--allow-remote permits non-local clients (default: local-only).
--auth-token requires Authorization: Bearer <token> or X-Buse-Token for HTTP requests.
--format (json|toon, default: json), -f alias.
Resources:
- buse://sessions returns a list of session metadata (instance_id, cdp_url, user_data_dir).
- buse://session/{id} returns the metadata for a single session.
Tools:
- Supports all CLI actions: navigate, click, input_text, fill, drag, send_keys, scroll, switch_tab, close_tab, search, upload_file, find_text, dropdown_options, select_dropdown, go_back, hover, refresh, wait, save_state, extract, evaluate, stop_session, start_session, observe.

The mcp SDK ships with buse, so no extra installation is required.
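When the server is started with --auth-token, HTTP clients need to send one of the two headers named above. The sketch below only builds those headers; `mcp_auth_headers` is a hypothetical helper. It assumes X-Buse-Token carries the raw token value, which the README does not spell out.

```python
from typing import Dict


def mcp_auth_headers(token: str, bearer: bool = True) -> Dict[str, str]:
    """Return HTTP headers for a token-protected buse MCP server.

    Hypothetical helper. Uses `Authorization: Bearer <token>` by default,
    or `X-Buse-Token` (assumed to hold the raw token) when bearer=False.
    """
    if not token:
        raise ValueError("token must be a non-empty string")
    if bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"X-Buse-Token": token}


if __name__ == "__main__":
    # Print the header names only, so the token itself is not echoed.
    print(sorted(mcp_auth_headers("example-token")))
```

These headers can be passed to whichever HTTP or MCP client you use. The endpoint path depends on the transport and is not covered here.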

Output & Profiling

--format json|toon (or -f) switches the output format; the default is json.
--profile (or -p) includes timing data in the JSON response.

Environment Variables

BUSE_EXTRACT_MODEL: model name for extract (default: gpt-4o-mini).
OPENAI_API_KEY: required for extract.
BUSE_KEEP_SESSION: set to 1 to keep the session open within a single process.
BUSE_SELECTOR_CACHE_TTL: selector-map cache TTL in seconds (default: 0, disabled).
BUSE_REMOTE_ALLOW_ORIGINS: override Chrome --remote-allow-origins (default: http://localhost:<port>,http://127.0.0.1:<port>).
BUSE_IMAGE_QUALITY: JPEG quality (1-100) for OmniParser images.
BUSE_MCP_ALLOW_REMOTE: set to 1 to allow non-local MCP clients.
BUSE_MCP_AUTH_TOKEN: require a Bearer or X-Buse-Token header for MCP HTTP access.
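A script that calls `extract` can check these variables up front instead of failing mid-run. The following is a small sketch under the defaults listed above; `extract_env` and `effective_model` are hypothetical helpers, not part of buse.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_EXTRACT_MODEL = "gpt-4o-mini"  # default stated in the list above


def extract_env(
    base: Optional[Mapping[str, str]] = None,
    model: Optional[str] = None,
) -> Dict[str, str]:
    """Return an environment dict suitable for running `buse <id> extract`.

    Raises RuntimeError if OPENAI_API_KEY is missing, since extract needs it.
    """
    env = dict(os.environ if base is None else base)
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required for extract")
    if model is not None:
        env["BUSE_EXTRACT_MODEL"] = model  # explicit override
    return env


def effective_model(env: Mapping[str, str]) -> str:
    """Model name extract would use, falling back to the documented default."""
    return env.get("BUSE_EXTRACT_MODEL", DEFAULT_EXTRACT_MODEL)
```

The returned dict can be passed as `env=` to `subprocess.run(["buse", "b1", "extract", "get product info"], env=...)`.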

References & Inspiration

https://blog.google/innovation-and-ai/models-and-research/google-deepmind/gemini-computer-use-model/

https://www.anthropic.com/news/3-5-models-and-computer-use

https://docs.browser-use.com/introduction

Roadmap

Support all operating systems: Windows, macOS, Linux (currently tested only on macOS 10.15 and Windows 11)
Add automation scripting examples
Add e2e tests
Add optional daemon for persistent background sessions