simonarthur/spark-runner: Automated web browser testing that's easy to use.


"Who watches the website? We do. Tirelessly. Without blinking."


There exists, in every web application, a silent covenant between the developer and the user. A promise that the button shall do what the button purports to do. That the form shall submit. That the login shall, against all entropy, log in. Spark Runner is the enforcer of that covenant — an autonomous browser agent that decomposes your intentions into phases, executes them through a real browser, and reports back with the unflinching honesty of a coroner's inquest.

It is built on Browser Use and Claude. It learns from its past. It does not forget.

Installation

To work from the source — as one should, to truly understand the machinery:

git clone https://github.com/simonarthur/spark-runner.git
cd spark-runner
pip install -e .

Requirements: Python 3.11 or later. An Anthropic API key. The courage to see what your application actually does.

First Light

A wizard — not the robed kind, though no less powerful — will guide you through configuration. It will ask for your data directory, your base URL, your credentials, your API keys. It will offer to store your secrets as environment variable references rather than plaintext, because some things should not be written down.

When the wizard is finished, it will suggest:

spark-runner generate-goals /path/to/your/repo

This scans your frontend source code — your .tsx, your .jsx, your .vue, your .svelte — and extracts testable features. Each becomes a goal. Each goal, a contract to be verified.

The Execution

At the heart of Spark Runner lies a cycle, elegant in its brutality:

  1. Decomposition. A goal is broken into phases — discrete, ordered steps that a browser must perform.
  2. Execution. Each phase is carried out by an autonomous agent piloting a real browser. Playwright underneath. Claude at the helm.
  3. Summarization. What happened is distilled into structured prose. Observations are extracted — errors, warnings, the quiet failures that pass unnoticed by human eyes.
  4. Classification. Each observation is weighed and judged. Error or warning. Signal or noise.
  5. Knowledge. Everything learned is stored. The next run inherits from the last. The system remembers what worked, what broke, and why.
In practice:

# Run a single task from a prompt
spark-runner run -p "Log in, navigate to settings, change the display name"

# Run from goal files
spark-runner run login-task.json checkout-task.json

# Run in parallel, headless, in a specific environment
spark-runner run --parallel 3 --headless --env staging *.json

Configuration

All configuration lives in config.yaml, placed inside your data directory (default: ~/spark_runner).

general:
  data_dir: ~/spark_runner
  base_url: https://your-app.example.com
  use_browseruse_llm: false
  ui_instructions:
    - "The primary Save button is blue, at the bottom-right of forms"
    - "Toast notifications appear top-right and auto-dismiss after 3s"

api_keys:
  anthropic: $ANTHROPIC_API_KEY
  browseruse: $BROWSER_USE_API_KEY

credentials:
  default:
    email: $SPARK_RUNNER_DEFAULT_EMAIL
    password: $SPARK_RUNNER_DEFAULT_PASSWORD
  admin:
    email: admin@example.com
    password: $SPARK_RUNNER_ADMIN_PASSWORD
    api_key: $ADMIN_API_KEY          # extra fields go to .extra dict

environments:
  staging:
    base_url: https://staging.example.com
    ui_instructions:
      - "Staging has a yellow banner at top -- ignore it"
    credentials:
      default:
        email: $SPARK_RUNNER_STAGING_DEFAULT_EMAIL
        password: $SPARK_RUNNER_STAGING_DEFAULT_PASSWORD
  production:
    base_url: https://app.example.com
    is_production: true

models:
  task_decomposition:
    model: claude-sonnet-4-5-20250929
    max_tokens: 16384
  summarization:
    model: claude-sonnet-4-5-20250929
    max_tokens: 2048
    temperature: 0.0

Credential values support $VAR and ${VAR} syntax. If the variable is set in the environment, it resolves. If not, the literal string is kept — a visible scar reminding you to set it.
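The resolution behavior can be sketched in a few lines. This is a minimal illustration of the rule as described, not Spark Runner's actual implementation:

```python
import os
import re

def resolve_credential(value: str) -> str:
    """Resolve $VAR / ${VAR} references against the environment.
    If the variable is unset, the literal string is kept."""
    match = re.fullmatch(r"\$(\w+)|\$\{(\w+)\}", value)
    if not match:
        return value  # plain literal, no reference
    name = match.group(1) or match.group(2)
    return os.environ.get(name, value)  # keep the visible scar if unset
```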

UI Instructions

UI instructions are site-specific hints injected into every phase prompt, so the browser agent knows about visual quirks before it starts interacting. Define them under general.ui_instructions for global hints, or under an environment to override them:

general:
  ui_instructions:
    - "The primary Save button is blue, at the bottom-right of forms"
    - "Toast notifications appear top-right and auto-dismiss"

environments:
  staging:
    base_url: https://staging.example.com
    ui_instructions:
      - "Staging has a yellow banner at top -- ignore it"
      - "The primary Save button is blue, at the bottom-right of forms"

When --env staging is selected, the environment's ui_instructions replace the general ones entirely (they do not merge). If an environment omits ui_instructions, the general-level list is used. A single string is also accepted and automatically wrapped into a list.

Credential Profiles

Each profile requires email and password. Any additional keys are stored in an extra dict, accessible via config.credentials["profile_name"].extra:

credentials:
  default:
    email: user@example.com
    password: secret
  service_account:
    email: bot@example.com
    password: botpass
    api_key: sk-service-123    # available as .extra["api_key"]
    org_id: org-456            # available as .extra["org_id"]
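A profile of this shape maps naturally onto a small dataclass. The sketch below illustrates the email/password/extra split described above; the class name and constructor are assumptions, not Spark Runner's API:

```python
from dataclasses import dataclass, field

@dataclass
class CredentialProfile:
    """email and password are required; any other keys land in .extra."""
    email: str
    password: str
    extra: dict = field(default_factory=dict)

    @classmethod
    def from_config(cls, raw: dict) -> "CredentialProfile":
        data = dict(raw)  # copy so the pops don't mutate the caller's config
        return cls(
            email=data.pop("email"),
            password=data.pop("password"),
            extra=data,  # whatever remains: api_key, org_id, ...
        )
```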

Classification Rules

Observation classification — the process that decides whether something is an error or a warning — can be guided by a rules file. By default Spark Runner looks for classification_rules.txt in the working directory. The format is simple:

# Lines starting with # are comments. Blank lines are ignored.

[ERRORS]
Form submission failing or producing an application error
A feature that required a workaround using a different mechanism

[WARNINGS]
An already-active session detected when login is requested
A URL that changed but the feature still works

Rules under [ERRORS] bias the classifier toward marking matching observations as errors; rules under [WARNINGS] bias toward warnings. These are prioritized hints to the LLM, not exact string matches.
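The file format is simple enough to parse in a dozen lines. A minimal sketch of a reader for it (illustrative, not the project's parser):

```python
def parse_classification_rules(text: str) -> dict:
    """Parse [ERRORS]/[WARNINGS] sections into lists of rule hints.
    '#' comment lines and blank lines are ignored."""
    rules = {"ERRORS": [], "WARNINGS": []}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].upper()
            rules.setdefault(section, [])
        elif section is not None:
            rules[section].append(line)
    return rules
```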

Commands

Global Options

These apply to all subcommands:

spark-runner [--data-dir PATH] [--config PATH] <command>

  --data-dir PATH               Spark Runner home directory (default: ~/spark_runner)
  --config PATH                 Config file path (default: <data-dir>/config.yaml)

Running Tasks

spark-runner run [OPTIONS] [GOAL_FILES...]

  -p, --prompt TEXT             Task prompt (repeatable)
  -u, --url TEXT                Override base URL
  --env TEXT                    Select environment profile
  --credential-profile TEXT     Select credential profile
  --headless                    No visible browser
  --auto-close                  Close browser when done
  --shared-session              Share browser across tasks
  --parallel N                  Parallel execution count
  --model PURPOSE=MODEL_ID      Override a model (repeatable)
  --regenerate-tasks            Re-decompose into fresh phases
  --no-knowledge-reuse          Ignore prior runs
  --no-update-summary           Don't update goal summaries
  --no-update-tasks             Don't overwrite task files
  --force-unsafe                Override production safety checks
  --unrun                       Only run goals never executed
  --failed                      Only run goals with prior errors

Generating Goals

spark-runner generate-goals SOURCE_PATH [--branch main] [--output-dir DIR]

Accepts a local directory or a git repository URL. Scans frontend source files, extracts features, produces goal files.

Recording Demonstrations

spark-runner record [--url URL]

Opens a browser. You demonstrate. The system watches, records your actions — clicks, keystrokes, navigations — and when you press Ctrl+C, it transmutes the recording into a structured goal.

Managing Goals

spark-runner goals list [--unrun] [--failed]
spark-runner goals show GOAL_NAME
spark-runner goals delete GOAL_NAME [--force]
spark-runner goals classify
spark-runner goals orphans [--clean]

Viewing Results

spark-runner results list [--task NAME]
spark-runner results show RUN_PATH
spark-runner results errors [--task NAME]
spark-runner results screenshots RUN_PATH
spark-runner results report RUN_PATH [--all]

Reports are self-contained HTML — screenshots, phase timelines, event logs, observations — viewable in any browser without a server.

Knowledge Reuse

This is not a stateless tool. Each run deposits knowledge: which subtasks worked, which observations arose, what the system learned about your application. On subsequent runs, Spark Runner searches this accumulated knowledge for reusable subtasks and relevant observations. Prior failures inform future attempts. Prior successes are not re-derived from nothing.

The knowledge index lives in your goal_summaries/ and tasks/ directories. Disable it with --no-knowledge-reuse if you prefer amnesia.

Environments & Safety

Goals may declare safety metadata:

{
  "safety": {
    "blocked_in_production": true,
    "allowed_environments": ["staging", "development"],
    "risk_level": "high",
    "reason": "Creates test data"
  }
}

A goal marked blocked_in_production will refuse to run in a production environment. This is not a suggestion. Override with --force-unsafe if you are certain — truly certain — of what you are doing.
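The gate described above amounts to a short pre-flight check. A sketch under the stated semantics (blocked_in_production, allowed_environments, --force-unsafe); the function name and exception choice are assumptions:

```python
def check_safety(safety: dict, env_name: str, is_production: bool,
                 force_unsafe: bool = False) -> None:
    """Refuse to run a goal whose safety metadata forbids the target
    environment. force_unsafe bypasses the gate entirely."""
    if force_unsafe or not safety:
        return
    if is_production and safety.get("blocked_in_production"):
        raise PermissionError("goal is blocked in production")
    allowed = safety.get("allowed_environments")
    if allowed is not None and env_name not in allowed:
        raise PermissionError(f"goal not allowed in environment {env_name!r}")
```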

LLM Models

Six model slots, each configurable independently. Each accepts model, max_tokens, and temperature:

Purpose              Default             Max Tokens   Role
task_decomposition   claude-sonnet-4-5   16,384       Breaking goals into phases
summarization        claude-sonnet-4-5   2,048        Phase result summaries
classification       claude-sonnet-4-5   4,096        Observation classification
knowledge_matching   claude-sonnet-4-5   4,096        Finding prior knowledge
task_naming          claude-sonnet-4-5   64           Short names for tasks
browser_control      claude-sonnet-4-5   4,096        Reserved for future use

Override per run:

spark-runner run --model task_decomposition=claude-opus-4-6 goal.json
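Since --model is repeatable, the values collapse to a purpose-to-model mapping. A minimal sketch of that parsing (illustrative only, not the CLI's own code):

```python
def parse_model_overrides(flags: list) -> dict:
    """Turn repeated --model PURPOSE=MODEL_ID values into a mapping."""
    overrides = {}
    for flag in flags:
        purpose, sep, model_id = flag.partition("=")
        if not sep or not purpose or not model_id:
            raise ValueError(f"expected PURPOSE=MODEL_ID, got {flag!r}")
        overrides[purpose] = model_id  # later flags win for the same purpose
    return overrides
```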

Environment Variables

Variable                 Purpose
SPARK_RUNNER_DATA_DIR    Data directory
SPARK_RUNNER_CONFIG      Config file path
SPARK_RUNNER_BASE_URL    Base URL override
ANTHROPIC_API_KEY        Claude API key
BROWSER_USE_API_KEY      BrowserUse cloud key
USER_EMAIL               Legacy: overrides default credential email
USER_PASSWORD            Legacy: overrides default credential password

The setup wizard can also generate per-profile env vars (e.g. SPARK_RUNNER_ADMIN_EMAIL, SPARK_RUNNER_STAGING_DEFAULT_PASSWORD) when you choose env-var mode for secrets.

Data Directory Structure

~/spark_runner/
├── config.yaml
├── classification_rules.txt  # Optional observation classification hints
├── tasks/                    # Phase instruction files
├── goal_summaries/           # Goal metadata (JSON)
└── runs/
    └── task-name/
        └── 2025-03-06T12-34-56Z/
            ├── metadata.json
            ├── event_log.txt
            ├── problem_log.txt
            ├── conversation_log.json
            ├── phase_summaries.json
            ├── llm_*.json            # Full LLM traces
            ├── *.png                 # Screenshots
            ├── goal/                 # Goal snapshot
            └── report/               # HTML report
                ├── index.html
                ├── goal.html
                ├── phases.html
                └── events.html

Every LLM call is traced. Every screenshot is kept. Every event is logged. The past is not hidden here — it is preserved.

License

MIT


"The accumulated filth of all their broken forms and unhandled exceptions will foam up about their waists and all the developers will look up and shout 'Does it work?' ... and Spark Runner will look down and whisper 'Here is the report.'"