apexkid/web-scout-ai: Your smart AI agent to discover all user journeys on a website.

Give it a URL. It maps every user journey.

```mermaid
graph TB
    subgraph Input
        URL[/"URL: abc.com"/]
    end

    subgraph BrowserLayer["Browser Layer (Decoupled)"]
        direction TB
        BF["Browser Factory"]
        PW["Playwright<br/>(default)"]
        CF["Camoufox<br/>(stealth)"]
        RC["Real Chrome<br/>(persistent)"]
        BF --> PW & CF & RC
    end

    subgraph Phase1["PHASE 1: DISCOVERY (LLM-Powered)"]
        direction TB

        subgraph PageAnalysis["Page Analysis Pipeline"]
            direction TB
            SS["Screenshot + DOM + A11y Tree"]
            NM["Network Monitor<br/>(AJAX/Fetch calls)"]
            VLM["Claude Vision Analysis"]
            AP["Action Plan:<br/>Semantic actions + patterns"]
            SS & NM --> VLM --> AP
        end

        subgraph ActionClassifier["Action Classifier"]
            direction TB
            NAV["Navigation"]
            INT["Interaction"]
            PAT["Pattern Instance<br/>(sample 1 of N)"]
        end

        subgraph Orchestrator["Orchestrator"]
            direction TB
            JQ["Journey Queue (BFS)"]
            FP["State Fingerprinting"]
            DD["Dedup / Loop Detection"]
            JG["Journey Graph Builder"]
            JQ --> FP --> DD --> JG
        end

        AP --> ActionClassifier
        ActionClassifier --> JQ
    end

    subgraph Persistence["Journey Persistence"]
        direction TB
        EX["Graph to Replay Exporter"]
        JF["Journey JSON Files<br/>(one per journey)"]
        DM["discovery-meta.json"]
        EX --> JF & DM
    end

    subgraph Phase2["PHASE 2: REPLAY (LLM-Free)"]
        direction TB
        RE["Replay Executor<br/>(mechanical)"]
        SC["Screenshot Capture<br/>(every step)"]
        RR["Replay Results<br/>(pass/fail + metadata)"]
        RE --> SC --> RR
    end

    subgraph Output["Output"]
        direction TB
        JT["Journey Tree / Graph"]
        MERM["Visual Journey Map<br/>(Mermaid)"]
        NAR["Natural Language<br/>Journey Descriptions"]
        JSON["Structured JSON Export"]
    end

    URL --> BF
    BF -->|"page object"| SS
    BF -->|"page object"| NM
    JG --> EX
    JG --> Output
    JF -->|"consumed by"| RE
    BF -->|"page object"| RE

    style Phase1 fill:#1a1a2e,stroke:#e94560,color:#fff
    style Phase2 fill:#1a1a2e,stroke:#00b894,color:#fff
    style BrowserLayer fill:#1a1a2e,stroke:#0f3460,color:#fff
    style Persistence fill:#1a1a2e,stroke:#6c5ce7,color:#fff
    style Output fill:#1a1a2e,stroke:#e94560,color:#fff
    style PageAnalysis fill:#16213e,stroke:#0f3460,color:#fff
    style ActionClassifier fill:#16213e,stroke:#16213e,color:#fff
    style Orchestrator fill:#16213e,stroke:#533483,color:#fff
```

What It Does

- Point at a URL: the agent explores the site using Claude Vision, understanding each page the way a human would
- Discovers full user journeys: Homepage → Product Listing → Product Detail → Cart → Checkout, all found automatically
- Replays journeys mechanically: no LLM is needed for replay, which makes it fast and cheap for regression testing
- Handles the messy web: cookie banners, shadow DOM, CAPTCHA detection (challenge pages are detected and skipped, not solved), and pattern collapse (50 product cards → 1 journey step)
- Captures every API call: during replay, records all XHR/fetch requests per step, flags analytics beacons, and writes a network-summary.json per journey
- Structured output: journey graphs, Mermaid diagrams, and replay-ready JSON files

Quick Start

```bash
git clone https://github.com/apexkid/web-scout-ai && cd web-scout-ai
pip install -e . && playwright install
export ANTHROPIC_API_KEY=your-key
python cli.py auto https://example.com --depth 3
```

Using conda instead

```bash
conda create -p .conda python=3.11 -y
.conda/bin/pip install -e . && .conda/bin/playwright install
export ANTHROPIC_API_KEY=your-key
.conda/bin/python cli.py auto https://example.com --depth 3
```

How It Works

Two-phase architecture. Discovery uses an LLM (Claude) to explore the site — it takes screenshots, reads the accessibility tree, and decides what to click. Every action builds a journey graph. Once discovery finishes, the graph is exported as standalone JSON files that can be replayed mechanically, with zero LLM calls.
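The hand-off between the two phases can be sketched as follows, assuming the journey graph is a plain adjacency map from page state to `(action, selector, next_state)` edges. Every root-to-leaf path becomes its own standalone journey document that a replayer can follow without an LLM. The `export_journeys` helper and the JSON fields (`journey_id`, `steps`, `expect_state`) are hypothetical and do not reflect the project's actual file format.

```python
import json
from typing import Dict, List, Set, Tuple

# Hypothetical graph shape (not the project's model): state -> list of (action, selector, next_state).
Edge = Tuple[str, str, str]
Graph = Dict[str, List[Edge]]


def export_journeys(graph: Graph, root: str) -> List[dict]:
    """Hypothetical exporter: turn every root-to-leaf path into a standalone, replay-ready journey."""
    journeys: List[dict] = []

    def walk(state: str, steps: List[dict], seen: Set[str]) -> None:
        # Skip edges back into states already on this path so cycles cannot recurse forever.
        edges = [e for e in graph.get(state, []) if e[2] not in seen]
        if not edges:
            if steps:
                # Leaf reached (or only back-edges remain): freeze the path as one journey.
                journeys.append({"journey_id": f"journey-{len(journeys) + 1}", "steps": list(steps)})
            return
        for action, selector, nxt in edges:
            steps.append({"action": action, "selector": selector, "expect_state": nxt})
            walk(nxt, steps, seen | {nxt})
            steps.pop()

    walk(root, [], {root})
    return journeys


graph: Graph = {
    "home": [("click", "a[href='/products']", "listing")],
    "listing": [("click", "[data-testid='card-0']", "detail")],
    "detail": [("click", "button#add-to-cart", "cart")],
}
for journey in export_journeys(graph, "home"):
    print(json.dumps(journey, indent=2))
```

Exporting paths rather than the graph itself is what keeps replay mechanical: each file is a flat list of concrete steps, so the replayer never has to make a decision.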

The Element Reference Pattern. Instead of asking the LLM to generate CSS selectors (which it's bad at), we pre-extract every visible interactive element, number them, and let the model pick by reference. The LLM says "click element 7", not "click button.sc-1x2f3y4". This eliminates an entire class of selector hallucination bugs. Read the full write-up →
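Here is a minimal sketch of the idea. It assumes elements have already been extracted from the page, each with a stable selector. The prompt lists the elements by number, and the model's reply is parsed for a reference that is then looked up locally. `InteractiveElement`, `render_element_list`, `resolve_reference`, and the exact `click element N` reply format are illustrative assumptions, not the project's API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InteractiveElement:
    """Hypothetical record for one pre-extracted, numbered element."""
    ref: int        # number shown to the model
    role: str       # e.g. "button", "link"
    name: str       # accessible name / visible text
    selector: str   # deterministic CSS selector computed by our own code, never by the LLM


def render_element_list(elements: List[InteractiveElement]) -> str:
    """Text block embedded in the prompt next to the screenshot."""
    return "\n".join(f'[{e.ref}] {e.role} "{e.name}"' for e in elements)


def resolve_reference(reply: str, elements: List[InteractiveElement]) -> Optional[InteractiveElement]:
    """Map a reply like 'click element 7' back to a real element; unknown refs give None."""
    match = re.search(r"element\s+(\d+)", reply, re.IGNORECASE)
    if not match:
        return None
    by_ref = {e.ref: e for e in elements}
    return by_ref.get(int(match.group(1)))


elements = [
    InteractiveElement(1, "link", "Products", "nav a[href='/products']"),
    InteractiveElement(7, "button", "Add to cart", "[data-testid='add-to-cart']"),
]
print(render_element_list(elements))
target = resolve_reference("click element 7", elements)
print(target.selector if target else "no valid reference")
```

The selector is computed by code rather than generated by the model. As a result, a hallucinated or out-of-range reference resolves to `None`, which the caller can treat as a retry rather than as a click on the wrong node.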

12-strategy selector waterfall. Each numbered element gets a deterministic CSS selector through a waterfall of 12 strategies — data-testid, ARIA labels, structural paths, and more. If the page has 50 identical product cards, the agent detects the pattern, samples one, and collapses the rest into a single journey step.
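Both mechanisms can be sketched over plain dictionaries of element attributes. The four strategies below stand in for the project's twelve. `is_unique` is a hypothetical callback: in a real browser it might count the matches for a CSS selector. Grouping elements by a precomputed structural `signature` is one plausible way to detect repeated patterns, not necessarily the project's method.

```python
from typing import Callable, Dict, List, Optional, Tuple

Element = Dict[str, str]  # hypothetical: raw attributes collected for one element


def candidate_selectors(el: Element) -> List[str]:
    """Hypothetical 4-step waterfall: most stable strategies first, structural path last."""
    candidates = []
    if el.get("data-testid"):
        candidates.append(f'[data-testid="{el["data-testid"]}"]')
    if el.get("id"):
        candidates.append(f'#{el["id"]}')  # a real version would CSS-escape the id
    if el.get("aria-label"):
        candidates.append(f'{el.get("tag", "")}[aria-label="{el["aria-label"]}"]')
    if el.get("path"):
        candidates.append(el["path"])  # e.g. "main > ul > li:nth-child(3) > a"
    return candidates


def pick_selector(el: Element, is_unique: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that matches exactly one element on the page."""
    for css in candidate_selectors(el):
        if is_unique(css):
            return css
    return None


def collapse_patterns(elements: List[Element]) -> List[Tuple[Element, int]]:
    """Keep one sample per group of structurally identical elements, plus the group size."""
    groups: Dict[str, List[Element]] = {}
    for i, el in enumerate(elements):
        key = el.get("signature") or f"__unique_{i}"  # elements without a signature stay separate
        groups.setdefault(key, []).append(el)
    return [(members[0], len(members)) for members in groups.values()]


cards = [{"tag": "a", "signature": "li.card > a", "path": f"ul > li:nth-child({i}) > a"} for i in range(1, 51)]
nav = {"tag": "a", "id": "cart-link", "aria-label": "Cart"}
for sample, count in collapse_patterns(cards + [nav]):
    print(sample.get("id") or sample["signature"], count)

dom_counts = {"#cart-link": 2, 'a[aria-label="Cart"]': 1}  # pretend the id is duplicated on the page
print(pick_selector(nav, lambda css: dom_counts.get(css, 0) == 1))
```

Because the duplicated id fails the uniqueness check, the waterfall falls through to the ARIA-label selector. The 50 cards collapse to a single sample that still records how many instances it represents.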

Use Cases

QA engineers — auto-discover every reachable user journey, then replay them as regression tests
Product managers — visualize the actual journey graph of your site, spot dead ends and loops
Developers — catch broken flows after deploys without writing a single test script
Security audits — map all reachable states and transitions from an entry point

CLI Reference

Four commands, each with a one-line example and a table of flags.

`capture` — Snapshot a single page

```bash
python cli.py capture https://example.com
```

Flags

| Flag | Default | Description |
|---|---|---|
| `--engine` | `playwright` | Browser engine to use |
| `--no-dismiss-overlays` | off | Disable automatic overlay/popup dismissal |
| `--dismiss-overlays-llm` | off | Use Claude to detect overlays when heuristics fail |

`explore` — Interactive step-by-step

Human-in-the-loop mode. The agent shows the discovered actions and you pick which one to take.

```bash
python cli.py explore https://example.com
```

Flags

| Flag | Default | Description |
|---|---|---|
| `--engine` | `playwright` | Browser engine to use |
| `--no-dismiss-overlays` | off | Disable automatic overlay/popup dismissal |
| `--dismiss-overlays-llm` | off | Use Claude to detect overlays when heuristics fail |

`auto` — Fully automated BFS discovery

The main command. Explores breadth-first with a live progress tree.

```bash
python cli.py auto https://example.com --depth 4 --branches 3 --mermaid
```

Flags

| Flag | Default | Description |
|---|---|---|
| `--depth` | `3` | Max BFS depth |
| `--branches` | `5` | Max branches explored per page |
| `--secondary` | off | Also explore secondary-priority actions |
| `--mermaid` | off | Generate a Mermaid flowchart journey map |
| `--output PATH` | none | Additional output path for `results.json` |
| `--engine` | `playwright` | Browser engine to use |
| `--no-dismiss-overlays` | off | Disable automatic overlay/popup dismissal |
| `--dismiss-overlays-llm` | off | Use Claude to detect overlays when heuristics fail |
| `--resume RUN_DIR` | none | Resume an interrupted exploration from a previous run directory |
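This toy model illustrates what `--depth` and `--branches` bound: a breadth-first walk over a hypothetical in-memory site map. Each state is fingerprinted so that revisits and loops are skipped. The fingerprint used here, a hash of the normalized URL plus the sorted element list, is an assumption about what a real fingerprint might include. LLM action ranking and `--secondary` actions are not modeled.

```python
import hashlib
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

# Hypothetical in-memory site: url -> (interactive element names, outgoing links in priority order).
Site = Dict[str, Tuple[List[str], List[str]]]


def fingerprint(url: str, elements: List[str]) -> str:
    """Hypothetical state fingerprint: normalized URL plus the sorted set of interactive elements."""
    normalized = url.split("?")[0].rstrip("/")
    payload = normalized + "|" + ",".join(sorted(elements))
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


def bfs_explore(site: Site, start: str, max_depth: int = 3, max_branches: int = 5) -> List[Tuple[str, str]]:
    """Breadth-first discovery bounded like --depth / --branches; returns (from, to) transitions."""
    seen: Set[str] = set()
    transitions: List[Tuple[str, str]] = []
    queue: Deque[Tuple[str, int]] = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        elements, links = site.get(url, ([], []))
        fp = fingerprint(url, elements)
        if fp in seen:
            continue  # already explored this state: dedup / loop detection
        seen.add(fp)
        if depth >= max_depth:
            continue  # record the state, but do not expand past the depth limit
        for nxt in links[:max_branches]:
            transitions.append((url, nxt))
            queue.append((nxt, depth + 1))
    return transitions


site: Site = {
    "/": (["nav", "search"], ["/products", "/about"]),
    "/products": (["card", "filter"], ["/products/1", "/"]),
    "/products/1": (["add-to-cart"], ["/cart"]),
    "/about": (["text"], []),
    "/cart": (["checkout"], ["/checkout"]),
}
print(bfs_explore(site, "/", max_depth=2, max_branches=2))
```

The `("/products", "/")` transition is kept as a graph edge even though `/` is not explored again. This is how a loop shows up in the journey graph without trapping the crawler.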

`replay` — Re-execute saved journeys

No LLM needed. Replays journeys through a real browser and reports pass/fail per step. By default, every XHR/fetch request is captured and written to a network-summary.json per journey — useful for verifying analytics fires, monitoring third-party API calls, and catching broken endpoints after deploys.

```bash
# Replay all journeys for a site
python cli.py replay all example.com --headed

# Replay a single journey
python cli.py replay one output/example.com/auto_20260210_182142/journeys/journey-a-checkout.json
```
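The network capture can be made concrete with a sketch of how per-step request records might be rolled up into a per-journey summary, with analytics beacons flagged. The record fields, the hostname list, and `summarize_network` are illustrative assumptions; the actual `network-summary.json` layout is not described here. In a Playwright-driven replay, such records could be collected from the page's `request` event, keeping requests whose `resource_type` is `xhr` or `fetch`.

```python
import json
from collections import Counter
from typing import Dict, List
from urllib.parse import urlparse

# Illustrative and far from exhaustive (assumption, not the project's list).
ANALYTICS_HOSTS = ("google-analytics.com", "googletagmanager.com", "segment.io", "doubleclick.net")


def is_analytics(url: str) -> bool:
    """Match a known analytics host or any of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in ANALYTICS_HOSTS)


def summarize_network(requests: List[Dict]) -> Dict:
    """Hypothetical roll-up; each record looks like {'step': 1, 'method': 'GET', 'url': ..., 'status': 200}."""
    per_step: Dict[int, List[Dict]] = {}
    for r in requests:
        per_step.setdefault(r["step"], []).append({**r, "analytics": is_analytics(r["url"])})
    return {
        "total_requests": len(requests),
        "analytics_beacons": sum(is_analytics(r["url"]) for r in requests),
        # No status means the request never got a response; treat that as a failure too.
        "failed": [r for r in requests if r.get("status") is None or r["status"] >= 400],
        "hosts": dict(Counter(urlparse(r["url"]).hostname for r in requests)),
        "steps": per_step,
    }


records = [
    {"step": 1, "method": "GET", "url": "https://api.example.com/products?page=1", "status": 200},
    {"step": 1, "method": "POST", "url": "https://www.google-analytics.com/g/collect", "status": 204},
    {"step": 2, "method": "POST", "url": "https://api.example.com/cart", "status": 500},
]
print(json.dumps(summarize_network(records), indent=2))
```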

Flags

| Flag | Default | Description |
|---|---|---|
| `--headed` | off | Run browser in headed (visible) mode |
| `--engine` | `playwright` | Browser engine to use |
| `--output-dir` | `output/` | Custom output root for replay results |
| `--wait` | `1000` | Wait time in ms between steps |
| `--viewport` | `1280x800` | Viewport size as WIDTHxHEIGHT |
| `--parallel N` | `1` | Run up to N journeys concurrently |
| `--dismiss-overlays` | off | Enable overlay dismissal during replay |
| `--dismiss-overlays-llm` | off | Use Claude to detect overlays during replay |
| `--step-retries N` | `0` | Retry failed steps up to N times |
| `--no-capture-network` | off | Disable network (XHR/fetch) capture |
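Each journey file already contains concrete selectors, so replay reduces to a loop. The sketch below shows how a per-step retry budget (like `--step-retries`), a pause between steps (like `--wait`), and per-step pass/fail reporting could fit together. It assumes a hypothetical `perform(step)` callable that stands in for the real browser action. It is not the project's executor.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StepResult:
    index: int
    selector: str
    passed: bool
    attempts: int
    error: str = ""


@dataclass
class ReplayResult:
    journey_id: str
    steps: List[StepResult] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return all(s.passed for s in self.steps)


def replay_journey(journey: Dict, perform: Callable[[Dict], None],
                   step_retries: int = 0, wait_ms: int = 1000) -> ReplayResult:
    """Hypothetical mechanical replay: no decisions, just execute, retry, and record."""
    result = ReplayResult(journey["journey_id"])
    for i, step in enumerate(journey["steps"], start=1):
        last_error = ""
        for attempt in range(1, step_retries + 2):  # first try + N retries
            try:
                perform(step)
                result.steps.append(StepResult(i, step["selector"], True, attempt))
                break
            except Exception as exc:  # a real executor would catch narrower errors
                last_error = str(exc)
        else:
            # Every attempt failed; later steps depend on this one, so stop the journey here.
            result.steps.append(StepResult(i, step["selector"], False, step_retries + 1, last_error))
            break
        time.sleep(wait_ms / 1000)
    return result


flaky_calls = {"n": 0}


def perform(step: Dict) -> None:
    """Fake browser action: '#flaky' fails on its first call only."""
    if step["selector"] == "#flaky":
        flaky_calls["n"] += 1
        if flaky_calls["n"] < 2:
            raise TimeoutError("element not ready")


journey = {"journey_id": "journey-a-checkout", "steps": [{"selector": "#flaky"}, {"selector": "#buy"}]}
res = replay_journey(journey, perform, step_retries=1, wait_ms=0)
print(res.passed, [(s.index, s.attempts) for s in res.steps])
```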

Output Structure

```
output/
└── <domain>/
    ├── auto_YYYYMMDD_HHMMSS/       # Auto-exploration run
    │   ├── graph.json
    │   ├── results.json
    │   ├── journey_map.md
    │   ├── checkpoint.json          # Deleted on clean completion
    │   ├── screenshots/
    │   └── journeys/                # Replay-ready journey files
    │       ├── discovery-meta.json
    │       └── journey-a-*.json
    │
    ├── capture_YYYYMMDD_HHMMSS/     # Page capture
    │   ├── meta.json
    │   ├── elements.json
    │   ├── elements.tsv
    │   ├── a11y_tree.txt
    │   └── screenshot.png
    │
    ├── explore_YYYYMMDD_HHMMSS/     # Interactive exploration
    │   └── (screenshots + session artifacts)
    │
    └── replay_YYYYMMDD_HHMMSS/      # Replay results
        ├── replay-summary.json
        └── journey-a-*/
            ├── replay-result.json
            ├── network-summary.json
            └── step-01-*.png
```

Advanced Topics

Overlay Dismissal

Cookie consent banners, newsletter popups, chat widgets — the agent auto-dismisses them after every navigation. Two tiers:

| Tier | Method | Cost |
|---|---|---|
| Heuristic | Known CMP selectors (OneTrust, Cookiebot, etc.), text-matched buttons, high-z-index close buttons | Free |
| LLM-assisted | Claude Haiku analyzes a screenshot to find dismiss targets | ~$0.002/call |

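The heuristic tier is on by default for discovery (`capture`, `explore`, `auto`; disable it with `--no-dismiss-overlays`) and off for replay (enable it with `--dismiss-overlays`). The LLM tier is opt-in via `--dismiss-overlays-llm`.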
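The text-matching part of the heuristic tier can be pictured as a scoring pass over visible buttons. This sketch assumes candidate buttons have already been collected from the page along with their text and computed z-index. The keyword lists and the z-index threshold are made-up values for illustration. The known-CMP-selector check that the real tier also performs is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

ACCEPT_WORDS = ("accept all", "accept", "agree", "got it", "allow all")  # illustrative keywords
CLOSE_WORDS = ("close", "no thanks", "dismiss", "×")                      # illustrative keywords
HIGH_Z_INDEX = 1000  # hypothetical threshold for "floating overlay" layers


@dataclass
class Button:
    text: str
    z_index: int
    selector: str


def pick_dismiss_target(buttons: List[Button]) -> Optional[Button]:
    """Hypothetical heuristic: prefer consent 'accept' buttons, then close buttons on high layers."""
    def score(b: Button) -> int:
        text = b.text.strip().lower()
        if any(text.startswith(w) for w in ACCEPT_WORDS):
            return 2
        if any(w in text for w in CLOSE_WORDS) and b.z_index >= HIGH_Z_INDEX:
            return 1
        return 0  # ordinary page button: never click it

    best = max(buttons, key=score, default=None)
    return best if best is not None and score(best) > 0 else None


buttons = [
    Button("Sign up", 10, "#signup"),
    Button("No thanks", 9999, ".newsletter-close"),
    Button("Accept all cookies", 2147483647, "#cookie-accept"),
]
target = pick_dismiss_target(buttons)
print(target.selector if target else "nothing to dismiss")
```

Requiring a high z-index for generic "close" buttons is the safety valve: a low-layer "Close" is more likely to be part of the page itself than part of an overlay.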

Error Recovery

- LLM API calls: Retried with exponential backoff (up to 4 attempts) on rate limits and server errors
- Page navigation: Retried twice on timeouts and transient network errors
- Click execution: Retried twice on stale elements
- CAPTCHA detection: Cloudflare, reCAPTCHA, hCaptcha, and Turnstile are detected and skipped
- Redirect loop protection: Aborts after 10 redirects
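The LLM retry policy (up to 4 attempts with exponential backoff) follows a common pattern, sketched below. The sketch assumes a hypothetical `RetryableError` for rate-limit and 5xx responses, a 1 s base delay, and a small jitter. The real helper's delays and exception types are not documented here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for rate-limit (429) and server (5xx) errors."""


def with_backoff(fn: Callable[[], T], max_attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying RetryableError with delays of 1 s, 2 s, 4 s, ... plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            # Up to 10% jitter so parallel callers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


calls = []


def flaky() -> str:
    """Fake API call that is rate-limited twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise RetryableError("429 Too Many Requests")
    return "ok"


delays = []
print(with_backoff(flaky, sleep=delays.append))
```

Injecting `sleep` keeps the helper testable: the example records the delays instead of actually waiting.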

Browser Engines

| Engine | `--engine` value | Description |
|---|---|---|
| Playwright Chromium | `playwright` | Default headless Chromium |
| Camoufox | `camoufox` | Stealth Firefox (`pip install camoufox`) |
| Real Chrome | `real_chrome` | Persistent profile, always headed; retains cookies/logins |
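The decoupled browser layer amounts to a factory keyed on the `--engine` value. Below is a sketch for the two Chromium-based engines using the Playwright sync API (`chromium.launch` and `chromium.launch_persistent_context`). The `launch_page` function and the profile directory name are hypothetical. Camoufox is left as a stub because its launch API is outside what this sketch covers.

```python
from typing import Any

PROFILE_DIR = ".chrome-profile"  # hypothetical location of the persistent Chrome profile


def launch_page(playwright: Any, engine: str = "playwright", headless: bool = True) -> Any:
    """Hypothetical factory: return a Playwright Page; callers only ever see the page object."""
    if engine == "playwright":
        browser = playwright.chromium.launch(headless=headless)
        return browser.new_page()
    if engine == "real_chrome":
        # Installed Chrome with a persistent profile, so cookies and logins survive between runs.
        # Always headed, regardless of the headless argument.
        context = playwright.chromium.launch_persistent_context(PROFILE_DIR, channel="chrome", headless=False)
        return context.pages[0] if context.pages else context.new_page()
    if engine == "camoufox":
        raise NotImplementedError("stealth Firefox launch is not covered by this sketch")
    raise ValueError(f"unknown engine: {engine!r}")
```

Under these assumptions, a caller would open Playwright with `with sync_playwright() as p:` (imported from `playwright.sync_api`) and then call `launch_page(p, "real_chrome")`. Everything downstream works only with the returned page object, which is what lets discovery and replay stay engine-agnostic.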

Checkpoint & Resume

During auto exploration, a checkpoint is saved every 5 nodes. If the run is interrupted, resume it with:

```bash
python cli.py auto --resume output/example.com/auto_YYYYMMDD_HHMMSS/
```

Exploration continues exactly where it left off. The checkpoint is deleted on clean completion.
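To resume exactly, a checkpoint essentially has to capture the BFS frontier and the set of seen state fingerprints. The sketch below assumes that shape; the real `checkpoint.json` schema is not documented here. It writes through a temporary file and `os.replace`, so an interruption mid-write leaves the previous checkpoint intact. The function names are hypothetical.

```python
import json
import os
import tempfile
from typing import List, Set, Tuple

CHECKPOINT_EVERY = 5  # nodes, matching the interval described above


def save_checkpoint(run_dir: str, queue: List[Tuple[str, int]], seen: Set[str], explored: int) -> None:
    """Hypothetical writer: dump frontier + seen fingerprints, then atomically swap the file in."""
    state = {"queue": queue, "seen": sorted(seen), "explored": explored}
    fd, tmp = tempfile.mkstemp(dir=run_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, os.path.join(run_dir, "checkpoint.json"))  # atomic on the same filesystem


def load_checkpoint(run_dir: str) -> Tuple[List[Tuple[str, int]], Set[str], int]:
    """Hypothetical reader: rebuild the queue, the seen set, and the node counter."""
    with open(os.path.join(run_dir, "checkpoint.json")) as f:
        state = json.load(f)
    queue = [tuple(item) for item in state["queue"]]  # JSON arrays come back as lists
    return queue, set(state["seen"]), state["explored"]


def finish(run_dir: str) -> None:
    """Clean completion: the checkpoint is no longer needed."""
    path = os.path.join(run_dir, "checkpoint.json")
    if os.path.exists(path):
        os.remove(path)


run_dir = tempfile.mkdtemp()
explored = 10
if explored % CHECKPOINT_EVERY == 0:
    save_checkpoint(run_dir, [("/cart", 2)], {"a1b2", "c3d4"}, explored)
print(load_checkpoint(run_dir))
```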

Deep Dive

The core of this agent is the Element Reference Pattern — a technique for grounding vision-language models in the DOM without asking them to generate selectors. It eliminates an entire class of failures that plague browser agents.

Read the full blog post →

Project Structure

```
src/
  analyzer/       # LLM-powered page analysis (screenshot + a11y → actions)
  browser/        # Browser engine factory, page capture, action execution
  cli/            # Rich live-progress display for the terminal
    commands/     # CLI command handlers (capture, explore, auto, replay)
    display.py    # Display helpers for exploration output
  llm/            # Anthropic API client wrapper
  models/         # Pydantic data models (actions, graph, journeys)
  orchestrator/   # BFS explorer and interactive explorer logic
  output/         # JSON/Mermaid export and journey persistence
  replay/         # Journey replay executor and output helpers
  utils/          # Logging setup and retry helpers
```

License

MIT
