GitHub - apexkid/web-scout-ai: Your smart AI agent to discover all user journeys on a website.


Give it a URL. It maps every user journey.

Python 3.11+ · License: MIT · Powered by Claude

graph TB
    subgraph Input
        URL[/"URL: abc.com"/]
    end

    subgraph BrowserLayer["Browser Layer (Decoupled)"]
        direction TB
        BF["Browser Factory"]
        PW["Playwright<br/>(default)"]
        CF["Camoufox<br/>(stealth)"]
        RC["Real Chrome<br/>(persistent)"]
        BF --> PW & CF & RC
    end

    subgraph Phase1["PHASE 1: DISCOVERY (LLM-Powered)"]
        direction TB

        subgraph PageAnalysis["Page Analysis Pipeline"]
            direction TB
            SS["Screenshot + DOM + A11y Tree"]
            NM["Network Monitor<br/>(AJAX/Fetch calls)"]
            VLM["Claude Vision Analysis"]
            AP["Action Plan:<br/>Semantic actions + patterns"]
            SS & NM --> VLM --> AP
        end

        subgraph ActionClassifier["Action Classifier"]
            direction TB
            NAV["Navigation"]
            INT["Interaction"]
            PAT["Pattern Instance<br/>(sample 1 of N)"]
        end

        subgraph Orchestrator["Orchestrator"]
            direction TB
            JQ["Journey Queue (BFS)"]
            FP["State Fingerprinting"]
            DD["Dedup / Loop Detection"]
            JG["Journey Graph Builder"]
            JQ --> FP --> DD --> JG
        end

        AP --> ActionClassifier
        ActionClassifier --> JQ
    end

    subgraph Persistence["Journey Persistence"]
        direction TB
        EX["Graph to Replay Exporter"]
        JF["Journey JSON Files<br/>(one per journey)"]
        DM["discovery-meta.json"]
        EX --> JF & DM
    end

    subgraph Phase2["PHASE 2: REPLAY (LLM-Free)"]
        direction TB
        RE["Replay Executor<br/>(mechanical)"]
        SC["Screenshot Capture<br/>(every step)"]
        RR["Replay Results<br/>(pass/fail + metadata)"]
        RE --> SC --> RR
    end

    subgraph Output["Output"]
        direction TB
        JT["Journey Tree / Graph"]
        MERM["Visual Journey Map<br/>(Mermaid)"]
        NAR["Natural Language<br/>Journey Descriptions"]
        JSON["Structured JSON Export"]
    end

    URL --> BF
    BF -->|"page object"| SS
    BF -->|"page object"| NM
    JG --> EX
    JG --> Output
    JF -->|"consumed by"| RE
    BF -->|"page object"| RE

    style Phase1 fill:#1a1a2e,stroke:#e94560,color:#fff
    style Phase2 fill:#1a1a2e,stroke:#00b894,color:#fff
    style BrowserLayer fill:#1a1a2e,stroke:#0f3460,color:#fff
    style Persistence fill:#1a1a2e,stroke:#6c5ce7,color:#fff
    style Output fill:#1a1a2e,stroke:#e94560,color:#fff
    style PageAnalysis fill:#16213e,stroke:#0f3460,color:#fff
    style ActionClassifier fill:#16213e,stroke:#16213e,color:#fff
    style Orchestrator fill:#16213e,stroke:#533483,color:#fff

What It Does

  • Point at a URL — the agent explores the site using Claude Vision, understanding each page like a human would
  • Discovers full user journeys — Homepage → Product Listing → Product Detail → Cart → Checkout, all found automatically
  • Replays journeys mechanically — no LLM needed for replay, making it fast and cheap for regression testing
  • Handles the messy web — cookie banners, shadow DOM, CAPTCHAs, pattern collapse (50 product cards → 1 journey step)
  • Captures every API call — during replay, records all XHR/fetch requests per step, flags analytics beacons, and outputs a network-summary.json per journey
  • Structured output — journey graphs, Mermaid diagrams, replay-ready JSON files

Quick Start

git clone https://github.com/apexkid/web-scout-ai && cd web-scout-ai
pip install -e . && playwright install
export ANTHROPIC_API_KEY=your-key
python cli.py auto https://example.com --depth 3

Using conda instead:

conda create -p .conda python=3.11 -y
.conda/bin/pip install -e . && .conda/bin/playwright install
export ANTHROPIC_API_KEY=your-key
.conda/bin/python cli.py auto https://example.com --depth 3

How It Works

Two-phase architecture. Discovery uses an LLM (Claude) to explore the site — it takes screenshots, reads the accessibility tree, and decides what to click. Every action builds a journey graph. Once discovery finishes, the graph is exported as standalone JSON files that can be replayed mechanically, with zero LLM calls.
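To make the replay half of this concrete, here is a minimal sketch of an LLM-free replay loop. The journey schema (`name`, `steps`, `action`, `target`), the `replay` function, and the stop-at-first-failure behavior are all assumptions made for illustration; the project's real journey files and executor may look different.

```python
import json
from typing import Callable

# Hypothetical journey file contents; the project's real schema may differ.
JOURNEY_JSON = """
{
  "name": "journey-a-checkout",
  "steps": [
    {"action": "goto",  "target": "https://example.com"},
    {"action": "click", "target": "[data-testid='add-to-cart']"},
    {"action": "click", "target": "[data-testid='checkout']"}
  ]
}
"""


def replay(journey: dict, execute: Callable[[str, str], None]) -> list[dict]:
    """Run each recorded step in order and stop at the first failure."""
    results = []
    for number, step in enumerate(journey["steps"], start=1):
        try:
            execute(step["action"], step["target"])
            results.append({"step": number, "status": "pass"})
        except Exception as error:  # a real executor would catch narrower errors
            results.append({"step": number, "status": "fail", "error": str(error)})
            break
    return results


def fake_execute(action: str, target: str) -> None:
    """Stand-in for a browser driver; fails on the checkout button."""
    if "checkout" in target:
        raise RuntimeError("element not found")


results = replay(json.loads(JOURNEY_JSON), fake_execute)
```

Because the steps already carry concrete selectors, nothing in this loop needs a model call; the executor only has to drive the browser.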

The Element Reference Pattern. Instead of asking the LLM to generate CSS selectors (which it's bad at), we pre-extract every visible interactive element, number them, and let the model pick by reference. The LLM says "click element 7", not "click button.sc-1x2f3y4". This eliminates an entire class of selector hallucination bugs. Read the full write-up →
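The idea can be sketched in a few lines. The `Element` shape, the menu format, and the reply parser below are hypothetical and only illustrate the pattern of numbering elements up front and resolving the model's numeric choice afterwards.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """One visible interactive element extracted from the page (hypothetical shape)."""
    selector: str  # deterministic selector computed ahead of time
    role: str      # e.g. "button", "link"
    label: str     # visible text or accessible name


def build_element_menu(elements: list[Element]) -> str:
    """Render a numbered list that is sent to the model alongside the screenshot."""
    return "\n".join(
        f"[{i}] {el.role}: {el.label}" for i, el in enumerate(elements, start=1)
    )


def resolve_reference(reply: str, elements: list[Element]) -> Element:
    """Map a reply such as 'click element 7' back to a pre-extracted element."""
    match = re.search(r"element\s+(\d+)", reply, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no element reference in reply: {reply!r}")
    index = int(match.group(1))
    if not 1 <= index <= len(elements):
        raise ValueError(f"element {index} is out of range 1..{len(elements)}")
    return elements[index - 1]


elements = [
    Element("[data-testid='nav-home']", "link", "Home"),
    Element("[aria-label='Add to cart']", "button", "Add to cart"),
]
menu = build_element_menu(elements)
chosen = resolve_reference("click element 2", elements)
```

The model never sees or writes a selector. An out-of-range number is rejected before anything is clicked, so a bad reply fails loudly instead of hitting the wrong element.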

12-strategy selector waterfall. Each numbered element gets a deterministic CSS selector through a waterfall of 12 strategies — data-testid, ARIA labels, structural paths, and more. If the page has 50 identical product cards, the agent detects the pattern, samples one, and collapses the rest into a single journey step.
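A reduced sketch of both ideas follows, with three strategies standing in for the twelve. The attribute keys `path` and `signature`, the `min_repeats` threshold, and the strategy order are assumptions for illustration, not the project's actual rules.

```python
from collections import defaultdict
from typing import Callable, Optional

Attrs = dict[str, str]
Strategy = Callable[[Attrs], Optional[str]]


def by_test_id(attrs: Attrs) -> Optional[str]:
    value = attrs.get("data-testid")
    return f'[data-testid="{value}"]' if value else None


def by_aria_label(attrs: Attrs) -> Optional[str]:
    value = attrs.get("aria-label")
    return f'[aria-label="{value}"]' if value else None


def by_structural_path(attrs: Attrs) -> Optional[str]:
    # "path" is a hypothetical precomputed value such as "ul > li:nth-of-type(3) > a".
    return attrs.get("path")


# Most stable strategies first; the tool itself is described as having 12.
WATERFALL: list[Strategy] = [by_test_id, by_aria_label, by_structural_path]


def pick_selector(attrs: Attrs) -> Optional[str]:
    """Return the selector from the first strategy that produces one."""
    for strategy in WATERFALL:
        selector = strategy(attrs)
        if selector:
            return selector
    return None


def collapse_patterns(items: list[Attrs], min_repeats: int = 3) -> list[Attrs]:
    """Keep one sample per group of elements sharing a structural signature."""
    groups: dict[str, list[Attrs]] = defaultdict(list)
    for item in items:
        groups[item.get("signature", "")].append(item)
    kept = []
    for signature, members in groups.items():
        if signature and len(members) >= min_repeats:
            kept.append(members[0])  # sample 1 of N
        else:
            kept.extend(members)
    return kept


cards = [
    {"signature": "li.card > a", "path": f"ul > li:nth-of-type({i}) > a"}
    for i in range(1, 51)
]
login = {"data-testid": "login", "aria-label": "Log in"}
kept = collapse_patterns(cards + [login])
```

Here the 50 cards collapse to a single sample while the login button, which has no repeated signature, is kept as its own action.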

Use Cases

  • QA engineers — auto-discover every reachable user journey, then replay them as regression tests
  • Product managers — visualize the actual journey graph of your site, spot dead ends and loops
  • Developers — catch broken flows after deploys without writing a single test script
  • Security audits — map all reachable states and transitions from an entry point

CLI Reference

Four commands, each with a one-line example and a table of its flags.

capture — Snapshot a single page

python cli.py capture https://example.com

Flags

| Flag | Default | Description |
|---|---|---|
| --engine | playwright | Browser engine to use |
| --no-dismiss-overlays | off | Disable automatic overlay/popup dismissal |
| --dismiss-overlays-llm | off | Use Claude to detect overlays when heuristics fail |

explore — Interactive step-by-step

Human-in-the-loop mode. The agent shows the actions it discovered, and you pick which one to take.

python cli.py explore https://example.com

Flags

| Flag | Default | Description |
|---|---|---|
| --engine | playwright | Browser engine to use |
| --no-dismiss-overlays | off | Disable automatic overlay/popup dismissal |
| --dismiss-overlays-llm | off | Use Claude to detect overlays when heuristics fail |

auto — Fully automated BFS discovery

The main command. Explores breadth-first with a live progress tree.

python cli.py auto https://example.com --depth 4 --branches 3 --mermaid

Flags

| Flag | Default | Description |
|---|---|---|
| --depth | 3 | Max BFS depth |
| --branches | 5 | Max branches explored per page |
| --secondary | off | Also explore secondary-priority actions |
| --mermaid | off | Generate a Mermaid flowchart journey map |
| --output PATH | | Additional output path for results.json |
| --engine | playwright | Browser engine to use |
| --no-dismiss-overlays | off | Disable automatic overlay/popup dismissal |
| --dismiss-overlays-llm | off | Use Claude to detect overlays when heuristics fail |
| --resume RUN_DIR | | Resume an interrupted exploration from a previous run directory |
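
As a rough illustration of how a depth limit, a per-page branch limit, and state fingerprinting can interact in a breadth-first walk, here is a self-contained sketch. The `SITE` map, the `fingerprint` function, and the exact meaning given to `depth` and `branches` are assumptions; the real explorer works on live pages and may count differently.

```python
import hashlib
from collections import deque

# Hypothetical site map: page -> actions leading to other pages.
SITE = {
    "home": ["listing", "about", "login"],
    "listing": ["detail", "home"],
    "detail": ["cart", "listing"],
    "cart": ["checkout"],
    "about": ["home"],
    "login": [],
    "checkout": [],
}


def fingerprint(state: str) -> str:
    """Hash of the state; a real fingerprint would likely mix URL and DOM structure."""
    return hashlib.sha256(state.encode()).hexdigest()[:12]


def discover(start: str, depth: int = 3, branches: int = 5) -> list[tuple[str, str]]:
    """Breadth-first walk that returns the (source, target) edges it followed."""
    seen = {fingerprint(start)}
    queue = deque([(start, 0)])
    edges = []
    while queue:
        page, level = queue.popleft()
        if level >= depth:
            continue  # depth limit reached: do not expand this page
        for target in SITE[page][:branches]:
            edges.append((page, target))
            mark = fingerprint(target)
            if mark in seen:  # already visited: record the edge, do not re-queue
                continue
            seen.add(mark)
            queue.append((target, level + 1))
    return edges


edges = discover("home", depth=2, branches=2)
```

With `branches=2` the login link on the home page is never followed, and the edges back to `home` are recorded without re-exploring it, which is what keeps loops from growing the queue.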

replay — Re-execute saved journeys

No LLM needed. Replays journeys through a real browser and reports pass/fail per step. By default, every XHR/fetch request is captured and written to a network-summary.json per journey, which is useful for verifying that analytics events fire, monitoring third-party API calls, and catching broken endpoints after deploys.

# Replay all journeys for a site
python cli.py replay all example.com --headed

# Replay a single journey
python cli.py replay one output/example.com/auto_20260210_182142/journeys/journey-a-checkout.json

Flags

| Flag | Default | Description |
|---|---|---|
| --headed | off | Run browser in headed (visible) mode |
| --engine | playwright | Browser engine to use |
| --output-dir | output/ | Custom output root for replay results |
| --wait | 1000 | Wait time in ms between steps |
| --viewport | 1280x800 | Viewport size as WIDTHxHEIGHT |
| --parallel N | 1 | Run up to N journeys concurrently |
| --dismiss-overlays | off | Enable overlay dismissal during replay |
| --dismiss-overlays-llm | off | Use Claude to detect overlays during replay |
| --step-retries N | 0 | Retry failed steps up to N times |
| --no-capture-network | off | Disable network (XHR/fetch) capture |
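
The per-step network capture described for replay could be summarized along these lines. The record fields, the summary layout, and the `ANALYTICS_HOSTS` list are assumptions for illustration; the actual network-summary.json format is not documented here. In Playwright, the resource type of a request is exposed as `request.resource_type`, with values such as "xhr" and "fetch", which is what the filter below mirrors.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Illustrative host list; which hosts the project treats as analytics is an assumption.
ANALYTICS_HOSTS = {"www.google-analytics.com", "api.segment.io"}


def summarize(requests: list[dict]) -> dict:
    """Group captured XHR/fetch requests by step and flag analytics beacons."""
    steps = defaultdict(list)
    for request in requests:
        if request["resource_type"] not in {"xhr", "fetch"}:
            continue  # skip documents, images, stylesheets, and so on
        host = urlparse(request["url"]).hostname or ""
        steps[request["step"]].append(
            {
                "method": request["method"],
                "url": request["url"],
                "is_analytics": host in ANALYTICS_HOSTS,
            }
        )
    return {"steps": {str(step): calls for step, calls in sorted(steps.items())}}


captured = [
    {"step": 1, "method": "GET", "url": "https://example.com/",
     "resource_type": "document"},
    {"step": 1, "method": "GET", "url": "https://example.com/api/products",
     "resource_type": "fetch"},
    {"step": 2, "method": "POST", "url": "https://www.google-analytics.com/g/collect",
     "resource_type": "xhr"},
]
summary = summarize(captured)
```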

Output Structure

output/
└── <domain>/
    ├── auto_YYYYMMDD_HHMMSS/       # Auto-exploration run
    │   ├── graph.json
    │   ├── results.json
    │   ├── journey_map.md
    │   ├── checkpoint.json          # Deleted on clean completion
    │   ├── screenshots/
    │   └── journeys/                # Replay-ready journey files
    │       ├── discovery-meta.json
    │       └── journey-a-*.json
    │
    ├── capture_YYYYMMDD_HHMMSS/     # Page capture
    │   ├── meta.json
    │   ├── elements.json
    │   ├── elements.tsv
    │   ├── a11y_tree.txt
    │   └── screenshot.png
    │
    ├── explore_YYYYMMDD_HHMMSS/     # Interactive exploration
    │   └── (screenshots + session artifacts)
    │
    └── replay_YYYYMMDD_HHMMSS/      # Replay results
        ├── replay-summary.json
        └── journey-a-*/
            ├── replay-result.json
            ├── network-summary.json
            └── step-01-*.png
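
Given this layout, a script that wants the journeys from the most recent auto run could locate them as sketched below. The helper name `latest_journeys` is hypothetical and not part of the CLI; it relies only on the directory naming shown in the tree.

```python
import tempfile
from pathlib import Path


def latest_journeys(output_root: Path, domain: str) -> list[Path]:
    """Return journey files from the newest auto_* run for a domain."""
    # auto_YYYYMMDD_HHMMSS names sort chronologically when sorted as text.
    runs = sorted((output_root / domain).glob("auto_*"))
    if not runs:
        return []
    journeys = runs[-1] / "journeys"
    return sorted(journeys.glob("journey-*.json"))  # skips discovery-meta.json


# Build a throwaway tree with two runs to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for run, names in {
        "auto_20260209_101500": ["journey-a-old.json"],
        "auto_20260210_182142": ["journey-a-checkout.json", "discovery-meta.json"],
    }.items():
        folder = root / "example.com" / run / "journeys"
        folder.mkdir(parents=True)
        for name in names:
            (folder / name).write_text("{}")
    found = [path.name for path in latest_journeys(root, "example.com")]
    missing = latest_journeys(root, "unknown.example")
```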

Advanced Topics

Overlay Dismissal

Cookie consent banners, newsletter popups, chat widgets — the agent auto-dismisses them after every navigation. Two tiers:

| Tier | Method | Cost |
|---|---|---|
| Heuristic | Known CMP selectors (OneTrust, CookieBot, etc.), text-matched buttons, high-z-index close buttons | Free |
| LLM-assisted | Claude Haiku analyzes a screenshot to find dismiss targets | ~$0.002/call |

The heuristic tier is on by default for discovery and off for replay. The LLM tier is opt-in via --dismiss-overlays-llm.
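
The two tiers can be read as a simple fallback chain: try the free heuristics first and only ask the model when they find nothing. The sketch below assumes a `try_click` callable that reports whether a selector was found and clicked, and an optional `llm_find_target` callable that returns a selector; both names are hypothetical, and the two selectors are ones these consent platforms are commonly seen to use, not a list taken from the project.

```python
from typing import Callable, Optional

# Illustrative selectors only; the project's real heuristic list is not shown here.
HEURISTIC_SELECTORS = [
    "#onetrust-accept-btn-handler",                            # OneTrust
    "#CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll",  # Cookiebot
]


def dismiss_overlay(
    try_click: Callable[[str], bool],
    llm_find_target: Optional[Callable[[], Optional[str]]] = None,
) -> str:
    """Return which tier dismissed the overlay: 'heuristic', 'llm', or 'none'."""
    for selector in HEURISTIC_SELECTORS:
        if try_click(selector):
            return "heuristic"
    if llm_find_target is not None:  # opt-in tier, costs one model call
        target = llm_find_target()
        if target and try_click(target):
            return "llm"
    return "none"


present = {"#custom-close"}  # a popup the heuristics do not know about


def try_click(selector: str) -> bool:
    return selector in present


tier_without_llm = dismiss_overlay(try_click)
tier_with_llm = dismiss_overlay(try_click, llm_find_target=lambda: "#custom-close")
```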

Error Recovery

  • LLM API calls: Retried with exponential backoff (up to 4 attempts) on rate limits and server errors
  • Page navigation: Retried twice on timeouts and transient network errors
  • Click execution: Retried twice on stale elements
  • CAPTCHA detection: Cloudflare, reCAPTCHA, hCaptcha, and Turnstile are detected and skipped
  • Redirect loop protection: Aborts after 10 redirects
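
The retry policy for LLM calls might look roughly like the following. `TransientError`, `call_with_backoff`, and the one-second base delay are assumptions for illustration; only the four-attempt limit and the exponential schedule come from the list above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a rate-limit or server error from the API."""


def call_with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on TransientError, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")


# Simulate two failures followed by a success, recording delays instead of sleeping.
delays: list[float] = []
outcomes = iter([TransientError("429"), TransientError("529"), "ok"])


def flaky() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_backoff(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the policy testable without real waiting; production code would usually add jitter to the delay as well.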

Browser Engines

| Engine | --engine value | Description |
|---|---|---|
| Playwright Chromium | playwright | Default headless Chromium |
| Camoufox | camoufox | Stealth Firefox (pip install camoufox) |
| Real Chrome | real_chrome | Persistent profile, always headed; retains cookies/logins |

Checkpoint & Resume

During auto exploration, a checkpoint is saved every 5 nodes. If the run is interrupted, resume it with:

python cli.py auto --resume output/example.com/auto_YYYYMMDD_HHMMSS/

Exploration continues exactly where it left off. The checkpoint is deleted on clean completion.

Deep Dive

The core of this agent is the Element Reference Pattern — a technique for grounding vision-language models in the DOM without asking them to generate selectors. It eliminates an entire class of failures that plague browser agents.

Read the full blog post →

Project Structure

src/
  analyzer/       # LLM-powered page analysis (screenshot + a11y → actions)
  browser/        # Browser engine factory, page capture, action execution
  cli/            # Rich live-progress display for the terminal
    commands/     # CLI command handlers (capture, explore, auto, replay)
    display.py    # Display helpers for exploration output
  llm/            # Anthropic API client wrapper
  models/         # Pydantic data models (actions, graph, journeys)
  orchestrator/   # BFS explorer and interactive explorer logic
  output/         # JSON/Mermaid export and journey persistence
  replay/         # Journey replay executor and output helpers
  utils/          # Logging setup and retry helpers

License

MIT