Most web automation tools work by driving a browser. Puppeteer launches Chromium. Playwright spins up Firefox. Selenium opens whatever you’ve got. They click buttons, fill forms, wait for elements to render, and pray the selectors don’t break.
I took a different approach. I built a tool called Phantom that extracts your session from an already-open Chrome tab, then replays raw HTTP requests. No browser runtime. No headless instance. No DOM. Just the same requests Chrome would make, fired from a Python script.
The problem
Some web apps don’t have APIs. Gumroad doesn’t expose product uploads. CNN doesn’t have a public feed API for section-level stories. When you want to automate actions on these sites, you’re stuck with browser automation - which means managing browser lifecycles, handling JavaScript rendering, dealing with flaky selectors, and accepting 10x slower execution.
But here’s the thing: the browser is just an HTTP client with extra steps. When you load a CNN section page, your browser sends a GET request and receives HTML. The rendering, the ads, the JavaScript - none of that matters if you just want the article links. You just need the right cookies, the right headers, and the right request.
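Stripped of the browser, that request is a few lines of stdlib Python. A sketch (a public CNN section page doesn’t actually need cookies, but the header shows where an authenticated session would go):

import urllib.request

req = urllib.request.Request(
    "https://www.cnn.com/politics",
    headers={
        "User-Agent": "Mozilla/5.0 ...",  # impersonate Chrome
        "Cookie": "sid=...",              # session cookies, if the site needs them
    },
)
html = urllib.request.urlopen(req).read().decode("utf-8")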
How it works
Phantom has two phases: auth and run.
Phase 1: Steal the session
You log into a site in Chrome normally. Then phantom connects to Chrome’s DevTools Protocol over WebSocket and extracts your cookies:
import json
import websocket  # the one dependency: websocket-client

def extract_session(domain, port=9222):
    # Find a tab with the target domain open
    pages = get_cdp_pages(port)
    target = [p for p in pages if domain in p.get("url", "")][0]

    # Connect to the tab's debugger
    ws = websocket.create_connection(target["webSocketDebuggerUrl"])

    # Ask Chrome for all cookies on this domain
    ws.send(json.dumps({
        "id": 1,
        "method": "Network.getCookies",
        "params": {"urls": [f"https://{domain}"]}
    }))
    result = json.loads(ws.recv())
    cookies = result["result"]["cookies"]

    # Grab the CSRF token from the page's meta tag
    ws.send(json.dumps({
        "id": 2,
        "method": "Runtime.evaluate",
        "params": {"expression": "document.querySelector('meta[name=\"csrf-token\"]')?.content || ''"}
    }))
    csrf_token = json.loads(ws.recv())["result"]["result"]["value"]

    ws.close()
    # Hand back what a flow needs to replay requests (exact shape is up to the caller)
    return {"cookies": cookies, "csrf_token": csrf_token}
This requires Chrome to be running with --remote-debugging-port=9222. You get the full session - cookies, CSRF tokens, everything - without ever typing a password into a script.
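The get_cdp_pages helper above isn’t anything exotic: with the debugging port open, Chrome lists its tabs as JSON over plain HTTP. A minimal sketch:

import json
import urllib.request

def get_cdp_pages(port=9222):
    # Chrome's DevTools endpoint returns one entry per open tab,
    # each carrying a webSocketDebuggerUrl you can connect to.
    with urllib.request.urlopen(f"http://localhost:{port}/json") as resp:
        return json.loads(resp.read())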
The session gets saved to ~/.config/web-api-client/sessions/gumroad.com.json and reused across runs until it expires.
Phase 2: Replay requests
Each automation is a “flow” - a Python module that knows how to make the right sequence of HTTP requests for a specific action. Here’s a flow that scrapes CNN’s top stories by section:
# flows/cnn/stories.py
import re

NAME = "stories"
DESCRIPTION = "Get CNN top stories by section"

def run(session, args):
    # Section path, e.g. "/politics", passed in from the CLI
    section_path = args["section"]

    # Fetch the section page as plain HTML
    status, html = session.get(section_path, headers={"Accept": "text/html"})

    # Parse article links straight from the raw markup
    pattern = re.compile(
        r'<a\s[^>]*?href="(/[^"]+)"[^>]*?data-link-type="article"[^>]*>'
        r'(.*?)</a>',
        re.DOTALL,
    )
    headline_pattern = re.compile(
        r'class="container__headline-text[^"]*"[^>]*>(.*?)</span>',
        re.DOTALL,
    )

    articles = []
    for match in pattern.finditer(html):
        path = match.group(1)
        inner = match.group(2)
        headline = headline_pattern.search(inner)
        if headline:
            articles.append({"title": clean(headline.group(1)), "path": path})
    return articles
No Puppeteer. No page.waitForSelector('.headline'). No page.evaluate(). Just one HTTP request and some regex.
The HTTP client spoofs Chrome’s fingerprint - User-Agent, sec-ch-ua headers, fetch metadata - so the server sees requests identical to what a real browser sends.
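The exact values track whatever Chrome build the session came from; a representative sketch of that header set (version numbers here are illustrative):

CHROME_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "sec-ch-ua": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"macOS"',
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Dest": "document",
    "Accept-Language": "en-US,en;q=0.9",
}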
Multi-step flows
Some actions need multiple coordinated requests. Uploading a cover image on Gumroad is a three-step dance:
- Presign: POST to Rails Active Storage to get a signed S3 upload URL
- Upload: PUT the file directly to S3 with the presigned headers
- Attach: POST back to Gumroad with the signed_id to link the upload to your product
Each step depends on values from the previous one. In Puppeteer, you’d be clicking upload buttons and intercepting network requests. In phantom, it’s three sequential HTTP calls with value extraction between them.
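Condensed, the flow looks roughly like this. The presign request and response shape are standard Rails Active Storage; the attach endpoint and field name are placeholders, and session.post_json/put/post stand in for the client’s request helpers:

import base64
import hashlib
import os

def upload_cover(session, product_id, file_path):
    data = open(file_path, "rb").read()

    # 1. Presign: Active Storage returns a signed S3 URL plus a signed_id
    status, presign = session.post_json("/rails/active_storage/direct_uploads", {
        "blob": {
            "filename": os.path.basename(file_path),
            "byte_size": len(data),
            "checksum": base64.b64encode(hashlib.md5(data).digest()).decode(),
            "content_type": "image/png",
        },
    })

    # 2. Upload: PUT the raw bytes straight to S3 with the presigned headers
    session.put(presign["direct_upload"]["url"],
                data=data,
                headers=presign["direct_upload"]["headers"])

    # 3. Attach: hand the signed_id back so the app links the blob to the product
    #    (endpoint and field name below are placeholders, not Gumroad's real ones)
    session.post(f"/products/{product_id}", {"cover_signed_id": presign["signed_id"]})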
Parsing without a DOM
CNN doesn’t render articles server-side in a clean way. The HTML is a mess of nested divs, ad containers, and JavaScript bundles. But the article content is still in there - you just need to know where to look.
Phantom’s CNN reader extracts headlines, authors, dates, and body text using targeted regex patterns against the raw HTML:
def _parse_article(html):
    headline = _extract_first(html, r"<h1[^>]*>(.*?)</h1>")
    authors = _extract_all(html, r'class="byline__name[^"]*"[^>]*>(.*?)<')
    date = _extract_first(html, r'<time datetime="([^"]+)"')
    paragraphs = _extract_body(html)
    return {
        "headline": clean(headline),
        "authors": [clean(a) for a in authors],
        "date": date,
        "paragraphs": paragraphs,
    }
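The _extract_* helpers aren’t shown above; they’re thin wrappers over re, roughly:

import re

def _extract_first(html, pattern):
    # Return the first capture group, or an empty string if nothing matches
    m = re.search(pattern, html, re.DOTALL)
    return m.group(1) if m else ""

def _extract_all(html, pattern):
    # Return every capture-group match in document order
    return re.findall(pattern, html, re.DOTALL)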
The body extraction is the interesting part. CNN injects JavaScript snippets, app download prompts, and navigation elements inside article__content. Phantom stops parsing when it hits these signals:
# Stop at non-article junk
if any(signal in text for signal in [
    "Download the CNN app",
    "getSiteLanguage",
    "document.querySelector",
    "window.",
]):
    break
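In context, that check sits inside the paragraph loop. A minimal sketch of _extract_body, assuming CNN’s paragraph class name (the real pattern tracks whatever their markup currently uses):

def _extract_body(html):
    para_pattern = re.compile(r'<p[^>]*class="paragraph[^"]*"[^>]*>(.*?)</p>', re.DOTALL)
    paragraphs = []
    for m in para_pattern.finditer(html):
        text = clean(m.group(1))
        # Stop at non-article junk injected into the article body
        if any(signal in text for signal in [
            "Download the CNN app",
            "getSiteLanguage",
            "document.querySelector",
            "window.",
        ]):
            break
        if text:
            paragraphs.append(text)
    return paragraphs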
Brittle? Sure. But it’s also trivial to fix when CNN changes their markup. One regex update vs rewiring a Playwright selector chain.
The fallback trick
Some sites rotate CSRF tokens aggressively. When a token goes stale mid-flow, phantom falls back to Chrome DevTools - not to drive the browser, but to navigate to the right page and scrape a fresh token from the DOM:
if "Unknown or expired link" in response:
# Token expired - grab a fresh one from live Chrome
ws = websocket.create_connection(pages[0]["webSocketDebuggerUrl"])
# Navigate Chrome to the page
ws.send(json.dumps({
"id": 1,
"method": "Page.navigate",
"params": {"url": target_url}
}))
time.sleep(3)
# Extract the fresh token from the rendered DOM
ws.send(json.dumps({
"id": 2,
"method": "Runtime.evaluate",
"params": {"expression": "document.querySelector('input[name=\"token\"]')?.value"}
}))
fresh_token = json.loads(ws.recv())["result"]["result"]["value"]
# Retry with fresh token
HTTP replay for the fast path, Chrome automation for recovery. Best of both worlds.
Why not just use Playwright?
Phantom is fast - raw HTTP requests complete in milliseconds, not the seconds it takes to launch a browser, render a page, and wait for hydration. It’s deterministic - same request, same response, no timing-dependent failures. And it’s lightweight - the entire tool is stdlib Python plus one dependency (websocket-client for the optional Chrome integration).
But the real reason is simpler: I use phantom inside other tools. My news aggregator calls phantom run cnn:stories as a subprocess. My content pipeline calls phantom run gumroad:publish. These are fire-and-forget CLI calls. Spinning up a headless browser for each one would be absurd.
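From the caller’s side that’s just a subprocess invocation. A sketch of the aggregator end (the --section flag and the JSON-on-stdout contract are assumptions about the flow’s interface):

import json
import subprocess

def cnn_top_stories(section="/politics"):
    # Run the flow as a one-shot CLI call and parse whatever it prints
    result = subprocess.run(
        ["phantom", "run", "cnn:stories", "--section", section],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)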
The tradeoffs
This approach doesn’t work for everything. If a site does heavy client-side rendering and you need to read the result, you need a browser. If actions trigger complex JavaScript (WebSocket upgrades, client-side encryption), HTTP replay won’t cut it.
But for the 80% of web automation that’s “fetch a page and parse the response,” you don’t need a browser. You need the right cookies and the right request.
Phantom is a private tool I use daily, but the technique is simple enough to build yourself. The core is ~200 lines of Python. Just feed this article into your favorite LLM to design it ;)