Scraping Walmart Ecommerce Data With Cloudflare Browser Rendering

If you’ve ever tried to scrape an ecommerce site like Walmart, you know the pain. It’s a React app, so the HTML you get from a plain requests.get() is basically an empty shell: no products, no prices, nothing useful. The actual data only shows up after JavaScript runs, hydrates the page, and renders everything client-side.

So you need a real browser. But spinning up Puppeteer or Playwright locally is slow, fragile, and annoying to deploy. That’s where Cloudflare Browser Rendering comes in.

Instead of managing headless browsers yourself, you make an API call to Cloudflare. They spin up a browser instance on their edge network, navigate to the URL, wait for the page to fully render, and hand you back the HTML. It’s like having a browser-as-a-service.

The approach here is dead simple — two steps:

Fetch the fully rendered HTML via Cloudflare’s API
Parse the structured data out of it locally

No browser dependencies on your machine. No Selenium. No Docker containers running Chrome. Just HTTP requests and JSON parsing.

Step 1: Getting the HTML


The first script hits Cloudflare’s /content endpoint. You give it a URL and some options, and it gives you back rendered HTML.

First, grab your Cloudflare credentials from environment variables (or a .env file):

import os
import json
import requests
from dotenv import load_dotenv
load_dotenv()
ACCOUNT_ID = os.environ.get("CF_ACCOUNT_ID")
API_TOKEN = os.environ.get("CF_API_TOKEN")
endpoint = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/browser-rendering/content"
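One thing worth guarding against: os.environ.get() returns None for a missing variable, and the f-string above will happily build a URL containing the literal text "None". That only surfaces later as a confusing API error. Below is a minimal fail-fast sketch. The helper names require_env and content_endpoint are hypothetical and not part of any Cloudflare SDK.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, failing fast if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def content_endpoint(account_id: str) -> str:
    """Build the Browser Rendering /content URL for a given Cloudflare account ID."""
    return (
        "https://api.cloudflare.com/client/v4/accounts/"
        f"{account_id}/browser-rendering/content"
    )
```

In the fetch script you would then write ACCOUNT_ID = require_env("CF_ACCOUNT_ID") and API_TOKEN = require_env("CF_API_TOKEN") in place of the two os.environ.get() calls.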

Now build the payload. The key detail is the waitForSelector option. Walmart’s product grid doesn’t appear instantly — React needs a moment to mount and render the components. By telling Cloudflare to wait for [data-testid='item-stack'] to appear in the DOM, we make sure the products have actually loaded before the HTML gets captured.

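from urllib.parse import quote_plus
SEARCH_QUERY = "sunglasses"
payload = {
    # quote_plus makes multi-word queries like "kids sunglasses" URL-safe
    "url": f"https://www.walmart.com/search?q={quote_plus(SEARCH_QUERY)}",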
    "gotoOptions": {
        "waitUntil": "domcontentloaded",
        "timeout": 60000
    },
    "waitForSelector": {
        "selector": "[data-testid='item-stack']",
        "timeout": 30000
    },
    "viewport": {
        "width": 1280,
        "height": 720
    }
}

Then fire it off and save the result:

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_TOKEN}"
}
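# Client-side timeout (seconds) longer than the 60s + 30s render budget in the payload
resp = requests.post(endpoint, json=payload, headers=headers, timeout=120)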
if resp.status_code == 200:
    with open("walmart_search_raw.html", "w", encoding="utf-8") as f:
        f.write(resp.text)
    print(f"Saved {len(resp.text):,} chars of rendered HTML")
else:
    print(f"Error {resp.status_code}: {resp.text[:500]}")

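That’s the entire fetch step. One POST request, and you get back the full page HTML as if you’d opened it in Chrome and inspected the live DOM in DevTools after everything loaded. (Plain “View Source” would only show the original server response, not the rendered page.)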

Step 2: Parsing the Data


Here’s where it gets interesting. You could parse the HTML with BeautifulSoup, find all the product cards, and extract text from each element. But there’s a much better way.

Walmart is built with Next.js, and Next.js apps that use the Pages Router embed their page data in a <script id="__NEXT_DATA__"> tag. It’s a giant JSON blob containing everything the page needs to render, including all the product data in a clean, structured format.

So instead of wrestling with CSS selectors and fragile DOM traversal, you just grab that JSON blob directly:

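import json
from bs4 import BeautifulSoup
# `html` holds the rendered page saved in Step 1 (see the response-format note below)
soup = BeautifulSoup(html, "html.parser")
script = soup.find("script", id="__NEXT_DATA__")
if script is None:
    raise RuntimeError("__NEXT_DATA__ not found; the page may not have fully rendered")
next_data = json.loads(script.string)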
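If you’d rather not depend on beautifulsoup4 at all, the same lookup can be done with the standard library’s html.parser. This is a minimal sketch, and the class and function names are hypothetical. It relies on the fact that html.parser passes the contents of a <script> element to handle_data as raw text.

```python
import json
from html.parser import HTMLParser


class NextDataExtractor(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__"> using only the stdlib."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate them
        if self._inside:
            self._chunks.append(data)

    def payload(self) -> dict:
        if not self._chunks:
            raise ValueError("No __NEXT_DATA__ script found")
        return json.loads("".join(self._chunks))


def extract_next_data(html: str) -> dict:
    """Parse an HTML string and return the decoded __NEXT_DATA__ JSON."""
    parser = NextDataExtractor()
    parser.feed(html)
    parser.close()
    return parser.payload()
```

With this in place, next_data = extract_next_data(html) replaces the BeautifulSoup lines, at the cost of a bit more code.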

From there, the product data lives at a predictable path:

search_result = next_data["props"]["pageProps"]["initialData"]["searchResult"]
item_stacks = search_result.get("itemStacks", [])
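The chained subscripts above raise a KeyError the moment Walmart renames or drops any level of that path. If you’d rather detect that and bail out cleanly, a small helper like the hypothetical dig() below walks the path and returns a default instead.

```python
from typing import Any


def dig(data: Any, *keys: Any, default: Any = None) -> Any:
    """Walk nested dicts/lists by key or index; return `default` if any step is missing."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a None/non-container along the way
            return default
    return current
```

Usage: search_result = dig(next_data, "props", "pageProps", "initialData", "searchResult", default={}), then log a warning if it comes back empty rather than crashing halfway through a run.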

Now loop through the stacks and pull out what you need. One thing to watch for — not every item in a stack is an actual product. Walmart mixes in ads, placeholders, and other junk, so you need to filter:

products = []
for stack in item_stacks:
    for item in stack.get("items", []):
        # Skip ads and placeholders: keep entries typed as products or carrying an item ID
        if (item.get("__typename") not in ("Product", "SearchProduct")
                and not item.get("usItemId")):
            continue
        price_info = item.get("priceInfo") or {}
        image_info = item.get("imageInfo") or {}
        product = {
            "name": item.get("name"),
            "brand": item.get("brand"),
            "usItemId": item.get("usItemId"),
            "url": "https://www.walmart.com" + item["canonicalUrl"]
                   if item.get("canonicalUrl") else None,
            "image": image_info.get("thumbnailUrl"),
            "price_current": price_info.get("linePrice"),
            "price_was": price_info.get("wasPrice"),
            "rating": item.get("averageRating"),
            "review_count": item.get("numberOfReviews"),
            "seller": item.get("sellerName"),
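            # None when the flag is missing, rather than assuming out of stock
            "in_stock": (None if item.get("isOutOfStock") is None
                         else not item["isOutOfStock"]),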
            "is_sponsored": item.get("isSponsoredFlag", False),
        }
        products.append(product)

The same product can show up in multiple stacks (once in organic results, once as a sponsored listing), so deduplicate before saving:

seen = set()
unique = []
for p in products:
    uid = p["usItemId"]
    if uid and uid not in seen:
        seen.add(uid)
        unique.append(p)
with open("walmart_products.json", "w", encoding="utf-8") as f:
    json.dump(unique, f, indent=2, ensure_ascii=False)
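The loop above keeps whichever copy of a product appears first, which may be the sponsored listing. If you care about the is_sponsored flag, here is a sketch of a variant (the function name is hypothetical) that prefers the organic copy when both exist while keeping first-appearance order.

```python
def dedupe_prefer_organic(products: list[dict]) -> list[dict]:
    """Keep one entry per usItemId, preferring a non-sponsored copy when both exist.

    Output order follows each item's first appearance, because replacing a value
    for an existing dict key keeps that key's original position.
    """
    best: dict[str, dict] = {}
    for p in products:
        uid = p.get("usItemId")
        if not uid:
            continue  # entries without an ID can't be deduplicated reliably
        current = best.get(uid)
        if current is None or (current.get("is_sponsored") and not p.get("is_sponsored")):
            best[uid] = p
    return list(best.values())
```

Swap it in with unique = dedupe_prefer_organic(products) before the json.dump call.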

Names, prices, ratings, review counts, seller info, availability, images — all neatly organized. No regex. No “find the third div inside the second span” nonsense.

Why This Approach Works Well


It’s resilient. Walmart can redesign their entire UI and change every CSS class name without touching the __NEXT_DATA__ payload. The JSON shape isn’t guaranteed either, but it usually changes far less often than the markup, because you’re reading the same data source that React itself uses to render the page.

It’s clean. You get structured JSON instead of messy HTML. Each price sits in its own named field, although some fields, like linePrice, are display strings such as "$129.99" that you’ll want to convert before doing math. Ratings are numbers. URLs are relative paths you can easily make absolute.
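For the display-string price fields, a small converter is enough. The sketch below (parse_price is a hypothetical name) takes the first number in the string, so a range like "$10.00 - $20.00" yields the lower bound. It returns a Decimal rather than a float to avoid binary rounding on currency.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Turn a display price like "$1,299.99" into a Decimal; None if no number is present."""
    if not text:
        return None
    # First run of digits, allowing thousands separators and an optional decimal part
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))
```

For example, parse_price(product["price_current"]) turns "$129.99" into Decimal("129.99").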

It’s fast. The Cloudflare browser call takes 10-20 seconds (it’s rendering a full page), but the parsing step is nearly instant. And since the browser runs on Cloudflare’s infrastructure, you’re not burning your own CPU.

It’s simple to deploy. No browser binaries to install. No Playwright or Puppeteer dependencies. The fetch step is just an HTTP POST. The parse step only needs beautifulsoup4 and the standard library.

Things to Watch Out For


A few gotchas I ran into:

The response format can vary. Sometimes Cloudflare wraps the HTML in a JSON object with a result key, sometimes it returns raw HTML. Handle both:

# Requires `import json`
with open("walmart_search_raw.html", encoding="utf-8") as f:
    raw = f.read()
try:
    html = json.loads(raw)["result"]  # JSON-wrapped response: HTML sits under "result"
except (json.JSONDecodeError, KeyError, TypeError):
    html = raw  # raw HTML response

Cloudflare has usage limits. Browser Rendering comes with a free allowance, but it’s capped and the REST API is rate-limited, so heavier use means a paid plan. Check the current limits before you do any bulk scraping.
Timeouts need to be generous. Walmart’s page is heavy. The 60-second gotoOptions timeout and 30-second waitForSelector timeout aren’t arbitrary: in my runs, shorter values failed intermittently.
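Given both the intermittent failures and the rate limits, it can help to retry the fetch with exponential backoff. Below is a minimal sketch under a few assumptions: post_with_retries is a hypothetical helper, the set of retryable status codes is a judgment call, and it relies on requests’ exceptions subclassing OSError (RequestException derives from IOError). Keep in mind that every retry spends more of your Browser Rendering allowance.

```python
import time
from typing import Any, Callable

# Rate limiting plus transient server/gateway errors
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def post_with_retries(
    send: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-retryable response, backing off exponentially.

    `send` is a zero-argument callable returning an object with `status_code`.
    After the final attempt, the last response is returned or the last error re-raised.
    """
    for attempt in range(attempts):
        try:
            resp = send()
        except OSError:  # requests' network errors subclass OSError
            if attempt == attempts - 1:
                raise
        else:
            if resp.status_code not in RETRYABLE_STATUS or attempt == attempts - 1:
                return resp
        sleep(base_delay * 2 ** attempt)  # 5s, 10s, 20s, ... with the defaults
    raise ValueError("attempts must be at least 1")
```

In the fetch script this wraps the existing call: resp = post_with_retries(lambda: requests.post(endpoint, json=payload, headers=headers, timeout=120)).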

After running both scripts, you end up with a clean JSON file. Each product looks something like this:

{
  "name": "Ray-Ban RB2132 New Wayfarer Sunglasses",
  "brand": "Ray-Ban",
  "usItemId": "123456789",
  "url": "https://www.walmart.com/ip/...",
  "image": "https://i5.walmartimages.com/...",
  "price_current": "$129.99",
  "rating": 4.6,
  "review_count": 342,
  "seller": "Walmart.com",
  "in_stock": true,
  "is_sponsored": false
}

From here you can do whatever you want — feed it into a price tracker, build a comparison tool, run analytics, or just browse products without the ad clutter.
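If your next step is a spreadsheet or a simple price tracker, writing the deduplicated list to CSV is only a few lines. The function name and column order below are hypothetical choices, with columns matching the keys the parsing step produces. Missing keys become empty cells, and unexpected keys are ignored.

```python
import csv
from typing import Iterable, TextIO

# Column order is an arbitrary choice; keys match the product dicts built above
FIELDS = [
    "usItemId", "name", "brand", "price_current", "price_was",
    "rating", "review_count", "seller", "in_stock", "is_sponsored", "url", "image",
]


def write_products_csv(products: Iterable[dict], out: TextIO) -> None:
    """Write product dicts to CSV, one row per product."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for product in products:
        writer.writerow(product)
```

Open the output file with newline="", as the csv module expects: with open("walmart_products.csv", "w", newline="", encoding="utf-8") as f: write_products_csv(unique, f).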

The combination of Cloudflare Browser Rendering and Next.js’s __NEXT_DATA__ is genuinely one of the cleanest scraping patterns I’ve come across. You offload the hard part (running a browser) to Cloudflare, and you get structured data for free because of how Next.js works.

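It’s not going to work for every site. Only sites that ship their page state as an embedded JSON blob (like Next.js Pages Router apps with __NEXT_DATA__) hand you the data this conveniently. But for the ones that do, it beats traditional DOM scraping by a mile.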