You write a scraper. It runs fine on your test pages. Then you point it at a real target and get a 403 before a single byte of HTML comes back.
That wall is an anti-bot system. Cloudflare alone sits in front of roughly a fifth of the web, and DataDome, Akamai, and Kasada cover most of what's left worth scraping.
I'll walk through six methods to bypass anti-bots, from a header fix you can ship in two minutes to defeating the protocol-level detection that breaks every Playwright fork.
Every method here is something you run and control. No black-box scraping API where you paste a URL and pray. You own the stack, so you can debug it when a target changes.
How anti-bot detection works in 2026
To bypass anti-bots in 2026, your scraper has to look like a real browser on every layer the target inspects. There are four: the TLS handshake, the HTTP/2 frame order, the JavaScript fingerprint, and your behavior. Miss one layer the target checks and you're flagged, even if the others are perfect. The trick is matching only the layers your target actually checks.
Here's what each layer inspects.
TLS fingerprint. The moment you open an HTTPS connection, your cipher suites and extension order form a hash (JA3, and now JA4). Python's requests produces a hash that screams "script."
HTTP/2 fingerprint. Real browsers send frames and pseudo-headers in a specific order. Default HTTP clients don't, so the request gets flagged before any header is even read.
JavaScript fingerprint. Once the page loads, scripts read navigator.webdriver, your canvas, WebGL renderer, audio context, and installed fonts. Headless Chrome leaks automation markers all over this layer.
Behavior. Real users don't pull 50 pages in 10 seconds from one IP. Rate, timing, and navigation order all feed a risk score.
JA4 arrived in 2023 and made fingerprinting harder to dodge. It sorts TLS extensions before hashing, which kills the old trick of randomizing extension order to slip past JA3.
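To make the sorting point concrete, here's a minimal sketch. It is a simplification, not the real JA3 or JA4 format: real JA3 also hashes curves and point formats, and real JA4 has more fields. The function names and the numeric values are made up for illustration.

```python
import hashlib


def ja3_style_hash(version, ciphers, extensions):
    """JA3-style: MD5 over the fields in the order the client sent them."""
    raw = ",".join([
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),  # wire order is preserved
    ])
    return hashlib.md5(raw.encode()).hexdigest()


def sorted_style_hash(version, ciphers, extensions):
    """JA4-style idea (simplified): sort first, so order stops mattering."""
    raw = ",".join([
        str(version),
        "-".join(map(str, sorted(ciphers))),
        "-".join(map(str, sorted(extensions))),
    ])
    return hashlib.sha256(raw.encode()).hexdigest()[:12]


# Illustrative values only, not a capture from a real browser
version = 771
ciphers = [4865, 4866, 4867]
extensions = [0, 23, 65281, 10, 11, 35, 16]
shuffled = list(reversed(extensions))  # same extensions, different order

# Reordering changes the order-sensitive hash...
print(ja3_style_hash(version, ciphers, extensions)
      == ja3_style_hash(version, ciphers, shuffled))
# ...but not the sorted one
print(sorted_style_hash(version, ciphers, extensions)
      == sorted_style_hash(version, ciphers, shuffled))
```

The first comparison comes out different and the second comes out equal, which is why shuffling extensions no longer buys you a fresh fingerprint.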
The takeaway: there's no single switch. You match the layers a given target gates on, and nothing more.
6 methods to bypass anti-bots
Here's the full lineup, easiest to hardest. Start at the top and only escalate when you actually hit a wall.
| Method | Difficulty | Cost | Best for | Success rate |
|---|---|---|---|---|
| 1. Fix headers and request shape | Easy | Free | Light protection, internal APIs | Low–Medium |
| 2. Match the TLS fingerprint (curl_cffi) | Easy | Free | Network-layer gates, no JS needed | Medium–High |
| 3. Rotate residential proxies | Medium | $ | IP reputation and rate blocks | Medium–High |
| 4. Stealth browser (Camoufox / Patchright) | Medium | Free | JavaScript challenges, Turnstile | High |
| 5. Defeat CDP detection (nodriver) | Hard | Free | Targets that catch every automated browser | High |
| 6. Human behavior + session warming | Hard | Free | Behavioral scoring at scale | High |
Quick recommendation: if the data shows up in the raw HTML, start with method 2. If the page needs JavaScript to render, jump to method 4.
Basic methods (start here)
1. Fix your headers and request shape
The cheapest win. Most blocked-on-day-one scrapers are sending a header set no browser would ever produce.
Best for: Light protection, internal JSON APIs
Difficulty: Easy
Cost: Free
Success rate against anti-bots: Low–Medium
How it works
A default requests call sends almost no headers and a dead-giveaway User-Agent. You want the full set a real Chrome session sends, in roughly the right order.
Implementation
Send the headers a browser actually sends, including Accept, Accept-Language, and Sec-Fetch-*.
import requests

# A realistic Chrome header set, not just a spoofed User-Agent
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/138.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    # requests can only decode "br" if the brotli package is installed; drop it otherwise
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Site": "none",  # set "same-origin" once you have a referrer
    "Sec-Fetch-Mode": "navigate",
    "Connection": "keep-alive",
}
resp = requests.get("https://example.com", headers=headers, timeout=20)
print(resp.status_code)
Watch for one thing: this fixes the application layer but not the TLS handshake. A site that fingerprints TLS will still block you here, which is exactly what method 2 fixes.
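The Sec-Fetch-Site comment in the header set above is easy to get wrong by hand, so here's a small sketch that derives it from the referrer. The helper names are my own, and it only distinguishes "same-origin" from "cross-site"; a real browser also sends "same-site" for sibling subdomains, which this skips.

```python
from urllib.parse import urlsplit


def origin(url):
    """(scheme, host, port) tuple used to compare two URLs."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def navigation_headers(target_url, referer=None):
    """Sec-Fetch-* headers consistent with how the page was reached."""
    headers = {
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Dest": "document",
    }
    if referer is None:
        # typed URL or bookmark: no initiating page
        headers["Sec-Fetch-Site"] = "none"
    else:
        headers["Referer"] = referer
        same = origin(referer) == origin(target_url)
        headers["Sec-Fetch-Site"] = "same-origin" if same else "cross-site"
    return headers


print(navigation_headers("https://example.com/listings"))
print(navigation_headers("https://example.com/listings", "https://example.com/"))
```

Merge the result into your base header dict per request, so the first hit looks like a typed URL and later hits look like clicks from the previous page.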
Pros and cons
Pros:
- Two-minute change, works on a surprising number of sites
- No new dependencies
- Fast and cheap to run
Cons:
- Useless against any system that checks TLS or JS
- Headers alone are a weak signal in 2026
Use this when the target is lightly protected or you're hitting an undocumented internal API. Skip it the moment you see a Cloudflare challenge page.
2. Match the TLS fingerprint with curl_cffi
This is where most people should actually start. It fixes the layer that gets you blocked before your headers are even read.
Best for: Network-layer gates where the data lives in raw HTML
Difficulty: Easy
Cost: Free
Success rate against anti-bots: Medium–High
How it works
curl_cffi wraps curl-impersonate and replicates a real browser's TLS handshake, JA3/JA4 hash, cipher order, and HTTP/2 frame order. The API mirrors requests, so migration is nearly copy-paste.
Implementation
Install with pip install curl_cffi, then pass impersonate to copy a real Chrome fingerprint.
from curl_cffi import requests

# impersonate="chrome" copies the latest Chrome TLS + HTTP/2 fingerprint
resp = requests.get(
    "https://example.com",
    impersonate="chrome",
    timeout=20,
)
print(resp.status_code)
print(resp.text[:300])
Notice you didn't touch headers. curl_cffi handles the full network signature, which is the part requests can never get right.
It also keeps sessions, so cookies and connection reuse survive across requests.
from curl_cffi import requests
session = requests.Session()
session.get("https://example.com/login", impersonate="chrome")
# cookies from the first call carry into the second
resp = session.get("https://example.com/dashboard", impersonate="chrome")
print(resp.status_code)
The catch: this still can't run JavaScript. If the target gates on a JS challenge or renders content client-side, you'll get the challenge HTML, not your data. That's the signal to move up to a browser.
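Spotting "challenge HTML, not your data" can be automated with a rough check like the one below. Treat it as a heuristic sketch: the marker strings are assumptions based on what Cloudflare interstitials commonly contain, other vendors use different text, and the function name is hypothetical.

```python
# Assumed markers; tune these per target
CHALLENGE_MARKERS = (
    "just a moment",                  # title text on Cloudflare interstitials
    "challenge-platform",             # path fragment in challenge scripts
    "enable javascript and cookies",  # notice shown when JS is off
)


def classify_response(status_code, body):
    """Return 'challenge', 'blocked', or 'ok' for an HTTP response."""
    head = body[:5000].lower()  # markers sit near the top of the page
    if any(marker in head for marker in CHALLENGE_MARKERS):
        return "challenge"  # a JS gate: move up to a browser
    if status_code in (403, 429):
        return "blocked"    # network-layer or rate block
    return "ok"


print(classify_response(200, "<html><h1>Listings</h1></html>"))
print(classify_response(403, "<title>Just a moment...</title>"))
```

Run it on resp.status_code and resp.text after each request, and only escalate to a browser when you actually see "challenge".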
Pros and cons
Pros:
- Beats TLS and HTTP/2 fingerprinting, the two layers headers can't fix
- Almost as fast as plain requests
- Drop-in replacement, so existing code barely changes
Cons:
- No JavaScript execution
- Fingerprints lag new browser releases by a version or two
Use curl_cffi for any target whose data is in the initial HTML response. For a deeper walkthrough, see our guide to web scraping in Python.
3. Rotate residential proxies
A perfect fingerprint won't save you if every request comes from one datacenter IP. IP reputation is its own detection layer.
Best for: IP bans, rate limiting, geo-gated content
Difficulty: Medium
Cost: $ (proxy bandwidth)
Success rate against anti-bots: Medium–High
How it works
Anti-bots score IPs by ASN and request rate. Datacenter ranges carry a poor reputation; residential IPs look like ordinary home connections. Rotating across a pool spreads your requests so no single IP trips a rate threshold.
Implementation
Point any client at a rotating endpoint. Here it is with curl_cffi from method 2.
from curl_cffi import requests

# a rotating residential endpoint hands you a fresh IP per request
proxies = {
    "http": "http://user:pass@gate.roundproxies.com:8000",
    "https": "http://user:pass@gate.roundproxies.com:8000",
}
resp = requests.get(
    "https://example.com",
    impersonate="chrome",
    proxies=proxies,
    timeout=30,
)
print(resp.status_code)
I run a rotating residential pool for anything past a few hundred requests, since datacenter IPs get burned fast on protected targets. Roundproxies is what I reach for, but any reputable residential network works the same way.
One gotcha: rotating too aggressively can hurt you. If a site ties a session to one IP, jumping IPs mid-session looks broken. Pin a "sticky" session for stateful flows, rotate for stateless ones.
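Here's one way the sticky-versus-rotating split could look in code. This is a sketch under an assumption: you hold a list of individual proxy endpoints yourself. With a provider gateway, rotation usually happens on their side and sticky sessions are configured their way. The class name and the example hosts are placeholders.

```python
import hashlib
import itertools


class ProxyPool:
    """Hands out proxies round-robin, or pins one per session key."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("need at least one proxy URL")
        self._proxies = list(proxy_urls)
        self._cycle = itertools.cycle(self._proxies)

    def rotating(self):
        """Next proxy in round-robin order, for stateless requests."""
        return next(self._cycle)

    def sticky(self, session_key):
        """Same proxy every time for a given session key."""
        digest = hashlib.sha256(session_key.encode()).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._proxies)
        return self._proxies[index]

    @staticmethod
    def as_proxies(proxy_url):
        """Shape a URL into the proxies dict used in the example above."""
        return {"http": proxy_url, "https": proxy_url}


pool = ProxyPool([
    "http://p1.example:8000",  # placeholder endpoints
    "http://p2.example:8000",
    "http://p3.example:8000",
])
print(pool.rotating(), pool.rotating())
print(pool.sticky("login-flow-1") == pool.sticky("login-flow-1"))
```

Use sticky() with one key per logged-in flow or cart, and rotating() for independent page fetches.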
Pros and cons
Pros:
- Solves IP bans and rate limits that no fingerprint fix can touch
- Lets you parallelize across many IPs
- Works with every other method here
Cons:
- Costs money per gigabyte
- Sticky vs. rotating is a decision you have to get right
Use residential proxies once you're scraping at volume or seeing 429s. Pair them with method 2 or 4. More detail in our proxy rotation and residential proxies explainers.
4. Use a stealth browser for JavaScript challenges
When a target runs a JS challenge or renders content client-side, you need a real browser that doesn't leak automation markers. In 2026, that means Camoufox or Patchright, not the old stealth plugins.
Best for: JavaScript challenges, Turnstile, client-rendered pages
Difficulty: Medium
Cost: Free
Success rate against anti-bots: High
How it works
Camoufox is a patched Firefox built for scraping. It spoofs canvas, WebGL, fonts, and screen properties, and adds human-like cursor movement. Firefox helps here because most anti-bots tune their hardest checks for Chromium.
A quick deprecation note, because the old advice is everywhere: puppeteer-extra-plugin-stealth was deprecated in February 2025 and current Cloudflare checks detect it. undetected-chromedriver is in the same boat. Don't waste a day on either in 2026.
Implementation
Install with pip install "camoufox[geoip]" (the quotes stop shells like zsh from treating the brackets as a glob), then run python -m camoufox fetch to pull the browser.
from camoufox.sync_api import Camoufox

# humanize adds realistic cursor motion; os spoofs a macOS fingerprint
with Camoufox(headless=True, humanize=True, os="macos") as browser:
    page = browser.new_page()
    page.goto("https://example.com", timeout=30000)
    page.wait_for_load_state("networkidle")  # let the challenge resolve
    html = page.content()
    print(len(html))
The API is just Playwright, so any Playwright code you already have ports over by swapping the launch line.
For a Cloudflare Turnstile checkbox, you need cross-origin iframes to be clickable, which disable_coop=True handles.
from camoufox.sync_api import Camoufox

with Camoufox(disable_coop=True, humanize=True, window=(1280, 720)) as browser:
    page = browser.new_page()
    page.goto("https://example.com")
    page.wait_for_load_state("networkidle")
    page.wait_for_timeout(4000)  # give Turnstile time to settle
    page.mouse.click(210, 290)   # click the checkbox at its rendered spot
    page.wait_for_timeout(3000)
If you'd rather stay on Chromium, Patchright is the maintained patched-Playwright option. Install it (pip install patchright), run patchright install chromium, and launch with channel="chrome" to use your real Chrome build.
Pros and cons
Pros:
- Runs JavaScript, so client-rendered sites work
- Beats most JS fingerprinting out of the box
- Handles many Turnstile and JS challenges unattended
Cons:
- Slow and memory-hungry next to curl_cffi
- Each browser instance eats real RAM, so scaling costs you
Use a stealth browser when the data only exists after JavaScript runs, or when method 2 returns a challenge page. For Cloudflare specifically, see our Cloudflare bypass guide.
Advanced methods (for tough cases)
5. Defeat automation-protocol detection with nodriver
Here's the wall that trips up almost everyone: some targets don't fingerprint your TLS or your JS. They detect the automation protocol controlling the browser. Every Playwright fork, Camoufox and Patchright included, fails this check, because the detection looks at how the browser is being driven.
Best for: Targets that block every automated browser but work fine when you click manually
Difficulty: Hard
Cost: Free
Success rate against anti-bots: High
How it works
Playwright and Selenium expose traces of the DevTools Protocol they ride on. nodriver is the successor to undetected-chromedriver and uses its own custom DevTools implementation, so it isn't driven through the standard automation interface that detectors look for.
An independent 2026 benchmark of seven stealth tools across dozens of Cloudflare targets found the same split: TLS and JS layers fall to Camoufox and curl_cffi, but automation-protocol targets are a cliff where Playwright forks fail regardless of how well they're patched (Paterson, 2026).
Implementation
Install with pip install nodriver. It's async and needs only Python plus a Chrome-based browser.
import nodriver as uc

async def main():
    # headful (headless=False) passes more checks than headless
    browser = await uc.start(headless=False)
    page = await browser.get("https://example.com")
    await page.select("h1")  # waits until the element appears
    html = await page.get_content()
    print(len(html))
    browser.stop()

uc.loop().run_until_complete(main())
Notice there's no Selenium and no ChromeDriver. nodriver talks to Chrome directly, which is the whole reason it slips past automation-protocol checks.
To route it through proxies, start the browser with a proxy argument.
import nodriver as uc

async def main():
    browser = await uc.start(
        headless=False,
        browser_args=["--proxy-server=http://gate.roundproxies.com:8000"],
    )
    page = await browser.get("https://example.com")
    html = await page.get_content()  # await first, then slice the string
    print(html[:200])
    browser.stop()

uc.loop().run_until_complete(main())
The tradeoff: nodriver is younger than Playwright, so some conveniences are still maturing and the docs are thin. You trade polish for the one thing it does that nothing else free does well.
Pros and cons
Pros:
- Passes automation-protocol detection that breaks every Playwright fork
- No Selenium, no ChromeDriver, minimal setup
- Async by default, so it scales across tabs
Cons:
- Less mature than Playwright; expect rough edges
- Smaller community, sparser examples
Use nodriver only when a target blocks your stealth browser but works fine in a normal manual session. That asymmetry is the fingerprint of automation-protocol detection.
6. Add human behavior and session warming
You can have a flawless fingerprint and still get blocked on request number 40. Behavior is the last layer, and it's the one people forget.
Best for: Behavioral scoring, scraping at volume
Difficulty: Hard
Cost: Free
Success rate against anti-bots: High
How it works
The pattern that breaks most scrapers is treating every request as stateless. Real users land on a homepage, pick up cookies, then navigate. They pause. They don't fire identical requests on a metronome.
Session warming means visiting the homepage first so you carry a real referrer and cookie set into your target page.
Implementation
Warm the session, then add randomized pacing between requests.
import random
import time
from curl_cffi import requests

session = requests.Session()
home = "https://example.com/"
# 1. land on the homepage to collect cookies and a referrer
session.get(home, impersonate="chrome")
time.sleep(random.uniform(2, 5))
# 2. now hit the real target like a returning visitor
referer = home
for page_num in range(1, 6):
    url = f"https://example.com/listings?page={page_num}"
    # send the previous page as Referer, the way a browser would
    resp = session.get(url, impersonate="chrome", headers={"Referer": referer})
    print(page_num, resp.status_code)
    referer = url
    time.sleep(random.uniform(3, 8))  # human-ish gap, never fixed
The key is random.uniform, not a constant sleep. Fixed delays are their own pattern, and detectors catch the metronome fast.
In a browser, the humanize=True flag from method 4 covers cursor movement; you still add the navigation pacing yourself.
Pros and cons
Pros:
- Stops the slow blocks that fire after N requests
- Costs nothing, just wall-clock time
- Stacks on top of every other method
Cons:
- Slows your throughput on purpose
- Tuning the right delay is trial and error per target
Use behavioral pacing on anything you scrape at volume. It's the difference between a scraper that dies in an hour and one that runs for weeks.
Which method should you use to bypass anti-bots?
The mistake is grabbing the heaviest tool first. A headless browser to scrape a static JSON endpoint is slow, fragile, and overkill. Match the method to the layer your target actually checks.
This is the table I wish every other guide led with. Find your symptom in the third column, then read across to the fix.
| Detection layer | What it checks | How you know you hit it | Tool that beats it |
|---|---|---|---|
| TLS fingerprint | Cipher and extension order in the handshake (JA3/JA4) | 403 before any HTML loads | curl_cffi (impersonate), Camoufox |
| HTTP/2 fingerprint | Frame and pseudo-header order | 403 even with perfect headers | curl_cffi, any real browser |
| JavaScript fingerprint | navigator.webdriver, canvas, WebGL, fonts | Challenge page or block after the page loads | Camoufox, Patchright |
| Automation protocol | Traces of DevTools driving the browser | Block on automation, fine when you click manually | nodriver |
| IP reputation | Datacenter ASN, request rate | 429, or a block that clears on your home IP | Residential proxies |
| Behavior | Timing, mouse, navigation order | Block after a clean run of N requests | humanize + delays + warming |
Here's the decision path I run on a new target.
Does the data show up in the raw HTML?
├── Yes → curl_cffi (impersonate="chrome") + residential proxies
│ Blocked? → it checks JS even for data. Go to a browser.
└── No (needs JavaScript) → Does it block on the very first load?
├── Yes → TLS/JS fingerprint gate → Camoufox or Patchright
└── Only after a few requests → behavior/IP gate
→ add proxies, warming, and random delays
Still blocked on automation but fine when you click by hand?
└── automation-protocol detection → switch to nodriver
Start at the cheapest method that covers your layer. Escalate only when you've confirmed the simpler one fails.
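The decision path above can also be written as a small function, which is handy if you triage many targets. This is just the tree restated as code; the function and parameter names are my own labels, and the return values are the tool names from this guide.

```python
def pick_method(data_in_raw_html,
                blocked_with_curl_cffi=False,
                blocked_on_first_load=False,
                blocked_only_when_automated=False):
    """Map what you observed on a target to the lightest method to try."""
    if blocked_only_when_automated:
        # stealth browser blocked, manual clicking works
        return "nodriver"
    if data_in_raw_html:
        if blocked_with_curl_cffi:
            # the target checks JS even though the data is in the HTML
            return "Camoufox or Patchright"
        return "curl_cffi + residential proxies"
    if blocked_on_first_load:
        # TLS/JS fingerprint gate
        return "Camoufox or Patchright"
    # blocked only after a few requests: behavior/IP gate
    return "proxies + session warming + random delays"


print(pick_method(data_in_raw_html=True))
print(pick_method(data_in_raw_html=False, blocked_on_first_load=True))
```

Re-run it whenever a symptom changes, so you only move one step up the ladder at a time.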
Troubleshooting common issues
"403 Forbidden" on the very first request
What it means: You were blocked at the network layer, before headers mattered. This is almost always TLS fingerprinting.
How to fix it: Swap requests for curl_cffi with impersonate="chrome" (method 2). If you're already using it, move to a real browser; the target is checking JS too.
"429 Too Many Requests"
What it means: Rate limit. Your fingerprint is probably fine, but you're hitting one IP too hard.
How to fix it: Add residential proxy rotation (method 3) and randomized delays (method 6). Back off exponentially on repeated 429s instead of hammering.
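A minimal sketch of that exponential backoff, with jitter so retries don't land on a fixed schedule. The function name and the default base and cap values are assumptions to tune per target, and it assumes any Retry-After value you pass in is already a number of seconds (servers can also send an HTTP date).

```python
import random


def backoff_delay(attempt, base=2.0, cap=120.0, retry_after=None, rng=random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # the server told us how long to wait: honor it
        return float(retry_after)
    ceiling = min(cap, base * (2 ** attempt))  # 2, 4, 8, ... up to cap
    # jitter within the upper half of the window
    return rng.uniform(ceiling / 2, ceiling)


for attempt in range(5):
    print(attempt, round(backoff_delay(attempt), 1))
```

Call it with the number of consecutive 429s you've seen, sleep for the result, and reset the counter after the first clean response.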
Stuck in a challenge loop
What it means: The JS challenge runs but never clears, usually because cookies aren't persisting or the page isn't fully loading.
How to fix it: Use a session so cookies carry over, and wait_for_load_state("networkidle") so the challenge finishes before you read the HTML. In Camoufox, give Turnstile a few extra seconds.
Blocked only when automated, fine in a manual browser
What it means: Automation-protocol detection. Your fingerprint is good, but the way the browser is driven gives you away.
How to fix it: Switch to nodriver (method 5). This is the one symptom that points to exactly one fix.
General debugging tips
Test against a fingerprint checker like tls.browserleaks.com to see what you're actually sending. Change one variable at a time. And watch for pattern changes; targets update detection constantly, so a method that worked last month can quietly break.
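"Change one variable at a time" is easier to stick to if you generate the test configurations up front. The sketch below does that; the config keys and values are hypothetical placeholders for whatever knobs your scraper exposes.

```python
def one_change_at_a_time(baseline, alternatives):
    """Yield (changed_key, config) pairs that differ from baseline in one key."""
    for key, values in alternatives.items():
        for value in values:
            if baseline.get(key) == value:
                continue  # identical to baseline, nothing to test
            config = dict(baseline)
            config[key] = value
            yield key, config


# Hypothetical knobs for a scraper under test
baseline = {"client": "requests", "proxy": None, "delay": 0}
alternatives = {
    "client": ["curl_cffi"],
    "proxy": ["residential"],
    "delay": [5],
}

for changed, config in one_change_at_a_time(baseline, alternatives):
    print(changed, config)
```

Run each config against the target separately. The first one that flips the result tells you which layer was blocking you.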
A note on responsible use
Bypassing an anti-bot doesn't make scraping legal or ethical by default. Before you run any of this at scale, think it through.
Most sites prohibit scraping in their terms of service. Bypassing protection can run into the Computer Fraud and Abuse Act in the US, and collecting personal data can run into GDPR in the EU. None of this is legal advice; check your own situation.
Scrape public data for legitimate purposes. Respect rate limits even when you can blow past them, cache aggressively to cut requests, and stay off personal data and government, health, or financial systems. If a site offers an API, use it; it's faster and you won't be playing this game at all.
Bypassing anti-bots in 2026: quick reference
Anti-bots are good, but they're not unbeatable. The whole game is matching your method to the layer your target checks, and nothing heavier.
| Situation | Start with |
|---|---|
| Data in raw HTML, light protection | Method 1, then 2 |
| Network-layer block, no JS needed | Method 2 + 3 |
| JavaScript challenge or Turnstile | Method 4 |
| Blocks every automated browser | Method 5 |
| Slow blocks at volume | Method 6 |
My honest default: curl_cffi first, because it's fast and beats the layer most people get blocked on. Add residential proxies the moment you scale. Reach for a browser only when JavaScript forces it, and reach for nodriver only when a browser still gets caught.
Pick the lightest thing that works. Your future self, debugging this at 2am when a target changes its detection, will thank you.
This article was originally published in April 2025, written by Marius Bernard. It was most recently updated in June 2026.