Show HN: Cloakbits - Headless web scraping with bypass for anti-bot WAFs

8 points by proszkinasenne2 5 years ago · 5 comments · 1 min read


A growing number of companies offer anti-bot protection as SaaS, protecting websites from scraping by automated bots built on Puppeteer/Selenium. Most of them rely on browser properties such as headers, JavaScript properties (window.*, navigator.*), and behavioral analysis to build device/user fingerprints, then match those against a database of "whitelisted" fingerprints (typical user behavior, settings, device properties, etc.).
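To make the fingerprint-matching idea concrete, here is a toy, detection-side sketch in Python. It is not how Akamai, Imperva, or any other vendor actually works. The property names, the exact-match whitelist, and the single `navigator.webdriver` red flag are all illustrative assumptions. It hashes a canonical encoding of the collected properties and only allows fingerprints it has seen from typical users.

```python
import hashlib
import json

# Hypothetical properties a detection script might read from request
# headers and from window.* / navigator.* in JavaScript.
# The key names are illustrative, not any vendor's actual schema.


def fingerprint(props: dict) -> str:
    """Hash a canonical (key-sorted) JSON encoding of the properties."""
    canonical = json.dumps(props, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(props: dict, whitelist: set) -> str:
    """Return 'allow' only for known fingerprints with no automation flag."""
    # The WebDriver spec sets navigator.webdriver to true when the
    # browser is under automation control, so treat it as a red flag.
    if props.get("navigator.webdriver"):
        return "challenge"
    return "allow" if fingerprint(props) in whitelist else "challenge"


# A "typical user" profile whose fingerprint is on the whitelist.
typical_user = {
    "header.user-agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "navigator.languages": ["en-US", "en"],
    "navigator.webdriver": False,
    "window.screen": [1920, 1080],
}
whitelist = {fingerprint(typical_user)}

# Same profile, but flagged as automated.
headless = {**typical_user, "navigator.webdriver": True}
# Same profile with an unusual screen size: fingerprint not on the list.
unknown = {**typical_user, "window.screen": [800, 600]}

print(classify(typical_user, whitelist))  # allow
print(classify(headless, whitelist))      # challenge
print(classify(unknown, whitelist))       # challenge
```

Exact hash matching is the simplest possible policy. It shows why a single mismatched property is enough to trigger a challenge, which is why patching only a few properties rarely suffices.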

For the past few months, two other devs and I have worked on a customized Puppeteer/Playwright scraping backend. It's essentially a drop-in replacement for the default Chrome/Firefox binaries. We got through the Coinbase, Amazon, and AliExpress login pages in headless mode without hitting a CAPTCHA or any other verification step. We are planning to roll out a beta version. If you are interested in beta access, leave details about your use case here: https://a90eq67iroz.typeform.com/to/FAkWnrtv

The motivation for our project is that open-source solutions such as puppeteer-extra-stealth cover only a small portion of what popular anti-bot software such as Akamai Bot Manager or Imperva use to detect and ban emulated browsers.

dryja 5 years ago

We regularly scrape competitor websites to get insights on product availability and pricing. However, one of the competitors installed a script that serves us a "Pardon Our Interruption" page. I guess it's because of bot detection. Unfortunately, that Puppeteer plugin doesn't make any difference. We got around this by using the Oxylabs service. It's pricey, but as long as you don't mind paying extra (and your scraping frequency is low), you can use it as an alternative.

jjgreen 5 years ago

This is an invitation to beta-test, not a question -- so why Ask HN?

  • proszkinasenne2OP 5 years ago

    @jjgreen I am genuinely interested in what the existing solutions are and how people deal with the problem. This is why it's "Ask HN". If there are none and someone would be interested in using our tool, why create two topics?

    • mtmail 5 years ago

      The question reads like you only ask to promote your solution. It's better to split it into the genuine question and later a Show HN.
