Ask HN: Web scraping based on Computer Vision?

2 points by obl1que 3 years ago · 2 comments · 1 min read

Who's developing an approach to web scraping based on computer vision (CV)? I've looked for this but so far haven't found much beyond [0], although the motivations for it are also touched upon by [1].

Scraping is an arms race, of course. A simple but often successful way to fight scraping is, for instance, changing the names of classes routinely. This affects scrapers more than users, because a user doesn't see those class names, while a scraper relies on them.
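
To make the class-renaming point concrete, here is a minimal sketch using only Python's standard library. The page snippets, the class names `price` and `x9f3a`, and the `scrape` helper are all made up for illustration; a real scraper would more likely use a full selector library, but the failure mode is the same.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element that carries a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0   # > 0 while we are inside a matching element
        self.found = []  # one text entry per matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested tag inside a match
        elif self.wanted_class in classes:
            self.depth = 1           # entering a matching element
            self.found.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.found[-1] += data


def scrape(html, wanted_class):
    """Hypothetical helper: return the text of all elements with wanted_class."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    return parser.found


# Same visible content, but the site has rotated its class name (made-up markup).
PAGE_V1 = '<div><span class="price">$19.99</span></div>'
PAGE_V2 = '<div><span class="x9f3a">$19.99</span></div>'

print(scrape(PAGE_V1, "price"))  # ['$19.99']
print(scrape(PAGE_V2, "price"))  # []
```

Both pages render identically for a user, yet the selector-based scraper silently returns nothing on the second one.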

Is anyone scraping using only (or mostly) computer vision on the rendered browser screen, and simulating mouse clicks and key presses?
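
Roughly what I have in mind, as a toy sketch rather than a real implementation: locate a known UI element in a screenshot by template matching, then compute where to click. Here the "screenshot" is a tiny grayscale grid and the matcher is a brute-force sum of absolute differences in pure Python; `find_template`, `click_point`, and the pixel values are invented for illustration. In practice I'd assume something like OpenCV's `cv2.matchTemplate` on a real screenshot and an input-simulation library such as `pyautogui` for the click.

```python
def find_template(screen, template):
    """Return ((row, col), score) of the best match of template in screen.

    Score is the sum of absolute pixel differences; 0 means an exact match.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            score = sum(
                abs(screen[r + i][c + j] - template[i][j])
                for i in range(th)
                for j in range(tw)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


def click_point(pos, template):
    """Centre of the matched region as (x, y) screen coordinates."""
    row, col = pos
    return (col + len(template[0]) // 2, row + len(template) // 2)


# Made-up 2x3 "button" pasted into an otherwise blank 6x8 "screenshot".
BUTTON = [[9, 5, 9], [5, 9, 5]]
screen = [[0] * 8 for _ in range(6)]
for i, row in enumerate(BUTTON):
    for j, value in enumerate(row):
        screen[3 + i][4 + j] = value

pos, score = find_template(screen, BUTTON)
print(pos, score)                # (3, 4) 0
print(click_point(pos, BUTTON))  # (5, 4)
```

Nothing here depends on class names or DOM structure, only on what the element looks like once rendered.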

It seems like anti-scraping measures to defeat a CV-based approach would be more intrusive to the user and thus they would be used less often.

[0] https://github.com/jimbobewenhall/OpenCV-website-scraper

[1] https://incolumitas.com/2021/05/20/avoid-puppeteer-and-playwright-for-scraping/

obl1queOP 3 years ago

I think this may be what I was looking for:

https://www.askui.com/askui-vs-selenium/

Looks like it has been posted on HN several times, each with very little discussion.

pacarvalho 3 years ago

Maybe not even computer vision on the pixel level but instead ML on the DOM to notice when it has loaded enough to parse the content from it?
