Ask HN: Python library for robust URL retrieval with workaround strategies?

1 point by myyke 2 years ago · 1 comment · 1 min read

Background: I'm scraping a variety of URLs and, as expected, some servers block the requests, which shows up as timeouts or 403 Forbidden responses. I'm currently using the requests library, but for problematic URLs I've noticed that switching user agents, or using a different tool such as pycurl or wget, sometimes gets past the block.

Question: Is there a Python library that automates these workaround strategies, trying multiple methods in turn until one successfully retrieves the URL?

bashonly 2 years ago

https://github.com/yifeikong/curl_cffi
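
For the "try several methods in turn" part of the question, here is a rough sketch of a fallback chain using only the standard library. It is not taken from curl_cffi or any other library: the names `fetch_with_fallbacks` and `make_urllib_strategy` and the user-agent strings are hypothetical, picked for illustration. Each strategy is just a callable that takes a URL and returns the body, so a requests, pycurl or curl_cffi call could presumably be wrapped the same way.

```python
import urllib.request
from typing import Callable, Iterable

# Hypothetical user-agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "curl/8.4.0",
]


def make_urllib_strategy(user_agent: str, timeout: float = 10.0) -> Callable[[str], bytes]:
    """Build a fetcher that sends the given User-Agent header via urllib."""
    def fetch(url: str) -> bytes:
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        # urlopen raises HTTPError for responses such as 403.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    return fetch


def fetch_with_fallbacks(url: str, strategies: Iterable[Callable[[str], bytes]]) -> bytes:
    """Try each strategy in order and return the first successful body.

    Raises RuntimeError listing every failure if none of them succeed.
    """
    errors = []
    for strategy in strategies:
        try:
            return strategy(url)
        except Exception as exc:  # HTTP errors, timeouts, connection failures
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} strategies failed for {url}: {errors}")


if __name__ == "__main__":
    strategies = [make_urllib_strategy(ua) for ua in USER_AGENTS]
    body = fetch_with_fallbacks("https://example.com/", strategies)
    print(len(body))
```

Catching every exception is deliberately blunt here; a real scraper would probably want to tell a 403 apart from a timeout and add a delay between attempts.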
