Ask HN: Python library for robust URL retrieval with workaround strategies?

1 point by myyke 2 years ago · 1 comment · 1 min read


Background: I'm scraping a variety of URLs and, as expected, some servers block the requests, which shows up as timeouts or 403 Forbidden responses. I currently use the requests library, but for problematic URLs I've found that switching the User-Agent header or using a different tool such as pycurl or wget sometimes gets past the block.

Question: Is there a Python library that automates these workaround strategies, attempting multiple methods to successfully retrieve a URL?
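
To make the idea concrete, here is a rough sketch of the kind of fallback loop I mean. It uses only the standard library's urllib so it is self-contained; the names fetch_with_fallbacks and make_urllib_strategy and the User-Agent strings are hypothetical, made up for illustration. Each "strategy" is just a callable that takes a URL and returns the body, so a requests, pycurl, or wget based strategy could be slotted in the same way.

```python
import urllib.request
from typing import Callable, Iterable

# A strategy is any callable that takes a URL and returns the response body.
Strategy = Callable[[str], bytes]

# Hypothetical User-Agent strings, for illustration only.
USER_AGENTS = [
    "python-urllib/3",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


def make_urllib_strategy(user_agent: str, timeout: float = 10.0) -> Strategy:
    """Build a strategy that fetches with urllib and a fixed User-Agent."""
    def fetch(url: str) -> bytes:
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        # urlopen raises HTTPError on 4xx/5xx (e.g. 403) and URLError or
        # TimeoutError on network problems; all are subclasses of OSError.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    return fetch


def fetch_with_fallbacks(url: str, strategies: Iterable[Strategy]) -> bytes:
    """Try each strategy in order and return the first successful body."""
    errors = []
    for strategy in strategies:
        try:
            return strategy(url)
        except OSError as exc:
            # Remember the failure and move on to the next strategy.
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} strategies failed for {url}: {errors!r}")


def default_strategies() -> list:
    """One urllib strategy per User-Agent in USER_AGENTS."""
    return [make_urllib_strategy(ua) for ua in USER_AGENTS]
```

Usage would be something like fetch_with_fallbacks(url, default_strategies()). Catching only OSError is a deliberate choice in this sketch: it covers HTTP errors and timeouts from urllib while letting programming errors surface instead of being silently retried.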

bashonly 2 years ago

https://github.com/yifeikong/curl_cffi
