Show HN: Async web scraping framework on top of Rust

github.com

2 points by yehors 17 days ago · 2 comments · 1 min read

Meet silkworm-rs: a fast, async web scraping framework for Python built on Rust components (rnet and scraper-rs). It features browser impersonation, typed spiders, and built-in pipelines (SQLite, CSV, Taskiq) without the boilerplate. With configurable concurrency and robust middleware, it’s designed for efficient, scalable crawlers.
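For readers unfamiliar with what "configurable concurrency" means in an async crawler, here is a minimal sketch using only the Python standard library. It is not silkworm's API: `fetch`, `crawl`, and the `concurrency` parameter are hypothetical names, and the HTTP request is simulated with a short sleep so the example runs offline.

```python
import asyncio


async def fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP request; sleeps instead of using the network.
    await asyncio.sleep(0.01)
    return f"<html><title>{url}</title></html>"


async def crawl(urls, concurrency=4):
    # The semaphore caps how many fetches may be in flight at once.
    sem = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def worker(url):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch(url)
            finally:
                in_flight -= 1

    # gather preserves input order in its results.
    pages = await asyncio.gather(*(worker(u) for u in urls))
    return pages, peak


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(20)]
    pages, peak = asyncio.run(crawl(urls, concurrency=4))
    print(len(pages), "pages fetched, peak in-flight requests:", peak)
```

Raising `concurrency` trades politeness toward the target site for throughput; a real framework would typically layer retries and rate limiting on top of this.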

yehorsOP 17 days ago

It also supports free-threaded Python (set the `PYTHON_GIL=0` environment variable).

My small test, which extracts titles from web pages (spider: https://github.com/BitingSnakes/silkworm/blob/main/examples/...):

- RPS with GIL: ~174
- RPS without GIL: ~242
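To put the two numbers above in perspective, here is a small sketch that computes the relative speedup and checks whether the current interpreter is running with the GIL enabled. The `speedup` helper is a hypothetical name; `sys._is_gil_enabled()` exists only on Python 3.13 and later, so the code guards for its absence.

```python
import sys


def speedup(rps_with_gil: float, rps_without_gil: float) -> float:
    # Ratio of throughput without the GIL to throughput with it.
    return rps_without_gil / rps_with_gil


def gil_status() -> str:
    # sys._is_gil_enabled() was added in Python 3.13; older versions always have the GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    if check is None:
        return "enabled (interpreter predates free-threading)"
    return "enabled" if check() else "disabled"


ratio = speedup(174, 242)  # approximate figures quoted in the comment above
print(f"GIL is {gil_status()}")
print(f"speedup: {ratio:.2f}x ({100 * (ratio - 1):.0f}% more requests per second)")
```

With the quoted figures this works out to roughly 1.39x, or about 39% more requests per second without the GIL.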

yehorsOP 17 days ago

I've built https://github.com/RustedBytes/scraper-rs to parse HTML using Rust with CSS selectors and XPath expressions. This wrapper can be useful for others as well.
