Show HN: Snitchmd – Cloudflare-protected URLs into clean Markdown via Docker

github.com

8 points by syabro a month ago · 2 comments · 1 min read

Author here. Built this for myself, putting it out in case it's useful.

I needed any URL as clean Markdown for LLM context, including sites behind Cloudflare or other anti-bot protection. curl gets HTTP 403 on those, raw HTML is 80%+ nav noise eating context, and paid SaaS (Firecrawl, Jina) wasn't an option for me.

It's a Docker wrapper around two existing OSS tools: CloakBrowser (stealth Chromium that passes Cloudflare) and rs-trafilatura (HTML → Markdown). No new scraper, just glue. Runs locally, so my URLs stay on my box.

Token reduction (raw curl HTML vs snitchmd, tiktoken cl100k_base):

- cloudflare.com/learning/bots — curl: HTTP 403 → snitchmd: 0.8k

- docs.docker.com/engine/install — 187k → 0.9k

- en.wikipedia.org/wiki/LLM — 222.7k → 29.7k
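For anyone who wants to reproduce or extend numbers like these, here is a rough sketch, not the script actually used for the post. `reduction_pct` just does the arithmetic on the counts listed above. `count_tokens` is a hypothetical helper that assumes the `tiktoken` package is installed and uses its `cl100k_base` encoding. Both function names are made up for illustration.

```python
def count_tokens(text: str) -> int:
    """Count cl100k_base tokens in text (assumes `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the arithmetic below works without it

    enc = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() treats strings like "<|endoftext|>" as plain text
    # instead of raising, which matters for arbitrary scraped pages.
    return len(enc.encode(text, disallowed_special=()))


def reduction_pct(before_tokens: float, after_tokens: float) -> float:
    """Percentage of tokens removed going from raw HTML to Markdown."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return (before_tokens - after_tokens) / before_tokens * 100.0


# Token counts in thousands, copied from the list above.
# The cloudflare.com row is skipped: curl returned HTTP 403, so there is
# no "before" count to compare against.
measurements = {
    "docs.docker.com/engine/install": (187.0, 0.9),
    "en.wikipedia.org/wiki/LLM": (222.7, 29.7),
}

for url, (before, after) in measurements.items():
    print(f"{url}: {reduction_pct(before, after):.1f}% fewer tokens")
```

On the figures above this works out to roughly 99.5% fewer tokens for the Docker docs page and 86.7% for the Wikipedia article.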

Heads up: it passes Cloudflare, but it can't solve "click the traffic lights" captchas (reCAPTCHA v2, hCaptcha).

MIT licensed. Happy to answer questions.

sc0rp10 a month ago

What's the difference with playwright?
