Show HN: Motie – Replit for Web Scraping

app.motie.dev

4 points by jb_hn 3 months ago · 4 comments · 2 min read

Hey HN, Justin here. We’re building Motie (https://app.motie.dev), an AI agent that extracts structured data from the web and generates web scraping code, using natural language.

We started building Motie a few months back with the goal of creating an “AI Data Engineer.” We took a ‘forward deployed engineer’-style approach to refine our scope (and to avoid “boiling the ocean”) and noticed that web extraction requests came up time and time again.

We also noticed that many existing tools required a lot of upfront work (defining schemas, specifying CSS selectors), while others offered data without providing the code to scrape it.

With this release, we hope to make it incredibly easy to scrape any website* while giving technical users code to build upon and less technical users an easy interface to extract the data they need.

Features

> Natural language-based extraction: simply provide a URL (https://news.ycombinator.com/) and a prompt (“Find the top 5 stories that have more than 100 points.”)
> Full code ownership: all web scraping code can be exported
> CSV and JSON output formats
> Hosted scheduling and orchestration
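
To make the example prompt concrete, here is a minimal sketch of the kind of exportable Python such a prompt could map to. This is not Motie's generated code: the sample markup, the class names in it, and the helpers (StoryParser, top_stories, to_csv) are all hypothetical, and it parses an inline snippet rather than fetching a live page. It also assumes "top" means highest point count.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical, simplified markup. A real page will be structured differently.
SAMPLE_HTML = """
<ul>
  <li><a class="story" href="https://example.com/a">Story A</a>
      <span class="score">250 points</span></li>
  <li><a class="story" href="https://example.com/b">Story B</a>
      <span class="score">42 points</span></li>
  <li><a class="story" href="https://example.com/c">Story C</a>
      <span class="score">101 points</span></li>
</ul>
"""


class StoryParser(HTMLParser):
    """Collects title, url, and points for each story in the sample markup."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._field = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "story":
            self.stories.append(
                {"title": "", "url": attrs.get("href", ""), "points": 0}
            )
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "score" and self.stories:
            self._field = "points"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field == "title":
            self.stories[-1]["title"] += data
        elif self._field == "points":
            # Text looks like "250 points"; keep the leading integer.
            self.stories[-1]["points"] = int(data.split()[0])


def top_stories(html, min_points=100, limit=5):
    """Return up to `limit` stories with more than `min_points`, highest first."""
    parser = StoryParser()
    parser.feed(html)
    hits = [s for s in parser.stories if s["points"] > min_points]
    hits.sort(key=lambda s: s["points"], reverse=True)
    return hits[:limit]


def to_csv(rows):
    """Serialize the story dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "url", "points"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    stories = top_stories(SAMPLE_HTML)
    print(json.dumps(stories, indent=2))  # JSON output
    print(to_csv(stories))                # CSV output
```

The filter threshold and result limit are plain parameters, so the same script covers variations of the prompt (say, top 10 above 50 points) without touching the parser.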

Current Limitations

> This release does not include support for proxies. *Scraping websites like Amazon and eBay is thus not well supported at this time. (That said, we’ve noticed a very long tail of websites that don’t require proxies!)

We’ve tried to make getting started as easy and frictionless as possible (e.g., you can use Google or GitHub SSO), and we’d love to hear the HN community’s thoughts!

theanonymousone 3 months ago

> we’ve noticed a very long tail of websites that don’t require proxies

That tail seems to be getting harshly slaughtered by Cloudflare.

  • jb_hnOP 3 months ago

    Good point – we’ve definitely noticed a lot more Cloudflare representation these days. That said, there seem to be tiers in the protection they offer (and thus in the protection used by websites in this long tail), and the lower tiers (so far) haven’t required proxies.

    Curious if you’ve noticed any particularly well defined, obscure websites? Would love to take a look if so.

xmcp123 3 months ago

Ya know, I was ready to downvote this (AI scraping is not my favorite) but I’m not going to.

It really does have its niche - one-off complex scrapes where it’s kind of questionable whether it’s worth writing a scraper.

  • jb_hnOP 3 months ago

    Haha I appreciate that! And that’s exactly right. Our goal is to make it so that you don’t have to ask the question “but is it worth the time and effort…” when you want to use or explore a new dataset.
