Ask HN: How do you search the web programmatically these days?

6 points by coreyp_1 18 days ago · 10 comments · 1 min read


For the first time in a long time, I need to query a search engine programmatically, and found that most of them block the use of curl, etc.

So, my question is simple: how do you solve the problem? I've tried searxng with mediocre success, but it seems a bit heavy to have to be running a complete separate service for this one thing that I only need every once in a while. I haven't tried using a service that requires an API key, simply because I'm not sure which direction to go or who to go with.

Just thought I would ask here first.
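For reference, once a SearXNG instance is running, querying it from a script is a single GET against its `/search` endpoint with `format=json`. This is a minimal sketch that assumes an instance at `http://localhost:8080` (a hypothetical address) with `json` added to `search.formats` in its `settings.yml`, since JSON output is typically not enabled by default. The `results` list with `title`, `url` and `content` keys reflects SearXNG's JSON output as I understand it; check it against your version.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a SearXNG instance is reachable at this (hypothetical) base URL
# and "json" is listed under search.formats in its settings.yml.
SEARXNG_BASE = "http://localhost:8080"


def build_searxng_url(query: str, base: str = SEARXNG_BASE) -> str:
    """Build a GET URL for SearXNG's /search endpoint asking for JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base}/search?{params}"


def searxng_search(query: str, base: str = SEARXNG_BASE) -> list[dict]:
    """Run the query and keep title, url and content from each result."""
    with urllib.request.urlopen(build_searxng_url(query, base), timeout=10) as resp:
        data = json.load(resp)
    return [
        {"title": r.get("title"), "url": r.get("url"), "content": r.get("content")}
        for r in data.get("results", [])
    ]
```

The URL builder is split out so it can be inspected without a running instance.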

BrunoBernardino 17 days ago

I'm building Uruky [1] and while we allow you to query our service programmatically (€5 / month), if you know which provider you'd like to use directly, there are a few options:

- Serper [2], if you like Google-style results

- Mojeek [3], if your searches are more EU-centric

- Linkup [4], if you like Google-style results, but more about intent and less about keyword matching

- Marginalia [5], if your searches are less about "big tech SEO servants"

- EUSP [6], if your searches are more UK/FR/DE-centric

Note that these are all paid, but most offer free trials (or are limited when free). With Uruky you can also easily search with any or all of them. If you'd like an account number with a couple of days to try for free, let me know.

[1]: https://uruky.com

[2]: https://serper.dev

[3]: https://www.mojeek.com/services/search/web-search-api/

[4]: https://linkup.so

[5]: https://about.marginalia-search.com/article/api/

[6]: https://www.eu-searchperspective.com
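As an illustration of what an API-key provider looks like in practice, here is a minimal sketch for Serper [2], assuming its JSON interface: a POST to `https://google.serper.dev/search` with the key in an `X-API-KEY` header and a body like `{"q": "..."}`, returning organic hits under an `organic` key with `title`, `link` and `snippet`. Those details are from my reading of Serper's documentation, not from this thread, so verify them before relying on this.

```python
import json
import urllib.request

# Assumed endpoint, per Serper's documented JSON API.
SERPER_URL = "https://google.serper.dev/search"


def build_serper_request(query: str, api_key: str) -> urllib.request.Request:
    """POST a JSON body with the query; the key travels in the X-API-KEY header."""
    body = json.dumps({"q": query}).encode("utf-8")
    return urllib.request.Request(
        SERPER_URL,
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def serper_search(query: str, api_key: str) -> list[dict]:
    """Send the request and keep title/link/snippet from the organic results."""
    with urllib.request.urlopen(build_serper_request(query, api_key), timeout=10) as resp:
        data = json.load(resp)
    return [
        {"title": r.get("title"), "link": r.get("link"), "snippet": r.get("snippet")}
        for r in data.get("organic", [])
    ]
```

Only the standard library is used, so no SDK is needed for occasional queries.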

  • freakynit 17 days ago

    Hi, do you folks maintain your own search index?

    • BrunoBernardino 17 days ago

      At Uruky we're currently building one, and expect it'll be available as a provider in a couple of months, but that ETA isn't certain just yet.

      What you can use, right now, is every provider's index (Mojeek, Marginalia, and EUSP are completely independent, AFAIK).

      • freakynit 15 days ago

        Thanks. That's a massive undertaking. Wishing you all the best.

        • BrunoBernardino 12 days ago

          Thanks, indeed it is! We've already started making it available and are running it with a couple of website owners at https://uruky.com/site-search, and there's so much to do! It should be included (includable) in Uruky once the main kinks have been resolved.

raw_anon_1111 18 days ago

Can’t speak for search engines specifically. But I recently had to do a project which required me to crawl the customer’s large site and index it into a vector search for RAG for a call center.

My first attempt was to crawl it with plain GET requests (i.e., the same thing as using curl). That got me nowhere. I had to use headless Chrome and Playwright.

Even when they don’t block it, do any modern websites work with just curl, i.e., without being able to run JS?
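The headless-browser approach described above can be sketched with Playwright's synchronous Python API: launch Chromium headless, navigate, and read the DOM after scripts have run. This is a minimal sketch, assuming Playwright and its Chromium build are installed (`pip install playwright` then `playwright install chromium`); waiting for `networkidle` is one reasonable choice for client-rendered pages, not the only one.

```python
def render_page(url: str, timeout_ms: int = 30_000) -> str:
    """Load a page in headless Chromium and return the HTML after JS has run."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network activity settles, giving
            # client-side rendering a chance to finish before we read the DOM.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Compared with a plain GET, this returns the rendered DOM rather than the initial HTML payload, at the cost of running a full browser per crawl.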

davidsojevic 18 days ago

I work at SerpApi [0], and we offer a free tier that may serve your needs if you're just looking to do programmatic searches periodically.

Much of the reason people go with a service like ours is the difficulty of rolling your own reliable solution. Happy to answer any questions you might have as well!

[0]: https://serpapi.com/
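For comparison with header-based keys, SerpApi is queried with a plain GET where everything, including the key, is a query parameter. This is a minimal sketch assuming the `https://serpapi.com/search.json` endpoint with `engine`, `q` and `api_key` parameters and results under `organic_results`, as I understand SerpApi's documentation; confirm parameter names against their docs before use.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and parameter names, per SerpApi's GET interface.
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_serpapi_url(query: str, api_key: str, engine: str = "google") -> str:
    """All inputs, including the API key, travel as GET query parameters."""
    params = urllib.parse.urlencode({"engine": engine, "q": query, "api_key": api_key})
    return f"{SERPAPI_ENDPOINT}?{params}"


def extract_organic(data: dict) -> list[tuple[str, str]]:
    """Pull (title, link) pairs out of a parsed response (organic_results assumed)."""
    return [(r.get("title", ""), r.get("link", "")) for r in data.get("organic_results", [])]


def serpapi_search(query: str, api_key: str) -> list[tuple[str, str]]:
    """Fetch one page of results and return (title, link) pairs."""
    with urllib.request.urlopen(build_serpapi_url(query, api_key), timeout=30) as resp:
        return extract_organic(json.load(resp))
```

Because the key ends up in the URL, avoid logging the full request URL.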

dserban 18 days ago

https://pypi.org/project/ddgs/

(Assuming you prefer Python.)
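A minimal sketch of using that package, assuming the `ddgs` distribution exposes a `DDGS` class whose `text()` method takes the query plus a `max_results` keyword and returns dicts with `title`, `href` and `body` keys, as recent versions do. The helper names here are hypothetical; check the API against the version you install.

```python
def ddg_text_search(query: str, max_results: int = 5) -> list[dict]:
    """Run a text search through the ddgs package (pip install ddgs)."""
    # Lazy import so the formatting helper below works without the package.
    from ddgs import DDGS

    return DDGS().text(query, max_results=max_results)


def format_results(results: list[dict]) -> str:
    """Render result dicts (keys assumed: title, href) as 'title - url' lines."""
    return "\n".join(f"{r.get('title', '')} - {r.get('href', '')}" for r in results)
```

Usage would be along the lines of `print(format_results(ddg_text_search("hacker news")))`; no separate service or API key is involved.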

pwg 18 days ago

> and found that most of them block the use of curl

Try again, but have curl provide a user agent string from one of the real browsers. You'll likely find that the request goes through.
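With curl itself this is the `-A` (`--user-agent`) flag. The same idea from Python's standard library is just a `User-Agent` header on the request, replacing the default `Python-urllib/...` value. This is a minimal sketch; the browser string below is an example, so copy a current one from your own browser, and whether a UA alone gets through depends on the site.

```python
import urllib.request

# Example desktop-browser UA string; any fixed string eventually goes stale.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def browser_like_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself as a browser, not Python-urllib."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str) -> str:
    """Fetch the page and decode it using the charset the server declares."""
    with urllib.request.urlopen(browser_like_request(url), timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```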
