🔍 AgentSearch

Self-hosted search API + MCP server for AI agents. Bundles SearXNG. Zero API keys, one-command deploy. An open-source alternative to Tavily, Exa, and Serper.

Quick Start · Features · API · MCP Server · Architecture · FAQ



⚡ Quick Start

Three commands. One endpoint. No API keys, no quotas, no vendor lock-in.

git clone https://github.com/brcrusoe72/agent-search.git
cd agent-search
docker compose up -d
curl "http://localhost:3939/search?q=distributed+consensus+algorithms&count=5"

That's it. You now have a deduplicated, multi-engine, LLM-ready search API running at http://localhost:3939.

Terminal Demo

For a reproducible terminal GIF workflow using the real AgentSearch quick-start commands, see docs/TERMINAL_GIF_GUIDE.md and the tapes in docs/demo/.

🎯 Why AgentSearch?

You could call SearXNG directly, but most people building serious agent infrastructure end up writing this layer anyway. AgentSearch is that layer, already built.

🧠 LLM-Native Output

Structured JSON with typed fields, scores, and metadata. No HTML scraping, no regex, no post-processing. Drop it straight into your agent's context.

🎯 Cross-Engine Scoring

Results are deduplicated and ranked by how many engines agree. Position 1 is position 1 for a reason, not an artifact of one engine's bias.

🔗 9-Strategy Kill Chain

The /read endpoint cascades through direct fetch, readability parsing, UA rotation, Wayback, Google Cache, and more. Most stubborn URLs resolve.

🌊 Deep Search

/search/deep generates query variations, runs them in parallel, and fuses the rankings. Better recall on ambiguous or broad queries.

🛡️ Production-Ready

In-memory cache, per-IP + global rate limits, bearer-token auth, health checks. Ship it behind a reverse proxy and sleep well.

🔌 MCP Server Included

Plug directly into Claude Desktop, Cursor, or Windsurf. Six tools exposed over stdio: search, read, news, jobs, and more.

📊 How It Compares

|  | AgentSearch | Tavily | Exa | SerpAPI | Google CSE |
| --- | --- | --- | --- | --- | --- |
| Cost | Your infra only | $0.005/query | $0.003/query | $50/mo | $5/1K queries |
| API key required | Optional | ✅ | ✅ | ✅ | ✅ |
| Setup | `docker compose up` | Sign up | Sign up | Sign up | Console + billing |
| Engines | 70+ via SearXNG | Tavily only | Exa only | Google only | Google only |
| Self-hosted | ✅ | ❌ | ❌ | ❌ | ❌ |
| Content extraction | 9-strategy kill chain | Basic | Built-in | ❌ | ❌ |
| Query expansion | ✅ | Partial | ❌ | ❌ | ❌ |
| MCP server | ✅ Included | Third-party | Third-party | ❌ | ❌ |
| Cross-engine scoring | ✅ | N/A | N/A | ❌ | ❌ |
| Data ownership | 100% yours | Vendor | Vendor | Vendor | Vendor |

Translation: if you're running more than ~10K queries/month against Tavily or Exa, AgentSearch pays for itself in the first month. If you're processing sensitive queries, it's the only option in the table that doesn't send them to a third party.

🧩 Core Features

Deduplication & Cross-Engine Scoring

Every query hits multiple engines. Results are fingerprinted, deduplicated, and scored by cross-engine agreement. A result that surfaces on Google and Bing and DuckDuckGo ranks higher than one that appears on only a single engine.

curl "http://localhost:3939/search?q=python+async+patterns&count=5"
{
  "results": [
    {
      "title": "Async IO in Python: A Complete Walkthrough",
      "url": "https://realpython.com/async-io-python/",
      "snippet": "A comprehensive guide to async/await in Python 3...",
      "engines": ["google", "bing", "duckduckgo"],
      "score": 1.0,
      "position": 1
    }
  ],
  "meta": {
    "query": "python async patterns",
    "total": 5,
    "engines_used": ["google", "bing", "duckduckgo"],
    "cached": false,
    "response_time_ms": 842.3
  }
}
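Conceptually, the dedup-and-score step can be sketched in a few lines. This is a minimal illustration, not the actual AgentSearch implementation: the fingerprinting here is a deliberate simplification, and only the output field names (`engines`, `score`, `position`) follow the JSON above.

```python
from urllib.parse import urlsplit

def fingerprint(url: str) -> str:
    """Normalize a URL so near-duplicate results collapse to one key."""
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("www.")
    return f"{host}{parts.path.rstrip('/')}"

def merge_results(per_engine: dict[str, list[dict]]) -> list[dict]:
    """Deduplicate results across engines and score by cross-engine agreement."""
    merged: dict[str, dict] = {}
    for engine, results in per_engine.items():
        for r in results:
            key = fingerprint(r["url"])
            entry = merged.setdefault(key, {**r, "engines": []})
            entry["engines"].append(engine)
    total_engines = len(per_engine)
    # More agreeing engines -> higher rank; score is the agreement ratio.
    ranked = sorted(merged.values(), key=lambda e: len(e["engines"]), reverse=True)
    for position, entry in enumerate(ranked, start=1):
        entry["score"] = len(entry["engines"]) / total_engines
        entry["position"] = position
    return ranked
```

A result returned by all three engines ends up with score 1.0 at position 1, matching the shape of the response above.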

The 9-Strategy Kill Chain

Content extraction is the silent killer of most RAG pipelines. /read doesn't give up on the first failure; it cascades through nine strategies, each tuned for a different class of stubborn URL.

flowchart TD
    Start([URL Request]) --> S1{1. Direct Fetch}
    S1 -->|✓| Done([Return Content])
    S1 -->|✗| S2{2. Readability Parse}
    S2 -->|✓| Done
    S2 -->|✗| S3{3. UA Rotation}
    S3 -->|✓| Done
    S3 -->|✗| S4{4. JS-Rendered Fallback}
    S4 -->|✓| Done
    S4 -->|✗| S5{5. Wayback Machine}
    S5 -->|✓| Done
    S5 -->|✗| S6{6. Google Cache}
    S6 -->|✓| Done
    S6 -->|✗| S7[7–9. Additional Fallbacks]
    S7 -->|✓| Done
    S7 -->|✗| Report[Report to /adapt/report]
    Report --> Loop[Self-improvement loop<br/>re-orders the chain]

    style Done fill:#10b981,stroke:#065f46,color:#fff
    style Report fill:#f59e0b,stroke:#92400e,color:#fff
    style Loop fill:#8b5cf6,stroke:#5b21b6,color:#fff

Most URLs resolve on strategies 1–3. The chain exists for the rest.
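The cascade itself is a simple control-flow pattern. A hedged sketch, assuming strategies are (name, callable) pairs tried in order; the strategy names below are placeholders taken from the diagram, not the repo's real code:

```python
from typing import Callable, Optional

# Each strategy maps a URL to extracted text, or None on failure.
Strategy = Callable[[str], Optional[str]]

def run_kill_chain(url: str,
                   strategies: list[tuple[str, Strategy]]) -> Optional[tuple[str, str]]:
    """Try each (name, strategy) pair in order; return (name, content) on the
    first success, or None if every strategy fails or raises."""
    for name, strategy in strategies:
        try:
            content = strategy(url)
        except Exception:
            content = None  # a crash counts as a miss; fall through to the next
        if content:
            return name, content
    return None  # a real deployment would report this URL to /adapt/report
```

Returning the winning strategy's name is what makes the self-improvement loop possible: failures and successes per strategy can be counted and the chain re-ordered.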


Deep Search with Query Expansion

Ambiguous or underspecified queries are the norm in agent workflows. Deep search generates 3–5 variations, runs them all, deduplicates across result sets, and returns a fused ranking.

curl "http://localhost:3939/search/deep?q=best+practices+for+llm+caching&count=10"

Expands to: "LLM response caching strategies", "semantic cache for language models", "prompt caching best practices", and similar, then merges the top results.
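One common way to fuse rankings from several query variations is reciprocal rank fusion. The source doesn't specify AgentSearch's exact fusion method, so treat this as an illustrative sketch of the general technique rather than the repo's implementation:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked URL lists into one: each URL earns 1 / (k + rank)
    from every list it appears in, then URLs are sorted by total score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A URL that ranks highly across several query variations accumulates score from each list, so broad agreement beats a single top placement.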


MCP: Claude Desktop, Cursor, Windsurf

Six tools exposed over stdio: search, deep_search, read_url, read_batch, news, search_jobs. Plug it into any MCP-compatible client and your agent can reach the open web without custom tool code.

{
  "mcpServers": {
    "agent-search": {
      "command": "python",
      "args": ["/path/to/agent-search/mcp-server/server.py"]
    }
  }
}

Production Essentials, Built In

| Feature | Details |
| --- | --- |
| 🚦 Rate limiting | Per-IP and global, configurable via env vars |
| 🔒 Bearer token auth | Optional, applies to everything except /health |
| 💾 In-memory caching | Default 1-hour TTL, configurable |
| 🏥 Health checks | Container status + upstream SearXNG connectivity |
| 🔁 Self-improvement loop | Tracks extraction failures, re-orders kill chain deterministically |
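The in-memory TTL cache is easy to picture. A minimal sketch, assuming lazy eviction on read; the real implementation and its eviction policy live in the repo:

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry, keyed by query string."""

    def __init__(self, ttl_seconds: float = 3600.0):  # default mirrors CACHE_TTL
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy eviction: stale entries die on read
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)
```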

📑 API Reference

| Endpoint | Method | What It Does |
| --- | --- | --- |
| /search | GET | Web search with deduplication and multi-engine scoring |
| /search/deep | GET | Multi-query fusion: generates variations, merges results |
| /search/extract | GET | Search + extract page content from top results in one call |
| /search/jobs | GET | Job board search (via SearXNG job engines) |
| /search/stats | GET | Query statistics and usage metrics |
| /read | GET | Extract readable content from a single URL (9-strategy kill chain) |
| /read/batch | POST | Batch extract up to 20 URLs concurrently |
| /news | GET | Structured news from Google News, Bing News, and friends |
| /adapt/report | POST | Report extraction failures (feeds the self-improvement loop) |
| /adapt/stats | GET | View adaptation metrics |
| /adapt/evolve | POST | Trigger self-improvement analysis |
| /health | GET | Health check |
| /engines | GET | List available search engines and their status |
More example calls

Search + extract in one round-trip

curl "http://localhost:3939/search?q=rust+error+handling&count=3&fetch=true"

Read a single URL

curl "http://localhost:3939/read?url=https://example.com/some-article"

Structured news

curl "http://localhost:3939/news?q=ai+regulation&count=5"

Job search

curl "http://localhost:3939/search/jobs?q=senior+python+engineer&location=remote"

Batch extraction

curl -X POST "http://localhost:3939/read/batch" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/a", "https://example.com/b"]}'

🐍 Clients & Integrations

Python Client

pip install agentsearch-client
from agentsearch import AgentSearch

client = AgentSearch()  # defaults to localhost:3939
results = client.search("distributed systems consensus algorithms")

for r in results:
    print(f"{r.title} - {r.url}")

LangChain

from langchain.tools import tool
import requests

@tool
def web_search(query: str) -> str:
    """Search the web using AgentSearch."""
    resp = requests.get(
        "http://localhost:3939/search",
        params={"q": query, "count": 5}
    )
    results = resp.json()["results"]
    return "\n".join(
        f"- {r['title']}: {r['url']}\n  {r['snippet']}"
        for r in results
    )

MCP Server

pip install mcp httpx
python mcp-server/server.py

See mcp-server/README.md for remote setup, custom ports, and troubleshooting.

πŸ—οΈ Architecture

flowchart LR
    subgraph Clients["🧑‍💻 Clients"]
        Agent["Your Agent<br/>any LLM"]
        MCP["MCP Clients<br/>Claude · Cursor · Windsurf"]
    end

    subgraph Core["⚙️ AgentSearch (:3939)"]
        API["FastAPI<br/>Dedup · Scoring · Cache<br/>Rate limits · Auth · Kill chain"]
        MCPServer["MCP Server<br/>(stdio)"]
    end

    subgraph Upstream["🔎 Search Layer"]
        SXNG["SearXNG<br/>:8080"]
        Engines["Google · Bing · DuckDuckGo<br/>Brave · Startpage · Wikipedia<br/>70+ engines"]
    end

    Agent <-->|HTTP/JSON| API
    MCP <-->|stdio| MCPServer
    MCPServer <--> API
    API <-->|HTTP| SXNG
    SXNG <--> Engines

    style API fill:#3b82f6,stroke:#1e40af,color:#fff
    style SXNG fill:#8b5cf6,stroke:#5b21b6,color:#fff
    style MCPServer fill:#10b981,stroke:#065f46,color:#fff

The heavy lifting (deduplication, cross-engine scoring, kill-chain extraction, query expansion, caching, auth, self-improvement) happens in the middle layer. SearXNG handles engine rotation and upstream rate limiting. Your agent just gets clean JSON.

🔧 Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| SEARXNG_URL | http://searxng:8080 | SearXNG instance URL |
| CACHE_TTL | 3600 | Cache duration (seconds) |
| RATE_LIMIT | 60 | Max requests per minute per IP |
| GLOBAL_RATE_LIMIT | 300 | Max requests per minute across all IPs |
| AGENT_SEARCH_TOKEN | (empty) | Set to require `Bearer <token>` auth |

Search Engines

Edit searxng/settings.yml to enable/disable engines, then restart:

docker compose restart searxng

Running Without Docker

pip install -r requirements.txt
SEARXNG_URL=http://localhost:8080 uvicorn app.main:app --reload --port 3939

Requires a SearXNG instance running separately.

🚀 Running in Production

Things that are easy to forget until they bite:

  1. Set AGENT_SEARCH_TOKEN. The default docker-compose binds to 127.0.0.1:3939, but the moment you put a reverse proxy in front, you need auth.
  2. Tune rate limits for your traffic shape. RATE_LIMIT=60 per IP is conservative. Bump GLOBAL_RATE_LIMIT first; it protects upstream engines.
  3. Enable more engines. More engines = better cross-engine scoring and better rate-limit headroom. SearXNG rotates automatically.
  4. Watch /adapt/stats. If a site consistently fails the kill chain, the self-improvement loop will re-rank strategies. Let it cook.
  5. Cache aggressively. Default TTL is 1 hour. For research-style workloads, 6–24 hours is reasonable. For news, drop it to 5 minutes.
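Once AGENT_SEARCH_TOKEN is set, clients authenticate with a standard Bearer header. A stdlib-only sketch of an authenticated client call; the base URL and query parameters follow the examples above, and the token is read from the environment rather than hardcoded:

```python
import json
import os
import urllib.parse
import urllib.request

def auth_headers(token):
    """Build the Authorization header only when a token is configured."""
    return {"Authorization": f"Bearer {token}"} if token else {}

def search(query: str, count: int = 5,
           base_url: str = "http://localhost:3939") -> dict:
    """Query /search, honoring AGENT_SEARCH_TOKEN if it is set."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    req = urllib.request.Request(
        f"{base_url}/search?{params}",
        headers=auth_headers(os.environ.get("AGENT_SEARCH_TOKEN")),
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode())
```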

❓ FAQ

How is this different from Perplexica?

Perplexica is an AI-powered search interface: it interprets your question and generates an answer. AgentSearch is an API backend: it returns structured results, extracted content, and metadata for your agent to reason over. Different layers of the stack.

Does the job search actually scrape LinkedIn/Indeed?

It searches through SearXNG engines that index job boards. It doesn't log into those sites or bypass their APIs. Result quality depends on which engines you enable and how those sites expose their listings to search engines. Set expectations accordingly.

What about rate limiting from upstream engines?

SearXNG rotates across engines and handles rate limiting internally. AgentSearch adds its own caching layer (default 1-hour TTL) so repeated queries don't hit upstream at all. In practice, moderate usage (a few hundred queries/day) runs fine. For heavy automation, enable more engines to spread the load.

Is the self-improvement loop (/adapt/evolve) using an LLM?

No. It's deterministic: it tracks which URLs fail extraction, which strategies succeed, and adjusts the kill-chain ordering based on observed patterns. No API calls, no model inference, no costs.

Can I expose this to the internet?

You can, but set AGENT_SEARCH_TOKEN first. The default docker-compose binds to 127.0.0.1:3939 (localhost only). If you put it behind a reverse proxy, use the token auth and keep rate limits tight.

What's the kill chain?

A sequence of 9 content extraction strategies tried in order: direct HTTP fetch, readability parsing, user-agent rotation, JavaScript-rendered fallback, Wayback Machine, Google Cache, and several others. If strategy 1 fails, it tries strategy 2, and so on. Most URLs resolve on strategies 1–3. The chain exists for the stubborn ones.

Why not just use Tavily or Exa?

Go for it, if the pricing works for your volume and you're comfortable sending every query to a third party. AgentSearch exists for the cases where those constraints matter: cost at scale, data sensitivity, custom engine mixes, or simply wanting to own your infra.

🗺️ Roadmap

  • Semantic re-ranking layer (optional, BYO embedding model)
  • Redis-backed cache for multi-instance deployments
  • Additional kill chain strategies (headless browser pool, Archive.today)
  • Prometheus metrics endpoint
  • Async Python client
  • Postgres-backed adaptation store (currently in-memory)

Have ideas? Open an issue or drop a PR.

🤝 Contributing

# 1. Fork it
# 2. Create your branch
git checkout -b feature/better-dedup

# 3. Commit
git commit -am 'Improve dedup algorithm'

# 4. Push
git push origin feature/better-dedup

# 5. Open a PR

Bug reports, feature requests, and documentation improvements are all welcome. For larger changes, open an issue first so we can discuss scope.

🙏 Acknowledgments

Built on the shoulders of SearXNG, a privacy-respecting metasearch engine that does the hard work of engine rotation and result federation. AgentSearch wouldn't exist without it.

Also inspired by the broader ecosystem of agent infrastructure tooling: LangChain, Model Context Protocol, Ollama, and the many others proving that self-hosted is not only viable, but often better.

📄 License

MIT: do whatever you want with it. See LICENSE for details.


If AgentSearch saves you an afternoon, consider ⭐ starring the repo.

Built for agents. Owned by you.