## ⚡ Quick Start
Three commands. One endpoint. No API keys, no quotas, no vendor lock-in.
```shell
git clone https://github.com/brcrusoe72/agent-search.git
cd agent-search
docker compose up -d
curl "http://localhost:3939/search?q=distributed+consensus+algorithms&count=5"
```

That's it. You now have a deduplicated, multi-engine, LLM-ready search API running at `http://localhost:3939`.
### Terminal Demo
For a reproducible terminal GIF workflow using the real AgentSearch quick-start commands, see `docs/TERMINAL_GIF_GUIDE.md` and the tapes in `docs/demo/`.
## 🎯 Why AgentSearch?
You could call SearXNG directly. Most people building serious agent infrastructure end up writing this layer anyway. AgentSearch is that layer, already built.
## 📊 How It Compares

| | AgentSearch | Tavily | Exa | SerpAPI | Google CSE |
|---|---|---|---|---|---|
| Cost | Your infra only | $0.005/query | $0.003/query | $50/mo | $5/1K queries |
| API key required | Optional | ✅ | ✅ | ✅ | ✅ |
| Setup | `docker compose up` | Sign up | Sign up | Sign up | Console + billing |
| Engines | 70+ via SearXNG | Tavily only | Exa only | Google only | Google only |
| Self-hosted | ✅ | ❌ | ❌ | ❌ | ❌ |
| Content extraction | 9-strategy kill chain | Basic | Built-in | ❌ | ❌ |
| Query expansion | ✅ | Partial | ❌ | ❌ | ❌ |
| MCP server | ✅ Included | Third-party | Third-party | ❌ | ❌ |
| Cross-engine scoring | ✅ | N/A | N/A | ❌ | ❌ |
| Data ownership | 100% yours | Vendor | Vendor | Vendor | Vendor |
Translation: if you're running more than ~10K queries/month against Tavily or Exa, AgentSearch pays for itself in the first month. If you're processing sensitive queries, it's the only option here that doesn't leak them to a third party.
## 🧩 Core Features
### Deduplication & Cross-Engine Scoring
Every query hits multiple engines. Results are fingerprinted, deduplicated, and scored by cross-engine agreement. A result that surfaces on Google and Bing and DuckDuckGo ranks higher than one that appears on only a single engine.
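Conceptually, the dedup-and-score step looks something like this toy sketch (illustrative only; the function names and normalization rules are hypothetical, not AgentSearch's actual implementation):

```python
from urllib.parse import urlsplit

def fingerprint(url: str) -> str:
    """Normalize a URL so near-duplicates collapse to one key."""
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("www.")
    return host + parts.path.rstrip("/")

def fuse(engine_results):
    """Dedup across engines and score by cross-engine agreement."""
    seen = {}
    for engine, urls in engine_results.items():
        for url in urls:
            seen.setdefault(fingerprint(url), set()).add(engine)
    n = len(engine_results)
    # A result found by every engine scores 1.0; single-engine hits score 1/n.
    return sorted(
        ((fp, len(engines) / n) for fp, engines in seen.items()),
        key=lambda item: item[1],
        reverse=True,
    )

ranked = fuse({
    "google": ["https://realpython.com/async-io-python/", "https://a.dev/x"],
    "bing": ["https://www.realpython.com/async-io-python"],
    "duckduckgo": ["https://realpython.com/async-io-python/"],
})
# the realpython URL was found by all three engines, so it ranks first with score 1.0
```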
```shell
curl "http://localhost:3939/search?q=python+async+patterns&count=5"
```

```json
{
  "results": [
    {
      "title": "Async IO in Python: A Complete Walkthrough",
      "url": "https://realpython.com/async-io-python/",
      "snippet": "A comprehensive guide to async/await in Python 3...",
      "engines": ["google", "bing", "duckduckgo"],
      "score": 1.0,
      "position": 1
    }
  ],
  "meta": {
    "query": "python async patterns",
    "total": 5,
    "engines_used": ["google", "bing", "duckduckgo"],
    "cached": false,
    "response_time_ms": 842.3
  }
}
```

### The 9-Strategy Kill Chain
Content extraction is the silent killer of most RAG pipelines. `/read` doesn't give up on the first failure; it cascades through nine strategies, each tuned for a different class of stubborn URL.
```mermaid
flowchart TD
    Start([URL Request]) --> S1{1. Direct Fetch}
    S1 -->|✅| Done([Return Content])
    S1 -->|❌| S2{2. Readability Parse}
    S2 -->|✅| Done
    S2 -->|❌| S3{3. UA Rotation}
    S3 -->|✅| Done
    S3 -->|❌| S4{4. JS-Rendered Fallback}
    S4 -->|✅| Done
    S4 -->|❌| S5{5. Wayback Machine}
    S5 -->|✅| Done
    S5 -->|❌| S6{6. Google Cache}
    S6 -->|✅| Done
    S6 -->|❌| S7[7–9. Additional Fallbacks]
    S7 -->|✅| Done
    S7 -->|❌| Report[Report to /adapt/report]
    Report --> Loop[Self-improvement loop<br/>re-orders the chain]
    style Done fill:#10b981,stroke:#065f46,color:#fff
    style Report fill:#f59e0b,stroke:#92400e,color:#fff
    style Loop fill:#8b5cf6,stroke:#5b21b6,color:#fff
```
Most URLs resolve on strategies 1–3. The chain exists for the rest.
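The cascade itself is simple control flow. A minimal sketch, with toy stand-in strategies (the real nine live server-side behind `/read`):

```python
def run_kill_chain(url, strategies, on_failure):
    """Try each (name, fn) strategy in order; the first non-None result wins."""
    for name, strategy in strategies:
        content = strategy(url)
        if content is not None:
            return name, content
    on_failure(url)  # in AgentSearch this is where /adapt/report gets fed
    return None, None

# Toy stand-ins for the real strategies:
strategies = [
    ("direct_fetch", lambda url: None),          # origin blocks us
    ("readability", lambda url: None),           # parse fails too
    ("wayback", lambda url: "<archived page>"),  # archive copy succeeds
]
failures = []
winner, content = run_kill_chain("https://example.com/x", strategies, failures.append)
# the wayback strategy wins, so nothing gets reported as a failure
```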
### Deep Search with Query Expansion

Ambiguous or underspecified queries are the norm in agent workflows. Deep search generates 3–5 variations, runs them all, deduplicates across result sets, and returns a fused ranking.

```shell
curl "http://localhost:3939/search/deep?q=best+practices+for+llm+caching&count=10"
```

Expands to: "LLM response caching strategies", "semantic cache for language models", "prompt caching best practices", and similar, then merges the top results.
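Merging the per-variation result lists is a classic rank-fusion problem. Reciprocal rank fusion is one common way to do it, shown here for illustration only (the source doesn't specify which fusion method AgentSearch actually uses):

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal-rank fusion: items ranked highly across many lists win."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse([
    ["https://a.dev/cache", "https://b.dev/llm"],     # "LLM response caching strategies"
    ["https://b.dev/llm", "https://c.dev/semantic"],  # "semantic cache for language models"
    ["https://b.dev/llm", "https://a.dev/cache"],     # "prompt caching best practices"
])
# b.dev/llm appears in all three expanded result sets, so it ranks first
```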
### MCP: Claude Desktop, Cursor, Windsurf

Six tools exposed over stdio: `search`, `deep_search`, `read_url`, `read_batch`, `news`, `search_jobs`. Plug it into any MCP-compatible client and your agent can reach the open web without custom tool code.
```json
{
  "mcpServers": {
    "agent-search": {
      "command": "python",
      "args": ["/path/to/agent-search/mcp-server/server.py"]
    }
  }
}
```

### Production Essentials, Built In
| Feature | Details |
|---|---|
| 🚦 Rate limiting | Per-IP and global, configurable via env vars |
| 🔐 Bearer token auth | Optional, applies to everything except `/health` |
| 💾 In-memory caching | Default 1-hour TTL, configurable |
| 🏥 Health checks | Container status + upstream SearXNG connectivity |
| 🔁 Self-improvement loop | Tracks extraction failures, re-orders kill chain deterministically |
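To make the per-IP vs. global distinction concrete, here's a toy sliding-window limiter in the spirit of `RATE_LIMIT` / `GLOBAL_RATE_LIMIT` (illustrative only, not AgentSearch's actual implementation):

```python
import time
from collections import defaultdict

class SlidingWindowLimiter:
    """Allow at most per_key hits per key, and global_max hits overall, per window."""

    def __init__(self, per_key, global_max, window=60.0):
        self.per_key = per_key
        self.global_max = global_max
        self.window = window
        self.hits = defaultdict(list)  # key -> list of hit timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        cutoff = now - self.window
        for k in list(self.hits):  # drop timestamps outside the window
            self.hits[k] = [t for t in self.hits[k] if t > cutoff]
        total = sum(len(v) for v in self.hits.values())
        if len(self.hits[key]) >= self.per_key or total >= self.global_max:
            return False
        self.hits[key].append(now)
        return True

limiter = SlidingWindowLimiter(per_key=2, global_max=3)
a1 = limiter.allow("1.2.3.4", now=1.0)  # allowed
a2 = limiter.allow("1.2.3.4", now=2.0)  # allowed
a3 = limiter.allow("1.2.3.4", now=3.0)  # denied: per-IP limit hit
b1 = limiter.allow("5.6.7.8", now=4.0)  # allowed
c1 = limiter.allow("9.9.9.9", now=5.0)  # denied: global limit hit
```

Bumping the global limit first, as the production notes below suggest, matters because the global cap is what shields upstream engines.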
## 📡 API Reference

| Endpoint | Method | What It Does |
|---|---|---|
| `/search` | GET | Web search with deduplication and multi-engine scoring |
| `/search/deep` | GET | Multi-query fusion: generates variations, merges results |
| `/search/extract` | GET | Search + extract page content from top results in one call |
| `/search/jobs` | GET | Job board search (via SearXNG job engines) |
| `/search/stats` | GET | Query statistics and usage metrics |
| `/read` | GET | Extract readable content from a single URL (9-strategy kill chain) |
| `/read/batch` | POST | Batch extract up to 20 URLs concurrently |
| `/news` | GET | Structured news from Google News, Bing News, and friends |
| `/adapt/report` | POST | Report extraction failures (feeds the self-improvement loop) |
| `/adapt/stats` | GET | View adaptation metrics |
| `/adapt/evolve` | POST | Trigger self-improvement analysis |
| `/health` | GET | Health check |
| `/engines` | GET | List available search engines and their status |
### More example calls

Search + extract in one round-trip:

```shell
curl "http://localhost:3939/search?q=rust+error+handling&count=3&fetch=true"
```

Read a single URL:

```shell
curl "http://localhost:3939/read?url=https://example.com/some-article"
```

Structured news:

```shell
curl "http://localhost:3939/news?q=ai+regulation&count=5"
```

Job search:

```shell
curl "http://localhost:3939/search/jobs?q=senior+python+engineer&location=remote"
```

Batch extraction:

```shell
curl -X POST "http://localhost:3939/read/batch" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/a", "https://example.com/b"]}'
```
## 🔌 Clients & Integrations

### Python Client

```shell
pip install agentsearch-client
```

```python
from agentsearch import AgentSearch

client = AgentSearch()  # defaults to localhost:3939
results = client.search("distributed systems consensus algorithms")

for r in results:
    print(f"{r.title} - {r.url}")
```
### LangChain

```python
from langchain.tools import tool
import requests


@tool
def web_search(query: str) -> str:
    """Search the web using AgentSearch."""
    resp = requests.get(
        "http://localhost:3939/search",
        params={"q": query, "count": 5},
    )
    results = resp.json()["results"]
    return "\n".join(
        f"- {r['title']}: {r['url']}\n  {r['snippet']}"
        for r in results
    )
```
### MCP Server

```shell
pip install mcp httpx
python mcp-server/server.py
```

See `mcp-server/README.md` for remote setup, custom ports, and troubleshooting.
## 🏗️ Architecture

```mermaid
flowchart LR
    subgraph Clients["🧑‍💻 Clients"]
        Agent["Your Agent<br/>any LLM"]
        MCP["MCP Clients<br/>Claude · Cursor · Windsurf"]
    end
    subgraph Core["⚙️ AgentSearch (:3939)"]
        API["FastAPI<br/>Dedup · Scoring · Cache<br/>Rate limits · Auth · Kill chain"]
        MCPServer["MCP Server<br/>(stdio)"]
    end
    subgraph Upstream["🌐 Search Layer"]
        SXNG["SearXNG<br/>:8080"]
        Engines["Google · Bing · DuckDuckGo<br/>Brave · Startpage · Wikipedia<br/>70+ engines"]
    end
    Agent <-->|HTTP/JSON| API
    MCP <-->|stdio| MCPServer
    MCPServer <--> API
    API <-->|HTTP| SXNG
    SXNG <--> Engines
    style API fill:#3b82f6,stroke:#1e40af,color:#fff
    style SXNG fill:#8b5cf6,stroke:#5b21b6,color:#fff
    style MCPServer fill:#10b981,stroke:#065f46,color:#fff
```

The heavy lifting (deduplication, cross-engine scoring, kill-chain extraction, query expansion, caching, auth, self-improvement) happens in the middle layer. SearXNG handles engine rotation and upstream rate limiting. Your agent just gets clean JSON.
## 🔧 Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `SEARXNG_URL` | `http://searxng:8080` | SearXNG instance URL |
| `CACHE_TTL` | `3600` | Cache duration (seconds) |
| `RATE_LIMIT` | `60` | Max requests per minute per IP |
| `GLOBAL_RATE_LIMIT` | `300` | Max requests per minute across all IPs |
| `AGENT_SEARCH_TOKEN` | (empty) | Set to require `Bearer <token>` auth |
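A sketch of how an app typically consumes these variables (names from the table above; the actual AgentSearch code may read them differently):

```python
import os

# Environment-driven settings with the documented defaults as fallbacks.
SEARXNG_URL = os.environ.get("SEARXNG_URL", "http://searxng:8080")
CACHE_TTL = int(os.environ.get("CACHE_TTL", "3600"))
RATE_LIMIT = int(os.environ.get("RATE_LIMIT", "60"))
GLOBAL_RATE_LIMIT = int(os.environ.get("GLOBAL_RATE_LIMIT", "300"))
AGENT_SEARCH_TOKEN = os.environ.get("AGENT_SEARCH_TOKEN", "")  # empty = auth disabled
```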
### Search Engines

Edit `searxng/settings.yml` to enable/disable engines, then restart:

```shell
docker compose restart searxng
```
### Running Without Docker

```shell
pip install -r requirements.txt
SEARXNG_URL=http://localhost:8080 uvicorn app.main:app --reload --port 3939
```

Requires a SearXNG instance running separately.
## 🚀 Running in Production

Things that are easy to forget until they bite:

- **Set `AGENT_SEARCH_TOKEN`.** The default `docker-compose` binds to `127.0.0.1:3939`, but the moment you put a reverse proxy in front, you need auth.
- **Tune rate limits for your traffic shape.** `RATE_LIMIT=60` per IP is conservative. Bump `GLOBAL_RATE_LIMIT` first; it protects upstream engines.
- **Enable more engines.** More engines = better cross-engine scoring and better rate-limit headroom. SearXNG rotates automatically.
- **Watch `/adapt/stats`.** If a site consistently fails the kill chain, the self-improvement loop will re-rank strategies. Let it cook.
- **Cache aggressively.** Default TTL is 1 hour. For research-style workloads, 6–24 hours is reasonable. For news, drop it to 5 minutes.
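As a concrete starting point, a research-style deployment might pin these values in a `.env` file next to the compose file (values here are illustrative, and this assumes the compose file interpolates these variables, which Docker Compose does automatically for a `.env` in the project directory):

```shell
# .env - loaded automatically by Docker Compose
CACHE_TTL=21600              # 6-hour cache for research-style workloads
RATE_LIMIT=120               # per-IP requests per minute
GLOBAL_RATE_LIMIT=600        # across all IPs; protects upstream engines
AGENT_SEARCH_TOKEN=change-me # require Bearer auth
```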
## ❓ FAQ

### How is this different from Perplexica?
Perplexica is an AI-powered search interface β it interprets your question and generates an answer. AgentSearch is an API backend β it returns structured results, extracted content, and metadata for your agent to reason over. Different layers of the stack.
### Does the job search actually scrape LinkedIn/Indeed?
It searches through SearXNG engines that index job boards. It doesn't log into those sites or bypass their APIs. Result quality depends on which engines you enable and how those sites expose their listings to search engines. Set expectations accordingly.
### What about rate limiting from upstream engines?
SearXNG rotates across engines and handles rate limiting internally. AgentSearch adds its own caching layer (default 1-hour TTL) so repeated queries don't hit upstream at all. In practice, moderate usage (a few hundred queries/day) runs fine. For heavy automation, enable more engines to spread the load.
### Is the self-improvement loop (`/adapt/evolve`) using an LLM?
No. It's deterministic: it tracks which URLs fail extraction and which strategies succeed, then adjusts the kill chain ordering based on observed patterns. No API calls, no model inference, no costs.
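"Deterministic" here can be as simple as counting successes and re-sorting. A hypothetical sketch, not the actual adaptation logic:

```python
from collections import Counter

def reorder_chain(chain, successes):
    """Re-rank strategies by observed success count,
    keeping the original order as a stable tiebreaker."""
    original_rank = {name: i for i, name in enumerate(chain)}
    return sorted(chain, key=lambda s: (-successes[s], original_rank[s]))

chain = ["direct_fetch", "readability", "ua_rotation", "wayback"]
observed = Counter({"wayback": 12, "direct_fetch": 9})
reordered = reorder_chain(chain, observed)
# → ["wayback", "direct_fetch", "readability", "ua_rotation"]
```

Same inputs, same output, every time — which is what makes the loop auditable.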
### Can I expose this to the internet?
You can, but set `AGENT_SEARCH_TOKEN` first. The default `docker-compose` binds to `127.0.0.1:3939` (localhost only). If you put it behind a reverse proxy, use the token auth and keep rate limits tight.
### What's the kill chain?
A sequence of 9 content extraction strategies tried in order: direct HTTP fetch, readability parsing, user-agent rotation, JavaScript-rendered fallback, Wayback Machine, Google Cache, and several others. If strategy 1 fails, it tries strategy 2, and so on. Most URLs resolve on strategies 1–3. The chain exists for the stubborn ones.
### Why not just use Tavily or Exa?
Go for it, if the pricing works for your volume and you're comfortable sending every query to a third party. AgentSearch exists for the cases where those constraints matter: cost at scale, data sensitivity, custom engine mixes, or simply wanting to own your infra.
## 🗺️ Roadmap
- Semantic re-ranking layer (optional, BYO embedding model)
- Redis-backed cache for multi-instance deployments
- Additional kill chain strategies (headless browser pool, Archive.today)
- Prometheus metrics endpoint
- Async Python client
- Postgres-backed adaptation store (currently in-memory)
Have ideas? Open an issue or drop a PR.
## 🤝 Contributing

```shell
# 1. Fork it
# 2. Create your branch
git checkout -b feature/better-dedup
# 3. Commit
git commit -am 'Improve dedup algorithm'
# 4. Push
git push origin feature/better-dedup
# 5. Open a PR
```
Bug reports, feature requests, and documentation improvements are all welcome. For larger changes, open an issue first so we can discuss scope.
## 🙏 Acknowledgments

Built on the shoulders of SearXNG, a privacy-respecting metasearch engine that does the hard work of engine rotation and result federation. AgentSearch wouldn't exist without it.
Also inspired by the broader ecosystem of agent infrastructure tooling: LangChain, Model Context Protocol, Ollama, and the many others proving that self-hosted is not only viable, but often better.
## 📄 License

MIT. Do whatever you want with it. See `LICENSE` for details.
If AgentSearch saves you an afternoon, consider ⭐ starring the repo.
Built for agents. Owned by you.