Show HN: LLM-use – cost-effective LLM orchestrator for agents
Hi HN,

I built llm-use: a lightweight Python toolkit for efficient agent workflows across multiple LLMs. The core pattern: a strong model (Claude / GPT-4o / a big local model) handles planning and synthesis, while cheap or local workers run parallel subtasks (research, scraping, summarization, extraction…).

Features:
• Mix Anthropic, OpenAI, Ollama, llama.cpp
• Smart router: cheap/local first, escalate only if needed (learned + heuristic)
• Parallel workers (--max-workers)
• Real scraping + cache (BS4 or Playwright)
• Offline-first (full Ollama support)
• Cost tracking ($ for cloud, $0 for local)
• TUI chat + MCP server mode
• Local session logs

Quick example (hybrid):
python3 cli.py exec \
  --orchestrator anthropic:claude-3-7-sonnet-20250219 \
  --worker ollama:llama3.1:8b \
  --enable-scrape \
  --task "Summarize 6 recent sources on post-quantum crypto"
Or routed version:
python3 cli.py exec \
  --router ollama:llama3.1:8b \
  --orchestrator openai:o1 \
  --worker gpt-4o-mini \
  --task "Explain recent macOS security updates"
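The cheap-first, escalate-only-if-needed routing above can be sketched roughly as follows. This is an illustrative sketch, not llm-use's actual API: the `Model` type, `route` function, and confidence threshold are all hypothetical stand-ins for whatever the real router does internally.

```python
# Hypothetical sketch of cheap-first routing with escalation.
# Model and route() are illustrative, NOT llm-use's real API.
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Model:
    name: str
    cost_per_call: float                        # 0.0 for local models
    run: Callable[[str], Tuple[str, float]]     # returns (answer, confidence)

def route(task: str, cheap: Model, strong: Model,
          threshold: float = 0.7) -> Tuple[str, float]:
    """Try the cheap/local model first; escalate to the strong model
    only when the cheap model's self-reported confidence is low."""
    answer, confidence = cheap.run(task)
    if confidence >= threshold:
        return answer, cheap.cost_per_call
    answer, _ = strong.run(task)
    return answer, cheap.cost_per_call + strong.cost_per_call
```

With a local worker the happy path costs $0; you only pay cloud rates on the escalation branch, which is where the cost savings in this kind of router come from.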
MIT licensed, minimal deps, embeddable.
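The planner + parallel-workers pattern described above can be sketched with a thread pool. Again, this is a hypothetical illustration of the pattern, not llm-use's code: `plan`, `worker`, and `synthesize` are stubs standing in for strong-model and cheap-model calls.

```python
# Hypothetical sketch of the planner + parallel-workers pattern;
# plan(), worker(), synthesize() are illustrative stubs, NOT llm-use's API.
from concurrent.futures import ThreadPoolExecutor

def plan(task: str) -> list:
    # A strong model would decompose the task into subtasks; stubbed here.
    return [f"{task} :: subtask {i}" for i in range(4)]

def worker(subtask: str) -> str:
    # A cheap/local model handles one subtask (research, scrape, summarize...).
    return f"result({subtask})"

def synthesize(results: list) -> str:
    # The strong model merges worker outputs into a final answer.
    return "\n".join(results)

def run(task: str, max_workers: int = 4) -> str:
    subtasks = plan(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, subtasks))
    return synthesize(results)
```

The expensive model is called only twice (plan, synthesize) no matter how many subtasks fan out, which is what keeps the per-task cost roughly flat as workloads grow.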
Repo: https://github.com/llm-use/llm-use
Feedback welcome on:
• Routing heuristics you’d find useful
• Pain points with agent costs / local vs cloud
• Missing integrations?
Thanks!