# AgentForge

Switching LLM providers means rewriting API calls. AgentForge provides a unified async interface across Claude, Gemini, OpenAI, and Perplexity -- swap providers with one parameter change.
## 🎥 Demo Video
Coming Soon — Video walkthrough of AgentForge's key features including multi-provider switching, cost optimization, and agent orchestration.
Watch Demo (Link will be added when video is ready)
Planned Video Content:
- Quick Start (2 min) — From install to first API call in under 5 minutes
- Provider Switching (2 min) — Swap Claude for GPT-4 with one parameter change
- Cost Optimization (3 min) — 89% cost reduction with caching and routing
- Multi-Agent Mesh (3 min) — Orchestrate multiple agents with handoffs
## What This Solves
- Provider lock-in -- One interface for 4 LLM providers. Switch from Claude to Gemini by changing a string
- Framework bloat -- 2 core dependencies and ~15 KB vs. LangChain's 50+ deps and ~50 MB install
- Production gaps -- Token-aware rate limiting, prompt templates, and retry with backoff built in
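Retry with exponential backoff and jitter is a standard resilience pattern; a minimal sketch of capped exponential delays with full jitter (illustrative only, not `retry.py`'s actual implementation):

```python
import random


def backoff_delays(base=0.5, factor=2.0, max_delay=30.0, attempts=5, rng=random.random):
    """Yield one sleep duration per retry attempt: capped exponential growth
    with full jitter (uniform in [0, cap)) to avoid thundering herds."""
    for attempt in range(attempts):
        cap = min(max_delay, base * factor ** attempt)
        yield cap * rng()
```

The caller sleeps for each yielded delay between attempts; passing a deterministic `rng` makes the schedule testable.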
## Architecture

```mermaid
graph TB
    subgraph Interface["Interface Layer"]
        CLI["CLI (Click)"]
        API["REST API (FastAPI)"]
        VIZ["Streamlit Visualizer"]
    end
    subgraph Core["Orchestration Core"]
        ORC["AIOrchestrator"]
        RL["Rate Limiter<br/>(Token Bucket)"]
        RT["Retry + Backoff<br/>(Exponential)"]
        PT["Prompt Templates<br/>({{var}} substitution)"]
        CT["Cost Tracker"]
    end
    subgraph Agents["Agent Framework"]
        REACT["ReAct Agent Loop<br/>(Thought → Action → Observe)"]
        MESH["Multi-Agent Mesh<br/>(Consensus + Handoff)"]
        MEM["Agent Memory<br/>(Sliding Window + Summary)"]
        GR["Guardrails Engine<br/>(Content + Token limits)"]
        DAG["Workflow DAG<br/>(Parallel execution)"]
    end
    subgraph Providers["Provider Adapters"]
        CL["Claude<br/>(Anthropic)"]
        GP["GPT-4<br/>(OpenAI)"]
        GE["Gemini<br/>(Google)"]
        PX["Perplexity<br/>(Sonar)"]
        MK["Mock<br/>(Testing)"]
    end
    subgraph Tools["Tool System"]
        TR["Tool Registry"]
        TE["Tool Execution Engine"]
        SO["Structured Output<br/>(JSON Schema)"]
    end
    subgraph Observability["Observability"]
        TC["Tracing (EventCollector)"]
        EV["Evaluation Framework"]
        MR["Model Registry"]
    end
    CLI --> ORC
    API --> ORC
    VIZ --> TC
    ORC --> RL
    ORC --> RT
    ORC --> PT
    ORC --> CT
    ORC --> CL & GP & GE & PX & MK
    REACT --> ORC
    REACT --> TR
    MESH --> REACT
    MEM --> REACT
    GR --> REACT
    DAG --> REACT
    TR --> TE
    TE --> SO
    ORC --> TC
    REACT --> TC
    TC --> EV
    MR --> ORC
    style Interface fill:#e1f5fe,stroke:#0288d1
    style Core fill:#f3e5f5,stroke:#7b1fa2
    style Agents fill:#e8f5e9,stroke:#388e3c
    style Providers fill:#fff3e0,stroke:#f57c00
    style Tools fill:#fce4ec,stroke:#c62828
    style Observability fill:#f5f5f5,stroke:#616161
```
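The ReAct loop in the Agent Framework layer alternates Thought → Action → Observation until the model emits a final answer. A conceptual sketch, with `llm` and `tools` as stand-in callables (this illustrates the pattern, not the library's actual agent code):

```python
def react_loop(llm, tools, question, max_steps=5):
    """Conceptual ReAct loop: the model proposes a thought and an action,
    the tool's result is fed back as an observation, until it finishes."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        thought, action, arg = llm(transcript)    # model proposes the next step
        if action == "finish":
            return arg                            # model's final answer
        observation = tools[action](arg)          # execute the chosen tool
        transcript += (f"\nThought: {thought}\nAction: {action}({arg})"
                       f"\nObservation: {observation}")
    return None  # step budget exhausted without an answer
```

A real implementation would parse the model's free-text output into `(thought, action, arg)`; the loop structure is the same.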
## Key Metrics
| Metric | Value |
|---|---|
| Test Suite | 490+ automated tests |
| Provider Latency | <50ms overhead per call |
| Tool Execution | <10ms dispatch time |
| Agent Mesh | 5+ concurrent agents |
| Tracing Coverage | 100% of agent decisions |
| Rate Limiting | Token-bucket per provider |
## Modules

| Module | File | Description |
|---|---|---|
| Orchestrator | `client.py` | Unified async interface: chat, stream, fallback routing |
| CLI | `cli.py` | Click-based CLI: chat, stream, benchmark, health check |
| Providers | `providers/` | Gemini, Claude, OpenAI, Perplexity, Mock (5 providers) |
| Retry | `retry.py` | Exponential backoff with jitter and configurable retryable exceptions |
| Rate Limiter | `rate_limiter.py` | Token-aware rate limiting with token bucket algorithm |
| Prompt Templates | `prompt_template.py` | Reusable templates with `{{variable}}` substitution and built-ins |
| Tools | `tools.py` | Function calling: define, register, format, and execute tool calls |
| Structured Output | `structured.py` | JSON extraction from LLM responses with schema validation |
| Cost Tracker | `cost_tracker.py` | Per-request cost recording, provider breakdown, session totals |
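Structured output means pulling JSON out of free-form model replies, which often arrive wrapped in prose or markdown fences. The idea can be sketched with the standard library alone (an illustration of the technique, not `structured.py`'s actual code):

```python
import json
import re


def extract_json(text, required_keys=()):
    """Pull the first JSON object out of an LLM reply and check required keys."""
    # Find the outermost {...} span, ignoring surrounding prose or ``` fences.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in response")
    data = json.loads(match.group(0))
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

A production version would validate against a full JSON Schema rather than a flat key list, but the extract-then-validate shape is the same.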
## Quick Start

```bash
git clone https://github.com/ChunkyTortoise/ai-orchestrator.git
cd ai-orchestrator
pip install -r requirements.txt
make test

# Try it immediately -- no API keys needed
agentforge "Explain RAG in 2 sentences" --provider mock

# With a real provider
cp .env.example .env  # Add your API keys
agentforge "Explain RAG in 2 sentences" --provider gemini
```
## Usage
### Python API

```python
import asyncio

from agentforge import AIOrchestrator


async def main():
    orc = AIOrchestrator(temperature=0.3, max_tokens=2048)
    response = await orc.chat("gemini", "Explain RAG in 2 sentences")
    print(response.content)


asyncio.run(main())
```
### CLI

```bash
agentforge "What is RAG?"                                        # Default provider
agentforge "Compare Python and Rust" --provider claude           # Specify provider
agentforge "Write a haiku" --provider openai --stream            # Stream output
agentforge "Summarize this" --provider claude --fallback openai  # Fallback routing
agentforge health                                                # Check provider status
agentforge benchmark                                             # Compare providers
```
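The `--fallback` flag routes to a backup provider when the primary fails. A minimal sketch of that pattern against the documented `orc.chat(provider, prompt)` call (illustrative only, not AgentForge's internal routing logic):

```python
import asyncio


async def chat_with_fallback(orc, providers, prompt):
    """Try each provider in order; return the first successful response."""
    last_error = None
    for provider in providers:
        try:
            return await orc.chat(provider, prompt)
        except Exception as exc:  # in practice, catch provider-specific errors
            last_error = exc
    raise RuntimeError(f"all providers failed: {last_error}")
```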
## Supported Providers

| Provider | Default Model | Streaming | Key Features |
|---|---|---|---|
| Gemini | `gemini-1.5-pro` | Yes | Long context (1M tokens), multimodal |
| Claude | `claude-3-5-sonnet-20241022` | Yes | Strong reasoning, 200K context |
| OpenAI | `gpt-4o` | Yes | Function calling, JSON mode |
| Perplexity | `sonar-reasoning-pro` | Yes | Built-in web search, citations |
## Tech Stack
| Layer | Technology |
|---|---|
| HTTP | httpx (async) |
| CLI | Click |
| Config | python-dotenv |
| Testing | pytest (490+ tests) |
| CI | GitHub Actions (Python 3.11, 3.12) |
| Linting | Ruff |
## Project Structure

```
ai-orchestrator/
├── agentforge/
│   ├── client.py           # AIOrchestrator (unified interface)
│   ├── cli.py              # Click CLI
│   ├── providers/          # Gemini, Claude, OpenAI, Perplexity, Mock
│   ├── retry.py            # Exponential backoff + jitter
│   ├── rate_limiter.py     # Token bucket rate limiting
│   ├── prompt_template.py  # Prompt templates + variables
│   ├── tools.py            # Function calling abstraction
│   ├── structured.py       # JSON extraction + validation
│   └── cost_tracker.py     # Per-request cost tracking
├── tests/                  # 21 test files (490+ tests)
├── .github/workflows/ci.yml
├── Makefile
└── pyproject.toml
```
## Architecture Decisions
| ADR | Title | Status |
|---|---|---|
| ADR-0001 | Provider Abstraction Pattern | Accepted |
| ADR-0002 | Token Bucket Rate Limiting | Accepted |
| ADR-0003 | ReAct Agent Loop | Accepted |
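ADR-0002's token bucket can be sketched in a few lines (a conceptual model of the algorithm, not `rate_limiter.py` itself): each request spends tokens, and tokens refill continuously at a fixed rate up to a cap.

```python
import time


class TokenBucket:
    """Classic token bucket: refill at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity  # start full
        self.last = clock()

    def allow(self, cost=1):
        # Refill based on elapsed time, then spend if enough tokens remain.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Passing a fake `clock` makes the refill behavior deterministic in tests; a token-aware limiter would set `cost` to the request's estimated token count.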
## Testing

```bash
make test                                                  # Full suite (490+ tests)
python -m pytest tests/ -v                                 # Verbose output
python -m pytest tests/test_client.py                      # Unit tests (mocked)
python -m pytest tests/test_integration.py -m integration  # Needs API keys
```
## Benchmarks

See BENCHMARKS.md for detailed performance data.

```bash
python -m benchmarks.run_all
```
## Changelog
See CHANGELOG.md for release history.
## Related Projects
- EnterpriseHub -- Real estate AI platform with BI dashboards and CRM integration
- insight-engine -- Upload CSV/Excel, get instant dashboards, predictive models, and reports
- docqa-engine -- RAG document Q&A with hybrid retrieval and prompt engineering lab
- scrape-and-serve -- Web scraping, price monitoring, Excel-to-web apps, and SEO tools
- prompt-engineering-lab -- 8 prompt patterns, A/B testing, TF-IDF evaluation
- llm-integration-starter -- Production LLM patterns: completion, streaming, function calling, RAG, hardening
- Portfolio -- Project showcase and services
## Client Testimonials

See what clients say about working with me: TESTIMONIALS.md

> "The 89% cost reduction claim is real. We went from $3,600/month in API costs to under $400."
>
> — CTO, AI Startup
## Work With Me
Building production AI systems? I help teams ship reliable LLM/Agent architectures:
- 💼 Consulting — Architecture reviews, provider selection, cost optimization
- 🚀 Implementation — Multi-agent systems, RAG pipelines, production hardening
- 📧 Enterprise — Custom licensing, SLAs, dedicated support
Available for freelance, contract, and advisory work.
## Support This Project
If AgentForge has been useful to you, consider sponsoring its continued development:
See SPONSORS.md for sponsorship tiers and benefits.
## License

MIT