


AgentForge

Switching LLM providers means rewriting API calls. AgentForge provides a unified async interface across Claude, Gemini, OpenAI, and Perplexity -- swap providers with one parameter change.


🎥 Demo Video

Coming Soon — Video walkthrough of AgentForge's key features including multi-provider switching, cost optimization, and agent orchestration.

Watch Demo (Link will be added when video is ready)

Planned Video Content:

  • Quick Start (2 min) — From install to first API call in under 5 minutes
  • Provider Switching (2 min) — Swap Claude for GPT-4 with one parameter change
  • Cost Optimization (3 min) — 89% cost reduction with caching and routing
  • Multi-Agent Mesh (3 min) — Orchestrate multiple agents with handoffs

What This Solves

  • Provider lock-in -- One interface for 4 LLM providers. Switch from Claude to Gemini by changing a string
  • Framework bloat -- 2 core dependencies and ~15 KB vs. LangChain's 50+ deps and ~50 MB install
  • Production gaps -- Token-aware rate limiting, prompt templates, and retry with backoff built in
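The retry-with-backoff pattern mentioned above can be sketched in a few lines. This is an illustrative standalone helper, not the actual `retry.py` API; names and defaults are assumptions:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=4, base_delay=0.5, retryable=(TimeoutError,)):
    """Call fn, retrying on retryable exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff (0.5s, 1s, 2s, ...) with random jitter
            # so concurrent clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

The jitter term is the important detail: without it, many clients that failed together retry together and hit the rate limit again.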

Architecture

```mermaid
graph TB
    subgraph Interface["Interface Layer"]
        CLI["CLI (Click)"]
        API["REST API (FastAPI)"]
        VIZ["Streamlit Visualizer"]
    end

    subgraph Core["Orchestration Core"]
        ORC["AIOrchestrator"]
        RL["Rate Limiter<br/>(Token Bucket)"]
        RT["Retry + Backoff<br/>(Exponential)"]
        PT["Prompt Templates<br/>({{var}} substitution)"]
        CT["Cost Tracker"]
    end

    subgraph Agents["Agent Framework"]
        REACT["ReAct Agent Loop<br/>(Thought → Action → Observe)"]
        MESH["Multi-Agent Mesh<br/>(Consensus + Handoff)"]
        MEM["Agent Memory<br/>(Sliding Window + Summary)"]
        GR["Guardrails Engine<br/>(Content + Token limits)"]
        DAG["Workflow DAG<br/>(Parallel execution)"]
    end

    subgraph Providers["Provider Adapters"]
        CL["Claude<br/>(Anthropic)"]
        GP["GPT-4<br/>(OpenAI)"]
        GE["Gemini<br/>(Google)"]
        PX["Perplexity<br/>(Sonar)"]
        MK["Mock<br/>(Testing)"]
    end

    subgraph Tools["Tool System"]
        TR["Tool Registry"]
        TE["Tool Execution Engine"]
        SO["Structured Output<br/>(JSON Schema)"]
    end

    subgraph Observability["Observability"]
        TC["Tracing (EventCollector)"]
        EV["Evaluation Framework"]
        MR["Model Registry"]
    end

    CLI --> ORC
    API --> ORC
    VIZ --> TC

    ORC --> RL
    ORC --> RT
    ORC --> PT
    ORC --> CT

    ORC --> CL & GP & GE & PX & MK

    REACT --> ORC
    REACT --> TR
    MESH --> REACT
    MEM --> REACT
    GR --> REACT
    DAG --> REACT

    TR --> TE
    TE --> SO

    ORC --> TC
    REACT --> TC
    TC --> EV
    MR --> ORC

    style Interface fill:#e1f5fe,stroke:#0288d1
    style Core fill:#f3e5f5,stroke:#7b1fa2
    style Agents fill:#e8f5e9,stroke:#388e3c
    style Providers fill:#fff3e0,stroke:#f57c00
    style Tools fill:#fce4ec,stroke:#c62828
    style Observability fill:#f5f5f5,stroke:#616161
```

Key Metrics

| Metric | Value |
| --- | --- |
| Test Suite | 490+ automated tests |
| Provider Latency | <50 ms overhead per call |
| Tool Execution | <10 ms dispatch time |
| Agent Mesh | 5+ concurrent agents |
| Tracing Coverage | 100% of agent decisions |
| Rate Limiting | Token bucket per provider |
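The per-provider token bucket can be illustrated with a minimal sketch. The class below is a generic example of the algorithm, not the `rate_limiter.py` implementation; names are assumptions:

```python
import time

class TokenBucket:
    """Minimal token bucket: allows bursts up to `capacity`,
    then refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost  # spend tokens for this request
            return True
        return False
```

A "token-aware" limiter would pass the request's estimated LLM token count as `cost`, so one large prompt consumes as much budget as many small ones.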

Modules

| Module | File | Description |
| --- | --- | --- |
| Orchestrator | client.py | Unified async interface: chat, stream, fallback routing |
| CLI | cli.py | Click-based CLI: chat, stream, benchmark, health check |
| Providers | providers/ | Gemini, Claude, OpenAI, Perplexity, Mock (5 providers) |
| Retry | retry.py | Exponential backoff with jitter and configurable retryable exceptions |
| Rate Limiter | rate_limiter.py | Token-aware rate limiting with token bucket algorithm |
| Prompt Templates | prompt_template.py | Reusable templates with {{variable}} substitution and built-ins |
| Tools | tools.py | Function calling: define, register, format, and execute tool calls |
| Structured Output | structured.py | JSON extraction from LLM responses with schema validation |
| Cost Tracker | cost_tracker.py | Per-request cost recording, provider breakdown, session totals |
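The `{{variable}}` substitution used by the prompt templates can be sketched with a short regex-based renderer. This is a generic illustration, not the `prompt_template.py` API:

```python
import re

def render_template(template: str, **variables) -> str:
    """Replace {{name}} placeholders with keyword values.

    An unknown placeholder raises KeyError rather than silently
    leaving a hole in the prompt.
    """
    def substitute(match: re.Match) -> str:
        return str(variables[match.group(1)])

    # \w+ captures the placeholder name between double braces.
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)
```

For example, `render_template("Explain {{topic}} in {{n}} sentences", topic="RAG", n=2)` produces `"Explain RAG in 2 sentences"`.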

Quick Start

```bash
git clone https://github.com/ChunkyTortoise/ai-orchestrator.git
cd ai-orchestrator
pip install -r requirements.txt
make test

# Try it immediately -- no API keys needed
agentforge "Explain RAG in 2 sentences" --provider mock

# With a real provider
cp .env.example .env  # Add your API keys
agentforge "Explain RAG in 2 sentences" --provider gemini
```

Usage

Python API

```python
import asyncio
from agentforge import AIOrchestrator

async def main():
    orc = AIOrchestrator(temperature=0.3, max_tokens=2048)
    response = await orc.chat("gemini", "Explain RAG in 2 sentences")
    print(response.content)

asyncio.run(main())
```

CLI

```bash
agentforge "What is RAG?"                                       # Default provider
agentforge "Compare Python and Rust" --provider claude          # Specify provider
agentforge "Write a haiku" --provider openai --stream           # Stream output
agentforge "Summarize this" --provider claude --fallback openai # Fallback routing
agentforge health                                               # Check provider status
agentforge benchmark                                            # Compare providers
```

Supported Providers

| Provider | Default Model | Streaming | Key Features |
| --- | --- | --- | --- |
| Gemini | gemini-1.5-pro | Yes | Long context (1M tokens), multimodal |
| Claude | claude-3-5-sonnet-20241022 | Yes | Strong reasoning, 200K context |
| OpenAI | gpt-4o | Yes | Function calling, JSON mode |
| Perplexity | sonar-reasoning-pro | Yes | Built-in web search, citations |

Tech Stack

| Layer | Technology |
| --- | --- |
| HTTP | httpx (async) |
| CLI | Click |
| Config | python-dotenv |
| Testing | pytest (490+ tests) |
| CI | GitHub Actions (Python 3.11, 3.12) |
| Linting | Ruff |

Project Structure

```text
ai-orchestrator/
├── agentforge/
│   ├── client.py               # AIOrchestrator (unified interface)
│   ├── cli.py                  # Click CLI
│   ├── providers/              # Gemini, Claude, OpenAI, Perplexity, Mock
│   ├── retry.py                # Exponential backoff + jitter
│   ├── rate_limiter.py         # Token bucket rate limiting
│   ├── prompt_template.py      # Prompt templates + variables
│   ├── tools.py                # Function calling abstraction
│   ├── structured.py           # JSON extraction + validation
│   └── cost_tracker.py         # Per-request cost tracking
├── tests/                      # 21 test files (490+ tests)
├── .github/workflows/ci.yml
├── Makefile
└── pyproject.toml
```

Architecture Decisions

| ADR | Title | Status |
| --- | --- | --- |
| ADR-0001 | Provider Abstraction Pattern | Accepted |
| ADR-0002 | Token Bucket Rate Limiting | Accepted |
| ADR-0003 | ReAct Agent Loop | Accepted |
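The ReAct loop recorded in ADR-0003 follows the standard Thought → Action → Observe cycle shown in the architecture diagram. A minimal generic sketch (a stand-in `model` callable and `tools` dict, not the project's agent classes):

```python
def react_loop(model, tools, task, max_steps=5):
    """Thought → Action → Observe: ask the model for the next step,
    run the chosen tool, feed the observation back, repeat until
    the model emits a final answer."""
    context = [task]
    for _ in range(max_steps):
        thought, action, arg = model(context)   # model decides the next step
        if action == "finish":
            return arg                          # final answer
        observation = tools[action](arg)        # execute the chosen tool
        context.append((thought, action, observation))
    raise RuntimeError("Agent exceeded max_steps without finishing")
```

The `max_steps` cap is the guardrail that keeps a looping agent from burning tokens indefinitely; the real framework layers content and token limits on top of this.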

Testing

```bash
make test                                    # Full suite (490+ tests)
python -m pytest tests/ -v                   # Verbose output
python -m pytest tests/test_client.py        # Unit tests (mocked)
python -m pytest tests/test_integration.py -m integration  # Needs API keys
```

Benchmarks

See BENCHMARKS.md for detailed performance data.

```bash
python -m benchmarks.run_all
```

Changelog

See CHANGELOG.md for release history.

Related Projects

  • EnterpriseHub -- Real estate AI platform with BI dashboards and CRM integration
  • insight-engine -- Upload CSV/Excel, get instant dashboards, predictive models, and reports
  • docqa-engine -- RAG document Q&A with hybrid retrieval and prompt engineering lab
  • scrape-and-serve -- Web scraping, price monitoring, Excel-to-web apps, and SEO tools
  • prompt-engineering-lab -- 8 prompt patterns, A/B testing, TF-IDF evaluation
  • llm-integration-starter -- Production LLM patterns: completion, streaming, function calling, RAG, hardening
  • Portfolio -- Project showcase and services

Client Testimonials

See what clients say about working with me: TESTIMONIALS.md

"The 89% cost reduction claim is real. We went from $3,600/month in API costs to under $400."
CTO, AI Startup

Work With Me

Building production AI systems? I help teams ship reliable LLM/Agent architectures:

  • 💼 Consulting — Architecture reviews, provider selection, cost optimization
  • 🚀 Implementation — Multi-agent systems, RAG pipelines, production hardening
  • 📧 Enterprise — Custom licensing, SLAs, dedicated support

Book a Call Email Me

Available for freelance gigs, contract work, and advisory roles.

Support This Project

If AgentForge has been useful to you, consider sponsoring its continued development:

Sponsor

See SPONSORS.md for sponsorship tiers and benefits.

License

MIT