


AgentForge

Switching LLM providers means rewriting API calls. AgentForge provides a unified async interface across Claude, Gemini, OpenAI, and Perplexity -- swap providers with one parameter change.


🎥 Demo Video

Coming Soon — Video walkthrough of AgentForge's key features including multi-provider switching, cost optimization, and agent orchestration.

Watch Demo (Link will be added when video is ready)

Planned Video Content:

  • Quick Start (2 min) — From install to first API call in under 5 minutes
  • Provider Switching (2 min) — Swap Claude for GPT-4 with one parameter change
  • Cost Optimization (3 min) — 89% cost reduction with caching and routing
  • Multi-Agent Mesh (3 min) — Orchestrate multiple agents with handoffs

What This Solves

  • Provider lock-in -- One interface for 4 LLM providers. Switch from Claude to Gemini by changing a string
  • Framework bloat -- 2 core dependencies and ~15 KB vs. LangChain's 50+ deps and ~50 MB install
  • Production gaps -- Token-aware rate limiting, prompt templates, and retry with backoff built in
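The retry-with-backoff pattern mentioned above can be sketched in a few lines. This is an illustrative standalone helper, not the actual `retry.py` API; names and defaults are assumptions:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=4, base_delay=0.5, retryable=(TimeoutError,)):
    """Call fn, retrying on retryable exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff (0.5s, 1s, 2s, ...) with random jitter
            # so concurrent clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

The jitter term is the important detail: without it, many clients that failed together retry together and hit the rate limit again.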

Architecture

```mermaid
graph TB
    subgraph Interface["Interface Layer"]
        CLI["CLI (Click)"]
        API["REST API (FastAPI)"]
        VIZ["Streamlit Visualizer"]
    end

    subgraph Core["Orchestration Core"]
        ORC["AIOrchestrator"]
        RL["Rate Limiter<br/>(Token Bucket)"]
        RT["Retry + Backoff<br/>(Exponential)"]
        PT["Prompt Templates<br/>({{var}} substitution)"]
        CT["Cost Tracker"]
    end

    subgraph Agents["Agent Framework"]
        REACT["ReAct Agent Loop<br/>(Thought → Action → Observe)"]
        MESH["Multi-Agent Mesh<br/>(Consensus + Handoff)"]
        MEM["Agent Memory<br/>(Sliding Window + Summary)"]
        GR["Guardrails Engine<br/>(Content + Token limits)"]
        DAG["Workflow DAG<br/>(Parallel execution)"]
    end

    subgraph Providers["Provider Adapters"]
        CL["Claude<br/>(Anthropic)"]
        GP["GPT-4<br/>(OpenAI)"]
        GE["Gemini<br/>(Google)"]
        PX["Perplexity<br/>(Sonar)"]
        MK["Mock<br/>(Testing)"]
    end

    subgraph Tools["Tool System"]
        TR["Tool Registry"]
        TE["Tool Execution Engine"]
        SO["Structured Output<br/>(JSON Schema)"]
    end

    subgraph Observability["Observability"]
        TC["Tracing (EventCollector)"]
        EV["Evaluation Framework"]
        MR["Model Registry"]
    end

    CLI --> ORC
    API --> ORC
    VIZ --> TC

    ORC --> RL
    ORC --> RT
    ORC --> PT
    ORC --> CT

    ORC --> CL & GP & GE & PX & MK

    REACT --> ORC
    REACT --> TR
    MESH --> REACT
    MEM --> REACT
    GR --> REACT
    DAG --> REACT

    TR --> TE
    TE --> SO

    ORC --> TC
    REACT --> TC
    TC --> EV
    MR --> ORC

    style Interface fill:#e1f5fe,stroke:#0288d1
    style Core fill:#f3e5f5,stroke:#7b1fa2
    style Agents fill:#e8f5e9,stroke:#388e3c
    style Providers fill:#fff3e0,stroke:#f57c00
    style Tools fill:#fce4ec,stroke:#c62828
    style Observability fill:#f5f5f5,stroke:#616161
```

Key Metrics

| Metric | Value |
| --- | --- |
| Test Suite | 490+ automated tests |
| Provider Latency | <50 ms overhead per call |
| Tool Execution | <10 ms dispatch time |
| Agent Mesh | 5+ concurrent agents |
| Tracing Coverage | 100% of agent decisions |
| Rate Limiting | Token bucket per provider |
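The per-provider token bucket can be illustrated with a minimal sketch. The class below is a generic example of the algorithm, not the `rate_limiter.py` implementation; names are assumptions:

```python
import time

class TokenBucket:
    """Minimal token bucket: allows bursts up to `capacity`,
    then refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost  # spend tokens for this request
            return True
        return False
```

A "token-aware" limiter would pass the request's estimated LLM token count as `cost`, so one large prompt consumes as much budget as many small ones.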

Modules

| Module | File | Description |
| --- | --- | --- |
| Orchestrator | client.py | Unified async interface: chat, stream, fallback routing |
| CLI | cli.py | Click-based CLI: chat, stream, benchmark, health check |
| Providers | providers/ | Gemini, Claude, OpenAI, Perplexity, Mock (5 providers) |
| Retry | retry.py | Exponential backoff with jitter and configurable retryable exceptions |
| Rate Limiter | rate_limiter.py | Token-aware rate limiting with token bucket algorithm |
| Prompt Templates | prompt_template.py | Reusable templates with {{variable}} substitution and built-ins |
| Tools | tools.py | Function calling: define, register, format, and execute tool calls |
| Structured Output | structured.py | JSON extraction from LLM responses with schema validation |
| Cost Tracker | cost_tracker.py | Per-request cost recording, provider breakdown, session totals |
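The `{{variable}}` substitution used by the prompt templates can be sketched with a short regex-based renderer. This is a generic illustration, not the `prompt_template.py` API:

```python
import re

def render_template(template: str, **variables) -> str:
    """Replace {{name}} placeholders with keyword values.

    An unknown placeholder raises KeyError rather than silently
    leaving a hole in the prompt.
    """
    def substitute(match: re.Match) -> str:
        return str(variables[match.group(1)])

    # \w+ captures the placeholder name between double braces.
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)
```

For example, `render_template("Explain {{topic}} in {{n}} sentences", topic="RAG", n=2)` produces `"Explain RAG in 2 sentences"`.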

Quick Start

```bash
git clone https://github.com/ChunkyTortoise/ai-orchestrator.git
cd ai-orchestrator
pip install -r requirements.txt
make test

# Try it immediately -- no API keys needed
agentforge "Explain RAG in 2 sentences" --provider mock

# With a real provider
cp .env.example .env  # Add your API keys
agentforge "Explain RAG in 2 sentences" --provider gemini
```

Usage

Python API

```python
import asyncio
from agentforge import AIOrchestrator

async def main():
    orc = AIOrchestrator(temperature=0.3, max_tokens=2048)
    response = await orc.chat("gemini", "Explain RAG in 2 sentences")
    print(response.content)

asyncio.run(main())
```

CLI

```bash
agentforge "What is RAG?"                                       # Default provider
agentforge "Compare Python and Rust" --provider claude          # Specify provider
agentforge "Write a haiku" --provider openai --stream           # Stream output
agentforge "Summarize this" --provider claude --fallback openai # Fallback routing
agentforge health                                               # Check provider status
agentforge benchmark                                            # Compare providers
```

Supported Providers

| Provider | Default Model | Streaming | Key Features |
| --- | --- | --- | --- |
| Gemini | gemini-1.5-pro | Yes | Long context (1M tokens), multimodal |
| Claude | claude-3-5-sonnet-20241022 | Yes | Strong reasoning, 200K context |
| OpenAI | gpt-4o | Yes | Function calling, JSON mode |
| Perplexity | sonar-reasoning-pro | Yes | Built-in web search, citations |

Tech Stack

| Layer | Technology |
| --- | --- |
| HTTP | httpx (async) |
| CLI | Click |
| Config | python-dotenv |
| Testing | pytest (490+ tests) |
| CI | GitHub Actions (Python 3.11, 3.12) |
| Linting | Ruff |

Project Structure

```text
ai-orchestrator/
├── agentforge/
│   ├── client.py               # AIOrchestrator (unified interface)
│   ├── cli.py                  # Click CLI
│   ├── providers/              # Gemini, Claude, OpenAI, Perplexity, Mock
│   ├── retry.py                # Exponential backoff + jitter
│   ├── rate_limiter.py         # Token bucket rate limiting
│   ├── prompt_template.py      # Prompt templates + variables
│   ├── tools.py                # Function calling abstraction
│   ├── structured.py           # JSON extraction + validation
│   └── cost_tracker.py         # Per-request cost tracking
├── tests/                      # 21 test files (490+ tests)
├── .github/workflows/ci.yml
├── Makefile
└── pyproject.toml
```

Architecture Decisions

| ADR | Title | Status |
| --- | --- | --- |
| ADR-0001 | Provider Abstraction Pattern | Accepted |
| ADR-0002 | Token Bucket Rate Limiting | Accepted |
| ADR-0003 | ReAct Agent Loop | Accepted |
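The ReAct loop recorded in ADR-0003 follows the standard Thought → Action → Observe cycle shown in the architecture diagram. A minimal generic sketch (a stand-in `model` callable and `tools` dict, not the project's agent classes):

```python
def react_loop(model, tools, task, max_steps=5):
    """Thought → Action → Observe: ask the model for the next step,
    run the chosen tool, feed the observation back, repeat until
    the model emits a final answer."""
    context = [task]
    for _ in range(max_steps):
        thought, action, arg = model(context)   # model decides the next step
        if action == "finish":
            return arg                          # final answer
        observation = tools[action](arg)        # execute the chosen tool
        context.append((thought, action, observation))
    raise RuntimeError("Agent exceeded max_steps without finishing")
```

The `max_steps` cap is the guardrail that keeps a looping agent from burning tokens indefinitely; the real framework layers content and token limits on top of this.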

Testing

```bash
make test                                    # Full suite (490+ tests)
python -m pytest tests/ -v                   # Verbose output
python -m pytest tests/test_client.py        # Unit tests (mocked)
python -m pytest tests/test_integration.py -m integration  # Needs API keys
```

Benchmarks

See BENCHMARKS.md for detailed performance data.

```bash
python -m benchmarks.run_all
```

Changelog

See CHANGELOG.md for release history.

Related Projects

  • EnterpriseHub -- Real estate AI platform with BI dashboards and CRM integration
  • insight-engine -- Upload CSV/Excel, get instant dashboards, predictive models, and reports
  • docqa-engine -- RAG document Q&A with hybrid retrieval and prompt engineering lab
  • scrape-and-serve -- Web scraping, price monitoring, Excel-to-web apps, and SEO tools
  • prompt-engineering-lab -- 8 prompt patterns, A/B testing, TF-IDF evaluation
  • llm-integration-starter -- Production LLM patterns: completion, streaming, function calling, RAG, hardening
  • Portfolio -- Project showcase and services

Client Testimonials

See what clients say about working with me: TESTIMONIALS.md

"The 89% cost reduction claim is real. We went from $3,600/month in API costs to under $400."
CTO, AI Startup

Work With Me

Building production AI systems? I help teams ship reliable LLM/Agent architectures:

  • 💼 Consulting — Architecture reviews, provider selection, cost optimization
  • 🚀 Implementation — Multi-agent systems, RAG pipelines, production hardening
  • 📧 Enterprise — Custom licensing, SLAs, dedicated support

Book a Call Email Me

Available for freelance gigs, contract work, and advisory roles.

Support This Project

If AgentForge has been useful to you, consider sponsoring its continued development:

Sponsor

See SPONSORS.md for sponsorship tiers and benefits.

License

MIT