LLMSwap: Universal LLM SDK + MCP Client
Ship AI Apps Faster
Natural Language MCP + 11 LLM Providers. Latest Models Day-One.
GPT-5.2 (Dec '25) • Claude Opus 4.5 • Gemini 3 Flash • Gemini 3 Pro • Grok 4.1 (#1 LMArena) • DeepSeek V3.2 + 10 providers. Universal tool calling • MCP protocol • Zero vendor lock-in • Production-ready SDK + CLI.
One simple interface for Anthropic, OpenAI, Gemini, Groq, X.AI and more. Stop wrestling with complex frameworks—build production AI in 10 lines of code.
📚 Documentation: llmswap.org | ⚡ CLI Reference: CLI Docs | 🐍 SDK Guide: SDK Docs | 🔧 MCP Guide: #mcp-integration
🆕 NEW in v5.2.0: Universal Tool Calling
Enable LLMs to access YOUR data and systems - Define tools once, works across ALL providers.
from llmswap import LLMClient, Tool

# Define tool to access YOUR weather API
weather = Tool(
    name="get_weather",
    description="Get real-time weather data",
    parameters={"city": {"type": "string"}},
    required=["city"]
)

# Works with ANY provider - Anthropic, OpenAI, Gemini, Groq, xAI
client = LLMClient(provider="anthropic")
response = client.chat("What's the weather in Tokyo?", tools=[weather])
# LLM calls YOUR function → you return data → LLM gives natural response
Real-World Use Cases:
- 🌦️ Give LLM access to YOUR weather API for real-time data
- 💾 Let LLM query YOUR database for customer information
- 🛒 Enable LLM to search YOUR product catalog for shopping assistance
- 🔧 Connect LLM to YOUR systems and APIs
Works with: Anthropic, OpenAI, Groq, Gemini, xAI | Quick Start Guide → | Full Docs →
⚡ Quick Start (30 seconds)
# Recommended: Install with uv (fastest)
uv tool install llmswap

# Or install with pip
pip install llmswap

# or Homebrew
brew tap llmswap/tap && brew install llmswap

# Create your first workspace
cd ~/my-project
llmswap workspace init

# Chat with AI that remembers everything
llmswap chat "Help me with Flask routing"
# AI has full project context + all past learnings!

# 🆕 NEW: Connect to MCP servers with natural language
llmswap-mcp --command npx -y @modelcontextprotocol/server-filesystem ~/Documents
# Ask: "List all PDF files"
# Ask: "Read the contents of README.md"
# AI uses filesystem tools automatically!

# 🆕 Compare models visually (optional)
pip install llmswap[web]
llmswap web  # Opens browser - compare GPT-4 vs Claude vs Gemini
🆕 Latest Models Supported (December 2025)
New models work the day they launch - LLMSwap's pass-through architecture means no SDK updates needed.
⚡ Claude Opus 4.5 (Released Nov 24, 2025)
from llmswap import LLMClient

client = LLMClient(provider="anthropic", model="claude-opus-4-5")
response = client.chat("Build a full-stack application with authentication...")
print(response.content)
Latest flagship from Anthropic. State-of-the-art for coding & software engineering. Pricing: $5/$25 per million tokens. Best for: Complex coding, deep research, software engineering, spreadsheet management
🚀 Gemini 3 Pro (Released Nov 18, 2025)
from llmswap import LLMClient

client = LLMClient(provider="gemini", model="gemini-3-pro")
response = client.chat("Analyze this video and extract key insights...")
print(response.content)
Google's most advanced multimodal model. Processes text, images, videos, audio, PDFs. 1M+ input tokens. Best for: Multimodal understanding, large document analysis, batch processing
🧠 GPT-5.2 (Released Dec 11, 2025)
from llmswap import LLMClient

client = LLMClient(provider="openai", model="gpt-5.2")
response = client.chat("Design an algorithm for real-time fraud detection...")
print(response.content)
OpenAI's latest flagship. Most capable model for professional knowledge work. Variants: Instant (speed) & Thinking (reasoning). Also: GPT-5.2-Codex for agentic coding. Best for: Professional tasks, complex reasoning, coding, science & math
⚡ Gemini 3 Flash (Released Dec 17, 2025)
from llmswap import LLMClient

client = LLMClient(provider="gemini", model="gemini-3-flash")
response = client.chat("Analyze this codebase and suggest improvements...")
print(response.content)
Google's fastest frontier model. Pro-level reasoning at 10x lower cost. 1M input tokens, 64k output. Multimodal: text, images, video, audio, PDF. Best for: High-speed inference, cost optimization, everyday tasks, agentic workflows
🏆 Grok 4.1 (Released Nov 17, 2025)
from llmswap import LLMClient

client = LLMClient(provider="xai", model="grok-4.1")
response = client.chat("Help me understand this nuanced ethical dilemma...")
print(response.content)
#1 on LMArena Text Leaderboard. Enhanced emotional intelligence & creative collaboration. Preferred 64.78% in blind tests. Best for: Emotional intelligence, creative writing, collaborative tasks, nuanced understanding
💎 DeepSeek V3.2 (Released Dec 16, 2025)
from llmswap import LLMClient

client = LLMClient(provider="deepseek", model="deepseek-v3.2")
response = client.chat("Solve this complex mathematical problem...")
print(response.content)
Open-source powerhouse. Matches GPT-5 & Gemini 3 at 10x lower cost ($0.028/1M tokens). 671B parameters, 96% on AIME 2025. MIT License. Best for: Cost-sensitive applications, open-source projects, math & reasoning, on-premise deployment
Plus 6 more providers: Groq (5x faster LPU inference), Cohere (enterprise), Perplexity (search), IBM Watsonx (Granite 4.0), Ollama (local models), Sarvam AI.
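Local providers use the same interface. A minimal sketch assuming Ollama is running locally with a pulled model (the model name is illustrative, matching the Ollama default shown later in this README):

from llmswap import LLMClient

# Ollama runs models on your own hardware - no API key, no data leaves your machine
client = LLMClient(provider="ollama", model="llama3.1")
response = client.chat("Explain the difference between threads and processes")
print(response.content)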
Why it matters: New models work day-one. Pass-through architecture means future models work immediately upon release.
🆕 Use Any Model from Any Provider! New model just launched? Use it immediately. LLMSwap's pass-through architecture means GPT-5, Claude Opus 4, and Gemini 2.5 Pro work the day they release. Currently supports 11 providers (OpenAI, Anthropic, Gemini, Cohere, Perplexity, IBM watsonx, Groq, Ollama, xAI Grok, Sarvam AI, DeepSeek).
✅ Battle-Tested with LMArena Top Models: All 10 providers tested and validated with top-rated models from LMArena leaderboard. From Grok-4 (xAI's flagship) to Claude Sonnet 4.5 (best coding model) to Gemini 2.0 Flash Exp - every model in our defaults is production-validated and arena-tested for real-world use.
The First AI Tool with Project Memory & Learning Journals - LLMSwap v5.1.0 introduces a workspace system that remembers your learning journey across projects. Build apps without vendor lock-in (SDK) or work from the terminal (CLI). It works with your existing subscriptions: Claude, OpenAI, Gemini, Cohere, Perplexity, IBM watsonx, Groq, Ollama, xAI Grok, Sarvam AI (10 providers). Use any model from your provider, even ones released tomorrow: the pass-through architecture means GPT-5, Gemini 2.5 Pro, and Claude Opus 4 work the day they launch.
🎯 Solve These Common Problems:
- ❌ "I need multiple second brains for different aspects of my life" 🆕
- ❌ "AI strays over time, I need to re-steer it constantly" 🆕
- ❌ "I keep explaining the same context to AI over and over"
- ❌ "AI forgets what I learned yesterday"
- ❌ "I lose track of architecture decisions across projects"
- ❌ "Context switching between projects is exhausting"
- ❌ "I want AI to understand my specific codebase, not generic answers"
✅ llmswap v5.1.0 Solves All These:
- ✅ Multiple independent "second brains" per project/life aspect 🆕
- ✅ Persistent context prevents AI from straying 🆕
- ✅ Per-project workspaces that persist context across sessions
- ✅ Auto-tracked learning journals - never forget what you learned
- ✅ Architecture decision logs - all your technical decisions documented
- ✅ Zero context switching - AI loads the right project automatically
- ✅ Project-aware AI - mentor understands YOUR specific tech stack
Why Developers Choose llmswap
- ✅ 10 Lines to Production - Not 1000 like LangChain
- ✅ MCP Protocol Support - Connect to any MCP server with natural language 🆕
- ✅ Automatic Fallback - Never down. Switches providers if one fails
- ✅ 50-90% Cost Savings - Built-in caching. Same query = FREE
- ✅ Workspace Memory - Your AI remembers your project context
- ✅ Universal Tool Calling - Define once, works everywhere (NEW v5.2.0)
- ✅ CLI + SDK - Code AND terminal. Your choice
- ✅ Zero Lock-in - Switch from OpenAI to Claude in 1 line
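The fallback and caching behavior above maps to client options in the SDK. A minimal sketch based on the SDK examples later in this README (exact cache behavior and fallback order may differ):

from llmswap import LLMClient

# Automatic failover: tries other configured providers if the first one fails
client = LLMClient(fallback=True)
response = client.query("Hello")

# Built-in response caching: an identical repeated query is served from cache for free
cached_client = LLMClient(cache_enabled=True)
first = cached_client.query("Summarize the 12-factor app methodology")   # paid API call
second = cached_client.query("Summarize the 12-factor app methodology")  # served from cache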
Built for Speed:
- 🚀 Hackathons - Ship in hours
- 💡 MVPs - Validate ideas fast
- 📱 Production Apps - Scale as you grow
- 🎯 Real Projects - Trusted by developers worldwide
v5.1.0: Revolutionary AI mentorship with project memory, workspace-aware context, auto-tracked learning journals, and persistent mentor relationships. The first AI tool that truly remembers your learning journey across projects.
NEW in v5.2.0:
- 🛠️ Universal Tool Calling - Enable LLMs to use YOUR custom functions across all providers
- 🔧 5 Providers Supported - Anthropic, OpenAI, Groq, Gemini, xAI with automatic format conversion
- 📖 Complete Documentation - Full guides, examples, and real-world use cases
- ✅ 100% Backward Compatible - All existing features work without changes
v5.1.6:
- 🌐 Web UI - Compare 20+ models side-by-side in beautiful browser interface & learn prompting techniques
- 📊 Visual Comparison - Live streaming results with speed badges (⚡🥈🥉), cost charts, efficiency metrics
- 💰 Cost Optimizer - See exact costs across providers, find cheapest model for your use case
- 🎨 Markdown + Code Highlighting - Syntax-highlighted code blocks with individual copy buttons
- 💾 Smart Preferences - Remembers your favorite models via localStorage
- 📈 Real-time Metrics - Tokens/sec efficiency, response length indicators, actual API token counts
NEW in v5.1.0:
- 🧠 Workspace Memory - Per-project context that persists across sessions
- 📚 Auto-Learning Journal - Automatically tracks what you learn in each project
- 🎯 Context-Aware Mentorship - AI mentor understands your project and past learnings
- 📖 Architecture Decision Log - Document and remember key technical decisions
- 🔄 Cross-Project Intelligence - Learn patterns from one project, apply to another
- 💡 Proactive Learning - AI suggests next topics based on your progress
- 🗂️ Project Knowledge Base - Custom prompt library per workspace
🧠 Finally: An Elegant Solution for Multiple Second Brains
The Problem Industry Leaders Can't Solve:
"I still haven't found an elegant solution to the fact that I need several second brains for the various aspects of my life, each with different styles and contexts." - Industry feedback
The LLMSwap Solution: Workspace System
Each aspect of your life gets its own "brain" with independent memory:
- 💼 Work Projects - `~/work/api-platform` - Enterprise patterns, team conventions
- 📚 Learning - `~/learning/rust` - Your learning journey, struggles, progress
- 🚀 Side Projects - `~/personal/automation` - Personal preferences, experiments
- 🌐 Open Source - `~/oss/django` - Community patterns, contribution history
What Makes It "Elegant":
- ✅ Zero configuration - just `cd` to the project directory
- ✅ Auto-switching - AI loads the right "brain" automatically
- ✅ No context bleed - work knowledge stays separate from personal
- ✅ Persistent memory - each brain remembers across sessions
- ✅ Independent personas - different teaching style per project if you want
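In practice, the auto-switching described above is just changing directories. A short sketch assuming both projects were already initialized with `llmswap workspace init`:

cd ~/work/api-platform
llmswap chat        # loads the work brain: enterprise patterns, team conventions

cd ~/learning/rust
llmswap chat        # loads the learning brain - no work context bleeds in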
Stop Re-Explaining Context. Start Building.
🎯 Transform AI Into Your Personal Mentor with Project Memory
Inspired by Eklavya - the legendary self-taught archer who learned from dedication and the right guidance - LLMSwap transforms any AI provider into a personalized mentor that adapts to your learning style and remembers your journey.
The Challenge: Developers struggle to learn effectively from AI because:
- 🔴 Responses are generic, lack personality, and don't adapt to individual needs
- 🔴 AI loses context between sessions - you repeat the same explanations
- 🔴 No learning history - AI doesn't know what you already learned
- 🔴 Project context is lost - AI doesn't understand your codebase
LLMSwap's Solution v5.1.0: Choose your mentorship style, initialize a workspace, and ANY AI provider becomes your personalized guide that remembers everything:
# 🆕 v5.1.0: Initialize workspace for your project
cd ~/my-flask-app
llmswap workspace init
# Creates .llmswap/ with context.md, learnings.md, decisions.md

# Now your AI mentor KNOWS your project
llmswap chat --mentor guru --alias "Guruji"
# Mentor has full context: your tech stack, past learnings, decisions made

# 🆕 Auto-tracked learning journal
# Every conversation automatically saves key learnings
llmswap workspace journal
# View everything you've learned in this project

# 🆕 Architecture decision log
llmswap workspace decisions
# See all technical decisions documented automatically

# View all your workspaces
llmswap workspace list

# Get wisdom and deep insights from a patient teacher
llmswap chat --mentor guru --alias "Guruji"

# High-energy motivation when you're stuck
llmswap ask "How do I debug this?" --mentor coach

# Collaborative peer learning for exploring ideas
llmswap chat --mentor friend --alias "CodeBuddy"

# Question-based learning for critical thinking
llmswap ask "Explain REST APIs" --mentor socrates

# 🆕 Use Claude Sonnet 4.5 - Best coding model
llmswap chat --provider anthropic --model claude-sonnet-4-5
# Or set as default in config for all queries
🔄 Rotate Personas to Expose Blind Spots
Industry Insight: "Rotate personas: mentor, skeptic, investor, end-user. Each lens exposes blind spots differently."
Use Case: Reviewing API Design
# Round 1: Long-term wisdom
llmswap chat --mentor guru "Design API for multi-tenant SaaS"
# Catches: scalability, technical debt, maintenance

# Round 2: Critical questions
llmswap chat --mentor socrates "Review this API design"
# Catches: assumptions, alternatives, edge cases

# Round 3: Practical execution
llmswap chat --mentor coach "What's the fastest path to v1?"
# Catches: over-engineering, paralysis by analysis
Same project context. Different perspectives. Complete understanding.
What Makes v5.1.0 Revolutionary:
- 🧠 Works with ANY provider - Transform Claude, GPT-4, or Gemini into your mentor
- 🎭 6 Teaching Personas - Guru, Coach, Friend, Socrates, Professor, Tutor
- 📊 Project Memory - Per-project context that persists across sessions ⭐ NEW
- 📚 Auto-Learning Journal - Automatically tracks what you learn ⭐ NEW
- 📖 Decision Tracking - Documents architecture decisions ⭐ NEW
- 🎓 Age-Appropriate - Explanations tailored to your level (--age 10, --age 25, etc.)
- 💰 Cost Optimized - Use cheaper providers for learning, premium for complex problems
- 🔄 Workspace Detection - Automatically loads project context ⭐ NEW
Traditional AI tools give you answers. LLMSwap v5.1.0 gives you a personalized learning journey that REMEMBERS.
🔧 MCP Integration (NEW)
The Model Context Protocol (MCP) lets LLMs connect to external tools and data sources. llmswap provides the best MCP client experience - just talk naturally, and AI handles the tools.
Natural Language MCP CLI
Connect to any MCP server and interact with tools using plain English:
# Filesystem access
llmswap-mcp --command npx -y @modelcontextprotocol/server-filesystem ~/Documents
# Then ask naturally:
> "What files are in this directory?"
> "Read the contents of report.pdf"
> "Find all files modified in the last week"

# Database queries
llmswap-mcp --command npx -y @modelcontextprotocol/server-sqlite ./mydb.sqlite
> "Show me all users in the database"
> "What are the top 10 products by sales?"

# GitHub integration
llmswap-mcp --command npx -y @modelcontextprotocol/server-github --owner anthropics --repo anthropic-sdk-python
> "Show me recent issues"
> "What pull requests are open?"
Supported MCP Transports
- stdio - Local command-line tools (most common)
- SSE - Server-Sent Events for remote servers
- HTTP - REST API endpoints
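All three transports go through the same `add_mcp_server` call. A sketch assembled from the SDK and enterprise examples elsewhere in this README (server names and URLs are placeholders):

from llmswap import LLMClient

client = LLMClient(provider="anthropic")

# stdio: spawn a local MCP server as a subprocess
client.add_mcp_server(
    "filesystem",
    command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
)

# SSE: connect to a remote MCP server over Server-Sent Events
client.add_mcp_server(
    "internal-api",
    transport="sse",
    url="https://mcp.example.com/events",
)

# HTTP: connect to a remote MCP server exposed as a REST endpoint
client.add_mcp_server(
    "crm-api",
    transport="http",
    url="https://api.example.com/mcp",
)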
Works With All 5 Providers
# Use your preferred LLM provider
llmswap-mcp --provider anthropic --command <mcp-server>
llmswap-mcp --provider openai --command <mcp-server>
llmswap-mcp --provider gemini --command <mcp-server>
llmswap-mcp --provider groq --command <mcp-server>    # Fastest!
llmswap-mcp --provider xai --command <mcp-server>     # Grok
Python SDK Integration
from llmswap import LLMClient

# Add MCP server to your client
client = LLMClient(provider="anthropic")
client.add_mcp_server("filesystem", command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"])

# Chat naturally - AI uses MCP tools automatically
response = client.chat("List all log files in /tmp", use_mcp=True)
print(response.content)

# List available tools
tools = client.list_mcp_tools()
for tool in tools:
    print(f"- {tool['name']}: {tool['description']}")
Popular MCP Servers
- Filesystem - Read/write files and directories
- GitHub - Search repos, issues, PRs
- GitLab - Project management
- Google Drive - Access documents
- Slack - Send messages, read channels
- PostgreSQL - Database queries
- Brave Search - Web search
- Memory - Persistent knowledge graphs
MCP Features
- ✅ Natural language interface - No JSON, no manual tool calls
- ✅ Multi-turn conversations - Context preserved across queries
- ✅ Beautiful UI - Clean bordered interface like Claude/Factory Droids
- ✅ Provider-specific formatting - Optimized for each LLM
- ✅ Connection management - Automatic reconnection and health checks
- ✅ Error handling - Graceful degradation with circuit breaker
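The multi-turn behavior listed above can be sketched with the same calls shown in the SDK example. This is a minimal illustration using only documented calls; exactly how context is carried between queries is not specified here:

from llmswap import LLMClient

client = LLMClient(provider="anthropic")
client.add_mcp_server(
    "filesystem",
    command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
)

# First question - the model picks the MCP tool on its own
first = client.chat("List all log files in /tmp", use_mcp=True)

# Follow-up question - per the feature list, context from the previous turn is preserved
second = client.chat("Which of those files was modified most recently?", use_mcp=True)
print(second.content)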
Example Use Cases
For Data Analysis:
llmswap-mcp --command npx -y @modelcontextprotocol/server-sqlite ./sales.db
> "What were our top 5 products last quarter?"
> "Show me revenue trends by region"
For Development:
llmswap-mcp --command npx -y @modelcontextprotocol/server-github --owner myorg --repo myapp
> "What issues are labeled as bugs?"
> "Summarize recent commits"
For Research:
llmswap-mcp --command npx -y @modelcontextprotocol/server-brave-search
> "Find recent papers on transformer architectures"
> "What are the latest developments in quantum computing?"
🏢 Enterprise Deployment
Remote MCP Servers (Production)
SSE Transport (Server-Sent Events)
from llmswap import LLMClient
import os

# Connect to internal MCP server via SSE
client = LLMClient(provider="anthropic")
client.add_mcp_server(
    "internal-api",
    transport="sse",
    url="https://mcp.yourcompany.com/events",
    headers={
        "Authorization": f"Bearer {os.getenv('INTERNAL_MCP_TOKEN')}"
    }
)

# Use with natural language
response = client.chat("Query internal data", use_mcp=True)
HTTP Transport (REST API)
# Connect to MCP server via HTTP
client.add_mcp_server(
    "crm-api",
    transport="http",
    url="https://api.yourcompany.com/mcp",
    headers={
        "X-API-Key": os.getenv('CRM_API_KEY')
    }
)

# Query your internal systems
response = client.chat("Get customer data for account #12345", use_mcp=True)
Production Features
Health Monitoring
# Check MCP server health
if not client.check_mcp_health("internal-api"):
    logger.error("MCP server unhealthy")
    # Fallback logic
Circuit Breaker (Built-in)
# Automatic circuit breaker prevents cascade failures
client.add_mcp_server(
    "backend-api",
    transport="sse",
    url="https://backend.company.com/mcp",
    circuit_breaker_threshold=5,  # Opens after 5 failures
    circuit_breaker_timeout=60    # Retry after 60 seconds
)
Multi-Provider Routing
Cost Optimization
# Route to cheapest provider first, fall back to premium
try:
    response = LLMClient(provider="groq").chat(query)        # Fast & cheap
except Exception:
    response = LLMClient(provider="anthropic").chat(query)   # Premium fallback
Latency Optimization
# Route based on latency requirements
if requires_realtime:
    client = LLMClient(provider="groq")    # 840+ tokens/sec
else:
    client = LLMClient(provider="openai")  # More capable
Provider Fallback Chain
from llmswap import LLMClient

providers = ["groq", "anthropic", "openai"]  # Priority order

for provider in providers:
    try:
        client = LLMClient(provider=provider)
        response = client.chat(query)
        break
    except Exception as e:
        logger.warning(f"{provider} failed: {e}")
        continue
🔒 Security & Compliance
API Key Management
Environment Variables (Recommended)
# Never hardcode API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export INTERNAL_MCP_TOKEN="your-token"
import os
from llmswap import LLMClient

# Keys loaded from environment automatically
client = LLMClient(provider="anthropic")  # Uses ANTHROPIC_API_KEY
Secrets Management Integration
AWS Secrets Manager:
import boto3
import json
from llmswap import LLMClient

def get_secret(secret_name):
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])

secrets = get_secret('llm-api-keys')
client = LLMClient(provider="anthropic", api_key=secrets['anthropic_key'])
Azure Key Vault:
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from llmswap import LLMClient

credential = DefaultAzureCredential()
vault_client = SecretClient(
    vault_url="https://your-vault.vault.azure.net",
    credential=credential
)
api_key = vault_client.get_secret("anthropic-api-key").value
client = LLMClient(provider="anthropic", api_key=api_key)
HashiCorp Vault:
import hvac
from llmswap import LLMClient

vault_client = hvac.Client(url='https://vault.company.com')
vault_client.auth.approle.login(role_id=..., secret_id=...)
secret = vault_client.secrets.kv.v2.read_secret_version(path='llm-keys')
api_key = secret['data']['data']['anthropic_key']
client = LLMClient(provider="anthropic", api_key=api_key)
Data Privacy
Zero Telemetry:
- LLMSwap collects NO usage data
- NO analytics sent to third parties
- NO phone-home behavior
Data Flow:
Your Application → LLMSwap → LLM Provider API
Your data goes ONLY to the provider API you choose, governed by that provider's privacy policy.
On-Premise MCP Servers:
# All data stays within your infrastructure
client.add_mcp_server(
    "internal-db",
    transport="http",
    url="https://internal.company.local/mcp"  # Internal network only
)
Network Security
TLS/SSL Enforcement
# HTTPS enforced for remote connections
client.add_mcp_server(
    "api",
    transport="http",
    url="https://secure.company.com/mcp",
    verify_ssl=True  # Certificate verification
)
Timeout Controls
# Prevent hanging connections
client = LLMClient(
    provider="anthropic",
    timeout=30  # 30 second timeout
)
Audit Logging
import logging

# Enable detailed logging for compliance
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('llmswap')

# Logs include:
# - Provider used
# - Token usage
# - MCP tool calls
# - Error details
# - No sensitive data (keys redacted)
Compliance Notes
SOC2 / GDPR Considerations:
- LLMSwap is a client library - does NOT store data
- Data retention governed by your chosen LLM provider
- See provider compliance: Anthropic, OpenAI, Google
Industry Standards:
- Uses standard HTTPS/TLS for transport security
- Supports enterprise authentication (OAuth, API keys, custom headers)
- No vendor lock-in - switch providers without code changes
🐳 Production Deployment
Docker
Simple Dockerfile
FROM python:3.11-slim

# Install llmswap
RUN pip install llmswap

# Set working directory
WORKDIR /app

# Copy your application
COPY . .

# Environment variables set at runtime
ENV ANTHROPIC_API_KEY=""
ENV MCP_SERVER_URL=""

# Run your application
CMD ["python", "your_app.py"]
Multi-Stage Build (Optimized)
# Build stage
FROM python:3.11-slim as builder
RUN pip install --user llmswap

# Runtime stage
FROM python:3.11-slim
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
WORKDIR /app
COPY . .
CMD ["python", "your_app.py"]
Docker Compose
version: '3.8'

services:
  llmswap-app:
    build: .
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - MCP_SERVER_URL=https://mcp.company.com
    networks:
      - internal
    restart: unless-stopped

networks:
  internal:
    driver: bridge
Kubernetes
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llmswap-service
  labels:
    app: llmswap
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llmswap
  template:
    metadata:
      labels:
        app: llmswap
    spec:
      containers:
      - name: llmswap
        image: your-registry/llmswap-app:latest
        env:
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: llm-secrets
              key: anthropic-key
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: llm-secrets
              key: openai-key
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
Secrets Management
apiVersion: v1
kind: Secret
metadata:
  name: llm-secrets
type: Opaque
data:
  anthropic-key: <base64-encoded-key>
  openai-key: <base64-encoded-key>
# Create secrets from literals
kubectl create secret generic llm-secrets \
  --from-literal=anthropic-key=$ANTHROPIC_API_KEY \
  --from-literal=openai-key=$OPENAI_API_KEY
Service
apiVersion: v1
kind: Service
metadata:
  name: llmswap-service
spec:
  selector:
    app: llmswap
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP
ConfigMap (MCP Configuration)
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-config
data:
  mcp-servers.json: |
    {
      "internal-api": {
        "transport": "sse",
        "url": "https://mcp.company.com/events"
      },
      "crm-system": {
        "transport": "http",
        "url": "https://crm-api.company.com/mcp"
      }
    }
Environment Variables Reference
# LLM Provider API Keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
GROQ_API_KEY=gsk_...
XAI_API_KEY=xai-...

# MCP Configuration
MCP_SERVER_URL=https://mcp.company.com
MCP_AUTH_TOKEN=your-token

# Optional: Override defaults
LLMSWAP_DEFAULT_PROVIDER=anthropic
LLMSWAP_TIMEOUT=30
LLMSWAP_LOG_LEVEL=INFO
Health Checks
# your_app.py
from flask import Flask, jsonify
from llmswap import LLMClient

app = Flask(__name__)
client = LLMClient(provider="anthropic")

@app.route('/health')
def health():
    """Kubernetes liveness probe"""
    return jsonify({"status": "healthy"}), 200

@app.route('/ready')
def ready():
    """Kubernetes readiness probe"""
    try:
        # Check if LLM provider is accessible
        client.chat("test", max_tokens=1)
        return jsonify({"status": "ready"}), 200
    except Exception as e:
        return jsonify({"status": "not ready", "error": str(e)}), 503

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
Monitoring & Observability
Prometheus Metrics (Example)
from prometheus_client import Counter, Histogram, start_http_server
from llmswap import LLMClient

# Metrics
llm_requests = Counter('llm_requests_total', 'Total LLM requests', ['provider'])
llm_latency = Histogram('llm_request_duration_seconds', 'LLM request latency', ['provider'])
llm_errors = Counter('llm_errors_total', 'Total LLM errors', ['provider', 'error_type'])

# Start metrics endpoint
start_http_server(9090)

# Instrument your calls
client = LLMClient(provider="anthropic")

with llm_latency.labels(provider="anthropic").time():
    try:
        response = client.chat("query")
        llm_requests.labels(provider="anthropic").inc()
    except Exception as e:
        llm_errors.labels(provider="anthropic", error_type=type(e).__name__).inc()
        raise
🏆 Production-Validated with LMArena Top Models
Every model in LLMSwap's defaults comes from LMArena's top performers:
All 10 providers ship with carefully selected default models based on LMArena rankings and real-world production testing. We track arena performance and update defaults to ensure you're always using validated, battle-tested models.
| Provider | Default Model | Arena Status | Why We Chose It |
|---|---|---|---|
| Anthropic | claude-sonnet-4-5 | #1 Coding | Best coding model in the world (Sept 2025) |
| xAI | grok-4-0709 | Top 5 Overall | Advanced reasoning, real-time data access |
| Gemini | gemini-2.0-flash-exp | Top 10 | Lightning-fast, multimodal, cutting-edge |
| OpenAI | gpt-4o-mini | Cost Leader | Best price/performance ratio |
| Cohere | command-r-08-2024 | Top RAG | Enterprise-grade retrieval-augmented generation |
| Perplexity | sonar | Web Search | Real-time web-connected AI with citations |
| Groq | llama-3.1-8b-instant | Speed King | 840+ tokens/second ultra-fast inference |
| Sarvam | sarvam-m | Multilingual | 24B params, best for 10 Indian languages |
| Watsonx | granite-3-8b-instruct | Enterprise | IBM's production-grade AI for business |
| Ollama | granite-code:8b | Local AI | Privacy-first, runs on your hardware |
✅ Battle-tested with real API calls - Every provider validated in production, not simulated tests.
✅ Weekly model updates - We monitor LMArena rankings and deprecation notices to keep defaults current.
✅ Zero lock-in - Don't like our defaults? Override with any model: LLMClient(model="gpt-5") or llmswap config set provider.models.openai gpt-5
🔓 Use Any Model Your Provider Supports (Zero-Wait Model Support)
Here's something cool: LLMSwap doesn't restrict which models you can use. When GPT-5 or Gemini 2.5 Pro drops tomorrow, you can start using it immediately. No waiting for us to update anything.
How? We use pass-through architecture. Whatever model name you pass goes directly to your provider's API. We don't gatekeep.
CLI Examples:
# Use any OpenAI model (even ones that don't exist yet)
llmswap chat --provider openai --model gpt-5
llmswap chat --provider openai --model o3-mini

# Use any Anthropic model
llmswap chat --provider anthropic --model claude-opus-4
llmswap chat --provider anthropic --model claude-sonnet-4-5

# Use any Gemini model
llmswap chat --provider gemini --model gemini-2-5-pro
llmswap chat --provider gemini --model gemini-ultra-2

# Set as default so you don't have to type it every time
llmswap config set provider.models.openai gpt-5
llmswap config set provider.models.anthropic claude-opus-4
Python SDK:
from llmswap import LLMClient

# Use whatever model your provider offers
client = LLMClient(provider="openai", model="gpt-5")
client = LLMClient(provider="anthropic", model="claude-opus-4")
client = LLMClient(provider="gemini", model="gemini-2-5-pro")

# Model just released? Use it right now
client = LLMClient(provider="openai", model="gpt-6")  # works!
The point: You're not limited to what we've documented. If your provider supports it, llmswap supports it.
🆚 LLMSwap vs Single-Provider Tools
For Python Developers Building Apps:
| Your Need | Single-Provider SDKs | LLMSwap SDK |
|---|---|---|
| Build chatbot/app | Import `openai` library (locked in) | Import `llmswap` (works with any provider) |
| Switch providers | Rewrite all API calls | Change 1 line: `provider="anthropic"` |
| Try different models | Sign up, new SDK, refactor code | Just change config, same code |
| Use new models | Wait for SDK update | Works immediately (pass-through) |
| Cost optimization | Manual implementation | Built-in caching (50-90% savings) |
| Use multiple providers | Maintain separate codebases | One codebase, switch dynamically |
For Developers Using Terminal:
| Your Need | Vendor CLIs | LLMSwap CLI |
|---|---|---|
| Have Claude subscription | Install Claude Code (Claude only) | Use llmswap (works with Claude) |
| Have OpenAI subscription | Build your own scripts | Use llmswap (works with OpenAI) |
| Have multiple subscriptions | Install 3+ different CLIs | One CLI for all subscriptions |
| New model launches | Wait for CLI update | Use it same day (pass-through) |
| Want AI to teach you | Not available | Built-in Eklavya mentorship |
| Switch providers mid-chat | Can't - locked in | /switch anthropic command |
The Bottom Line:
- Building an app? Use LLMSwap SDK - no vendor lock-in
- Using terminal? Use LLMSwap CLI - works with your existing subscriptions
- Both? Perfect - it's the same tool!
🔧 LLMSwap vs MCP Alternatives
The only multi-provider MCP client with natural language interface:
| Feature | LLMSwap | langchain-mcp-tools | mcp-use | Anthropic SDK |
|---|---|---|---|---|
| Natural Language | ✅ Ask in plain English | ❌ Manual JSON | ❌ Manual JSON | ❌ Manual JSON |
| Multi-Provider MCP | ✅ 11 providers | ❌ LangChain only | ❌ Claude only | |
| Latest Models | ✅ Day-one support (Dec '24) | ✅ Claude only | ||
| Beautiful CLI | ✅ Bordered UI | ❌ No CLI | ❌ Basic | ❌ No CLI |
| Setup Time | 🟢 30 seconds | 🔴 Hours (LangChain) | 🟡 Medium | 🟢 Fast |
| Production Ready | ✅ Circuit breakers | ❌ DIY | ❌ DIY | |
| Cost Optimization | ✅ Auto caching | ❌ Manual | ❌ Manual | ❌ No |
| Learning Curve | 🟢 10 lines | 🔴 Complex | 🟡 Medium | 🟢 Easy |
| Remote MCP | ✅ SSE/HTTP | ✅ Yes | ||
| Zero Lock-in | ✅ Switch providers | ❌ Locked to LangChain | ❌ Claude only |
Why LLMSwap for MCP?
- Natural language: Just ask "List all PDFs" - no JSON schemas
- Universal: Works with 11 providers, not just one
- Production-ready: Circuit breakers, health checks, monitoring built-in
- Latest models: Claude 3.5 Haiku, Gemini 2.0, o1 work day-one
# 🆕 NEW v5.1.0: Workspace System - Project Memory That Persists
llmswap workspace init
# Creates .llmswap/ directory with:
# - workspace.json (project metadata)
# - context.md (editable project description)
# - learnings.md (auto-tracked learning journal)
# - decisions.md (architecture decision log)

llmswap workspace list       # View all your workspaces
llmswap workspace info       # Show current workspace statistics
llmswap workspace journal    # View learning journal
llmswap workspace decisions  # View decision log
llmswap workspace context    # Edit project context

# 🆕 NEW v5.1.0: Context-Aware Mentorship
# AI mentor automatically loads project context, past learnings, and decisions
llmswap chat
# Mentor knows: your tech stack, what you've learned, decisions made

# 🆕 NEW v5.0: Age-Appropriate AI Explanations
llmswap ask "What is Docker?" --age 10
# Output: "Docker is like a magic lunch box! 🥪 When your mom packs..."

llmswap ask "What is blockchain?" --audience "business owner"
# Output: "Think of blockchain like your business ledger system..."

# 🆕 NEW v5.0: Teaching Personas & Personalization
llmswap ask "Explain Python classes" --teach --mentor developer --alias "Sarah"
# Output: "[Sarah - Senior Developer]: Here's how we handle classes in production..."

# 🆕 NEW v5.0: Conversational Chat with Provider Switching
llmswap chat --age 25 --mentor tutor
# In chat: /switch anthropic   # Switch mid-conversation
# In chat: /provider           # See current provider
# Commands: /help, /switch, /clear, /stats, /quit

# 🆕 NEW v5.0: Provider Management & Configuration
llmswap providers  # View all providers and their status
llmswap config set provider.models.cohere command-r-plus-08-2024
llmswap config set provider.default anthropic
llmswap config show

# Code Generation (GitHub Copilot CLI Alternative)
llmswap generate "sort files by size in reverse order"
# Output: du -sh * | sort -hr
llmswap generate "Python function to read JSON with error handling" --language python
# Output: Complete Python function with try/catch blocks

# Advanced Log Analysis with AI
llmswap logs --analyze /var/log/app.log --since "2h ago"
llmswap logs --request-id REQ-12345 --correlate

# Code Review & Debugging
llmswap review app.py --focus security
llmswap debug --error "IndexError: list index out of range"
# ❌ Problem: Vendor Lock-in
import openai  # Locked to OpenAI forever

client = openai.Client(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
# To switch to Claude? Rewrite everything.

# ✅ Solution: llmswap SDK - Universal Interface
from llmswap import LLMClient

# Works with any provider you're subscribed to
client = LLMClient()  # Auto-detects from env vars
response = client.query("Hello")

# Want Claude instead? Just change provider:
client = LLMClient(provider="anthropic")  # That's it!

# Want to try Gemini?
client = LLMClient(provider="gemini")  # Same code, different provider

# Built-in cost optimization:
# - Automatic response caching (50-90% savings)
# - Provider cost comparison
# - Smart provider selection based on query type
🆕 v5.1.0: Workspace System - Real-World Scenarios
🎯 Scenario 1: New Developer Learning Flask
Problem: Junior developer learning Flask keeps asking AI the same questions because AI forgets previous conversations.
Solution with llmswap v5.1.0:
cd ~/my-first-flask-app
llmswap workspace init --name "Learning Flask"

# Day 1: Learn about routing
llmswap chat --mentor professor "How do Flask routes work?"
# AI explains. Learning auto-saved to learnings.md

# Day 2: Same workspace, AI remembers!
llmswap chat "Can I use decorators for authentication?"
# AI response: "Building on what you learned about routes yesterday..."
# No need to re-explain basics!

# View your learning journey
llmswap workspace journal
# See: Day 1 - Routes, Day 2 - Authentication, etc.
Result: 60% faster learning because AI builds on previous knowledge instead of repeating basics.
🏢 Scenario 2: Team Onboarding on Legacy Project
Problem: New team member joins 2-year-old codebase. Spends weeks understanding architecture decisions.
Solution with llmswap v5.1.0:
cd ~/legacy-ecommerce-app
llmswap workspace init

# Edit context.md with project overview
llmswap workspace context
# Add: Tech stack, key components, known issues

# Ask questions - AI has full context
llmswap ask "Why did we choose MongoDB over PostgreSQL?" --mentor guru
# AI suggests checking decisions.md
# If documented: "According to your decision log from 2023-05..."
# If not: AI helps document it now

llmswap workspace decisions
# See all past architectural decisions in one place
Result: Onboarding time reduced from 3 weeks to 1 week.
💼 Scenario 3: Freelancer Managing Multiple Projects
Problem: Freelancer switches between 5 client projects daily. Context switching is exhausting.
Solution with llmswap v5.1.0:
# Morning: Client A's React project
cd ~/client-a-dashboard
llmswap chat
# AI loads: React patterns you learned, components built, state management decisions

# Afternoon: Client B's Python API
cd ~/client-b-api
llmswap chat
# AI switches context: Python best practices, API design decisions, database schema

# List all projects
llmswap workspace list
# See: 5 workspaces, each with independent context and learnings

# Each workspace has separate:
# - Learning journal (React patterns vs Python patterns)
# - Decision log (frontend vs backend decisions)
# - Project context (different tech stacks)
Result: Zero mental overhead for context switching. AI handles it automatically.
🎓 Scenario 4: Learning Journey Across Technologies
Problem: Developer learning full-stack wants to track progress across frontend, backend, DevOps.
Solution with llmswap v5.1.0:
# Frontend project
cd ~/react-app
llmswap workspace init --name "React Learning"
llmswap chat --mentor tutor
# Learn: Hooks, State, Components
# All auto-tracked in learnings.md

# Backend project
cd ~/python-api
llmswap workspace init --name "Python API"
llmswap chat --mentor tutor
# Learn: FastAPI, SQLAlchemy, Testing
# Separate learning journal

# View all learning across projects
llmswap workspace list
# See progress in each area

# Each workspace shows:
# - Total queries
# - Learnings count
# - Last accessed
Result: Complete visibility into learning journey across all technologies.
🚀 Scenario 5: Open Source Contributor
Problem: Contributing to 3 different OSS projects. Each has different conventions, patterns, testing approaches.
Solution with llmswap v5.1.0:
# Project 1: Django
cd ~/django-oss
llmswap workspace init
# Document in context.md: Coding style, PR process, testing patterns

# Project 2: FastAPI
cd ~/fastapi-oss
llmswap workspace init
# Different conventions, different patterns

# Ask project-specific questions
cd ~/django-oss
llmswap ask "How should I write tests here?"
# AI knows: This project uses pytest with Django TestCase

cd ~/fastapi-oss
llmswap ask "How should I write tests here?"
# AI knows: This project uses pytest with async fixtures

# Each workspace maintains separate:
# - Testing patterns learned
# - Code review feedback
# - Architecture understanding
Result: Contribute confidently to multiple projects without mixing up conventions.
💡 Real-World Use Cases (v5.0 Features)
📚 Learning & Skill Development
# Junior developer learning system design
llmswap chat --mentor professor --alias "Dr. Smith" "Explain microservices architecture"
# Response adapts to teaching style with examples and deep explanations

# Career switcher exploring new domains
llmswap ask "What is Kubernetes?" --age 25 --audience "beginner"
# Age-appropriate explanation without assuming prior DevOps knowledge
🚀 Professional Development
# Get different perspectives on technical decisions
llmswap ask "Should I use MongoDB or PostgreSQL?" --mentor guru
# Wisdom-based response considering trade-offs and bigger picture

llmswap ask "Should I use MongoDB or PostgreSQL?" --mentor developer
# Practical engineering perspective with production experience
💼 Team Collaboration
# Explain technical concepts to non-technical stakeholders
llmswap ask "Explain our API architecture" --audience "business owner"
# Business-friendly explanation without jargon

# Onboard new team members
llmswap chat --mentor friend --alias "TeamBuddy"
# Collaborative, approachable learning environment
🎭 Same Question, Different Perspectives
Watch how the SAME question gets tailored responses based on your role:
# As a Developer: Implementation focus
llmswap ask "Explain Model Context Protocol (MCP)" --audience developer

Response: "MCP is a standardized protocol for connecting AI models to external data sources and tools. Unlike traditional chatbots that bundle everything into one monolithic system, MCP uses a client-server architecture:

- Server: Exposes tools, resources, and prompts via standard interface
- Client: Your AI application that consumes these capabilities
- Transport: JSON-RPC over stdio/HTTP

Key difference from older chatbots:
OLD: Hardcoded integrations, tight coupling, monolithic deployment
NEW: Pluggable architecture, loose coupling, independent tool servers

Code example:
from mcp import Server
server = Server("my-tools")
@server.tool()
def get_weather(city: str): ..."

---

# As an Architect: System design focus
llmswap ask "Explain Model Context Protocol (MCP)" --audience architect

Response: "MCP introduces a service-oriented architecture for AI systems, addressing key architectural concerns:

**System Architecture:**
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│  AI Client  │◄────►│ MCP Protocol │◄────►│ Tool Server │
│ (Claude App)│      │  (JSON-RPC)  │      │ (Resources) │
└─────────────┘      └──────────────┘      └─────────────┘

**Design Patterns:**
- Service Discovery: Dynamic capability negotiation
- Loose Coupling: Clients/servers evolve independently
- Extensibility: Add new tools without modifying core
- Standardization: Uniform interface across tools

**vs Traditional Chatbots:**
- Monolithic → Microservices for AI tools
- Static → Dynamic capability loading
- Vendor lock-in → Provider-agnostic standard

**Scalability:** Deploy tool servers independently, horizontal scaling per service"

---

# As a Business Leader: ROI and strategy focus
llmswap ask "Explain Model Context Protocol (MCP)" --audience "business leader"

Response: "Model Context Protocol (MCP) is like moving from custom-built integrations to standardized APIs - but for AI systems.

**Business Impact:**

1. **Reduced Integration Costs**
   - OLD: $50K-$100K per custom AI integration
   - NEW: Plug-and-play tools, 80% cost reduction
   - Time to market: Weeks instead of months

2. **Vendor Independence**
   - Not locked to one AI provider
   - Switch between Claude, GPT-4, Gemini seamlessly
   - Negotiating power with vendors

3. **Reusable Components**
   - Build once, use across multiple AI applications
   - Internal tools become valuable assets
   - Ecosystem of third-party MCP tools

4. **Competitive Advantage**
   - Deploy AI features 10x faster than competitors
   - Respond quickly to market changes
   - Lower total cost of ownership

**vs Traditional Chatbots:** Traditional bots were closed systems requiring custom development for each feature. MCP creates an open ecosystem where your team (or partners) can extend capabilities without vendor dependencies.

**ROI Example:** A company saved $200K in integration costs and reduced deployment time from 6 months to 3 weeks using MCP architecture."
The Power: Same command, same AI provider, completely different responses tailored to what each role needs to know.
⚠️ Privacy & Usage Disclaimers
llmswap is an interface tool only - all AI interactions and billing are handled directly by your chosen providers.
🔒 Context Privacy & Provider Switching
- No context sharing: When switching providers mid-conversation, NO conversation history is transferred between providers
- Fresh start: Each provider switch starts a completely new conversation thread
- Legal compliance: This protects your privacy and complies with provider Terms of Service
- Your control: You decide what each provider sees in their separate conversations
💰 Cost Information
Cost estimates (~$X.XX estimated) are approximate based on public pricing and common tokenizers. Actual costs may differ. You are responsible for all provider costs and billing.
Legal Notice: llmswap provides estimates and interface functionality for convenience only. We are not responsible for billing differences, provider charges, pricing changes, or data handling by individual providers. Always verify costs with your provider's billing dashboard.
⚡ Get Started in 30 Seconds
🍺 Homebrew (Recommended - macOS/Linux)
# Add our tap and install
brew tap llmswap/tap
brew install llmswap

# Ready to use immediately!
llmswap --help
🐍 PyPI (All platforms)
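Install from PyPI with pip (the same command shown in the Quick Start above):

pip install llmswap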
Why Homebrew? No virtualenv needed, global access, automatic dependency management, and easier updates.
🚀 Quick Start with Workspaces (v5.1.0)
Complete Beginner's Guide - 3 Steps:
Step 1: Install llmswap
pip install llmswap
# or
brew install llmswap

Step 2: Set up API key (one provider is enough)
export ANTHROPIC_API_KEY="your-key-here"  # For Claude
# or
export OPENAI_API_KEY="your-key-here"     # For GPT-4
# or any other provider
Step 3: Initialize workspace in your project
cd ~/my-project
llmswap workspace init

# Start chatting - AI has full project context!
llmswap chat --mentor guru
# Ask anything about your project
# Learnings automatically tracked
# Decisions automatically documented

# View your learning journey
llmswap workspace journal
That's it! Your AI mentor now remembers everything about your project. 🎉
Without Workspace (Classic Mode)
# Works exactly like v5.0 - no workspace needed
llmswap ask "How do I deploy a Flask app?"
llmswap chat --mentor tutor
llmswap generate "Python function to read CSV"
📋 Quick Reference - v5.1.0 Commands
🆕 Workspace Commands (NEW!)
| Command | Description | Example |
|---|---|---|
| `llmswap workspace init` | Initialize workspace in current directory | Creates `.llmswap/` with context, learnings, decisions |
| `llmswap workspace init --name` | Initialize with custom project name | `llmswap workspace init --name "My API"` |
| `llmswap workspace info` | Show current workspace statistics | Displays queries, learnings, decisions count |
| `llmswap workspace list` | List all registered workspaces | Shows all projects with llmswap workspaces |
| `llmswap workspace journal` | View learning journal | See everything you've learned |
| `llmswap workspace decisions` | View architecture decision log | See all technical decisions |
| `llmswap workspace context` | Edit project context | Opens context.md in default editor |
Provider & Config Commands (v5.0)
| Command | Description | Example |
|---|---|---|
| `llmswap providers` | View all providers and their status | Shows configured/missing API keys |
| `llmswap config set provider.models.<provider> <model>` | Update default model for any provider | `llmswap config set provider.models.cohere command-r-plus-08-2024` |
| `llmswap config list` | View current configuration | Shows all settings and models |
| `/switch` (in chat) | Switch providers mid-conversation | Privacy-compliant provider switching |
| `/provider` (in chat) | Show current provider and available options | Quick status check |
🔧 First-Time Setup (v5.0.4 NEW!)
# First run automatically creates ~/.llmswap/config.yaml with defaults
llmswap ask "Hello world"
# Output: 🔧 Creating config file at ~/.llmswap/config.yaml
#         ✅ Default configuration created

# View all providers and their configuration status
llmswap providers

# Set up your API keys and start using
export ANTHROPIC_API_KEY="your-key-here"
llmswap ask "Explain Docker in simple terms"
💡 Smart Defaults: llmswap comes pre-configured with sensible model defaults for all 8 providers. No configuration needed to get started!
from llmswap import LLMClient

# Works with any provider you have configured
client = LLMClient()  # Auto-detects from environment/config
response = client.query("Explain quantum computing in 50 words")
print(response.content)
🎯 Why LLMSwap v5.1.0 for AI Development?
| Feature | LLMSwap v5.1.0 | Claude Code | Cursor AI | Aider | LangChain | Direct APIs |
|---|---|---|---|---|---|---|
| Project Memory | ✅ Workspace system | ❌ No memory | ❌ No memory | ❌ No memory | ❌ Manual | ❌ None |
| Learning Journal | ✅ Auto-tracked | ❌ Not available | ❌ Not available | ❌ Not available | ❌ Manual | ❌ None |
| Context Awareness | ✅ Project-specific | ❌ Generic | ❌ Generic | ❌ Generic | ❌ Manual | ❌ None |
| AI Providers | ✅ 8+ providers, instant switch | ❌ Claude only | ❌ Few providers | ❌ OpenAI only | ❌ 1 per codebase | |
| Conversational Mode | ✅ Provider-native, all | ✅ Claude only | ✅ Limited | ❌ Not available | ❌ Not available | |
| Memory Usage | ✅ 99% reduction | ❌ Heavy framework | ❌ Manual | |||
| Configuration | ✅ Git-like, shareable | ❌ Complex files | ❌ None | |||
| Cost Analytics | ✅ Real-time tracking | ❌ No cost info | ❌ No cost info | ❌ No cost info | ❌ External tools | ❌ Manual |
| Provider Switching | ✅ Mid-conversation | ❌ Locked to Claude | ❌ Locked to OpenAI | ❌ Restart required | ❌ New session | |
| Workspace System | ✅ Per-project context | ❌ Not available | ❌ Not available | ❌ Not available | ❌ Not available | ❌ None |
| CLI Commands | ✅ 15+ powerful tools | ❌ IDE only | ❌ Separate packages | ❌ None | ||
| SDK + CLI | ✅ Both included | ❌ CLI only | ❌ IDE only | ❌ CLI only | ✅ SDK only | |
| Teaching Personas | ✅ 6 mentors | ❌ Not available | ❌ Not available | ❌ Not available | ❌ Not available | ❌ None |
| Open Source | ✅ 100% MIT licensed | ❌ Proprietary | ❌ Proprietary | ✅ Open source | ✅ Open source |
Key Differentiators for LLMSwap v5.1.0:
- 🧠 Only AI tool with persistent project memory - Never repeat context again
- 📚 Automatic learning journals - Track your progress without manual work
- 🎯 Workspace-aware mentorship - AI understands your specific project
- 🔄 Zero context switching overhead - Change projects, AI adapts automatically
- 💡 Learning extraction - AI summarizes key takeaways from conversations
🚀 Three Ways to Use LLMSwap:
📚 1. Python Library/SDK
from llmswap import LLMClient

client = LLMClient()  # Import into any codebase
response = client.query("Analyze this data")
⚡ 2. CLI Tools
llmswap generate "sort files by size" # GitHub Copilot alternative llmswap generate "Python function to read JSON" # Multi-language code generation llmswap ask "Debug this error" # Terminal AI assistant llmswap costs # Cost optimization insights
📊 3. Enterprise Analytics
stats = client.get_usage_stats()               # Track AI spend
comparison = client.get_provider_comparison()  # Compare costs
🎯 What's New in v5.1.0
🆕 Revolutionary Workspace & Memory Features
- 🧠 Workspace System: Per-project memory with `.llmswap/` directories (inspired by `.git/`)
- 📚 Auto-Learning Journal: AI automatically tracks what you learn in `learnings.md`
- 📖 Architecture Decision Log: Document technical decisions in `decisions.md`
- 🎯 Context-Aware Mentorship: AI mentor loads project context, past learnings, and decisions
- 🔍 Workspace Detection: Automatically finds `.llmswap/` in current or parent directories
- 🗂️ Project Knowledge Base: Editable `context.md` for project-specific information
- 📊 Workspace Statistics: Track queries, learnings, and decisions per project
- 🌐 Global Workspace Registry: Manage all workspaces from `~/.llmswap/workspaces/registry.json`
- 💡 Learning Extraction: Uses fast AI (Groq) to extract key learnings from conversations
- 🔄 Workspace Switching: Change directories, AI automatically loads different context
Teaching & Conversational Features (v5.0)
- 🎓 Age-Appropriate AI: First CLI with age-targeted explanations (`--age 10`, `--audience "teacher"`)
- 🧑🏫 Teaching Personas: 6 AI mentors (teacher, developer, tutor, professor, mentor, buddy)
- 👤 Personalized Aliases: Custom AI names (`--alias "Sarah"` for your personal tutor)
- 💬 Multi-Provider Chat: Provider-native conversational mode with mid-chat switching
- 🧠 Zero Local Storage: 99% memory reduction, all context at provider level
- ⚙️ Git-like Config: Team-shareable configuration management
- 📊 Session Analytics: Real-time cost and token tracking
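Cost and token tracking is also available from the SDK. A sketch using the analytics calls shown in the Enterprise Analytics section above (the exact fields in the returned data are not documented here):

from llmswap import LLMClient

client = LLMClient()
client.query("Explain the CAP theorem in two sentences")

# Aggregate spend and token usage for this client
stats = client.get_usage_stats()
print(stats)

# Compare what the same workload would cost on other providers
comparison = client.get_provider_comparison()
print(comparison)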
Provider & Model Flexibility
- 🔓 Pass-Through Architecture: Use ANY model from your provider - GPT-5, Claude Opus 4, Gemini 2.5 Pro work immediately
- ⚡ Zero-Wait Updates: New model released? Use it the same day, no llmswap update needed
- 🌐 10 Providers Currently: OpenAI, Anthropic, Gemini, Cohere, Perplexity, IBM watsonx, Groq, Ollama, xAI (Grok), Sarvam AI
- 🆕 v5.1.4: Added xAI Grok and Sarvam AI support for cutting-edge and Indian language AI
🚀 Complete Feature Set
1️⃣ Python SDK - Multi-Provider Intelligence
from llmswap import LLMClient

# Auto-detects available providers
client = LLMClient()

# Or specify your preference
client = LLMClient(provider="anthropic")   # Claude 3 Opus/Sonnet/Haiku
client = LLMClient(provider="openai")      # GPT-4, GPT-3.5
client = LLMClient(provider="gemini")      # Google Gemini Pro/Flash
client = LLMClient(provider="watsonx")     # IBM watsonx.ai Granite
client = LLMClient(provider="ollama")      # Llama, Mistral, Phi, 100+ local
client = LLMClient(provider="groq")        # Groq ultra-fast inference
client = LLMClient(provider="cohere")      # Cohere Command models for RAG
client = LLMClient(provider="perplexity")  # Perplexity web-connected AI
client = LLMClient(provider="xai")         # xAI Grok models 🆕
client = LLMClient(provider="sarvam")      # Sarvam AI (Indian languages) 🆕

# Automatic failover
client = LLMClient(fallback=True)
response = client.query("Hello")  # Tries multiple providers

# Save 50-90% with intelligent caching
client = LLMClient(cache_enabled=True)
response1 = client.query("Expensive question")  # $$$ API call
response2 = client.query("Expensive question")  # FREE from cache

# 🆕 v5.1.0: Workspace-Aware SDK (Auto-detects .llmswap/)
from llmswap import LLMClient

# SDK automatically detects workspace in current directory
client = LLMClient()  # Loads workspace context if .llmswap/ exists

# Query with full project context
response = client.query("How should I structure my API?")
# AI has access to: project context, past learnings, architecture decisions

# Check if workspace is loaded
if client.workspace_manager:
    workspace_data = client.workspace_manager.load_workspace()
    print(f"Working in: {workspace_data['project_name']}")
    print(f"Learnings tracked: {workspace_data['statistics']['learnings_count']}")

# Learnings are automatically saved after each query
# No manual tracking needed!

# 🆕 v5.1.0: Eklavya Mentor Integration with Workspace
from llmswap import LLMClient
from llmswap.eklavya.mentor import EklavyaMentor

# Initialize client and mentor
client = LLMClient(provider="anthropic")
mentor = EklavyaMentor(persona="guru", alias="Guruji")

# Generate teaching system prompt with workspace context
teaching_prompt = mentor.generate_system_prompt()

# Use mentor for teaching-focused responses
response = client.query(
    "Explain Python decorators",
    system_prompt=teaching_prompt
)
print(response.content)  # Guru-style teaching response

# Different personas for different learning styles
coach = EklavyaMentor(persona="coach", alias="Coach Sarah")  # Motivational
friend = EklavyaMentor(persona="friend", alias="CodeBuddy")  # Collaborative
socrates = EklavyaMentor(persona="socrates")                 # Question-based learning

# 🆕 v5.0: Conversational Sessions (Provider-Native)
client.start_chat_session()
response = client.chat("Tell me about Python")         # Context maintained
response = client.chat("What are its best features?")  # Remembers previous
client.end_chat_session()  # Clean provider-level cleanup

# 🆕 v5.0: Async Support for High Performance
import asyncio
from llmswap import AsyncLLMClient

async def main():
    async_client = AsyncLLMClient()
    response = await async_client.query_async("Process this data")

asyncio.run(main())
2️⃣ CLI Suite - 15+ Powerful Terminal Commands
🆕 v5.1.0: Workspace Commands
# Initialize workspace in current project
llmswap workspace init
llmswap workspace init --name "My Flask App"

# View workspace information
llmswap workspace info       # Current workspace stats
llmswap workspace list       # All workspaces
llmswap workspace journal    # View learning journal
llmswap workspace decisions  # View decision log
llmswap workspace context    # Edit project context

# Workspace automatically detected when you run:
llmswap chat                       # Loads workspace context
llmswap ask "How do I test this?"  # Uses project-specific context
CLI Commands (All Features)
# 🆕 v5.0: Conversational Chat with Provider-Native Context
llmswap chat  # Interactive AI assistant with memory

# 🆕 v5.0: Configuration Management (Git-like)
llmswap config set provider.default anthropic
llmswap config export --file team-config.yaml

# Generate code from natural language (GitHub Copilot alternative)
llmswap generate "sort files by size in reverse order"
llmswap generate "Python function to read JSON file" --language python
llmswap generate "find large files over 100MB" --execute

# Ask one-line questions
llmswap ask "How to optimize PostgreSQL queries?"

# Interactive AI chat
llmswap chat

# AI code review
llmswap review app.py --focus security

# Debug errors instantly
llmswap debug --error "ConnectionTimeout at line 42"

# Analyze logs with AI
llmswap logs --analyze app.log --since "2h ago"

# 🆕 Web UI - Compare models side-by-side
llmswap web  # Opens browser at http://localhost:5005
🌐 Web UI for Model Comparison (v5.1.6 NEW!)
Compare 20+ AI models side-by-side in a beautiful web interface with live streaming results:
Compare responses from GPT-4, Claude, Gemini, Grok, and more - see speed rankings, costs, and quality side-by-side
```bash
# Install web UI dependencies (one-time, optional)
pip install llmswap[web]

# Start local web server (opens at http://localhost:5005)
llmswap web

# Note: If you haven't installed the [web] dependencies, you'll get a helpful message
# with installation instructions. The core SDK and CLI work without it.

# Custom port (if 5005 is already in use)
llmswap web --port 8080

# Allow network access (access from other devices on your network)
llmswap web --host 0.0.0.0 --port 8080

# Don't auto-open browser
llmswap web --no-browser

# Combine options
llmswap web --port 3000 --host 0.0.0.0 --no-browser
```
Features:
- ⚡ Live streaming results - Cards fill in real-time as responses arrive with speed badges (⚡ Fastest! 🥈 🥉)
- 📊 Rich metrics - Response time, tokens, tokens/sec efficiency, response length indicators (📝 Brief, 📄 Detailed, 📚 Comprehensive)
- 💰 Visual cost comparison - Bar chart showing relative costs, FREE badges for $0 models (Ollama, Groq)
- 🎨 Markdown rendering - Full markdown support with syntax-highlighted code blocks (Highlight.js)
- 📋 Code block copy - Individual copy buttons on each code block with hover reveal
- 💾 Smart preferences - Remembers your favorite models via localStorage with "Welcome back!" messages
- 🎯 Real token counts - Uses actual API token data (not estimates) for accurate cost calculation
- 🔒 Local-first - Runs on your machine, uses your API keys, zero data leaves your system
Perfect for:
- Quality comparison - See which model gives better coding responses (Claude vs GPT-4o vs Grok)
- Speed testing - Find fastest model for latency-sensitive apps (Groq vs Gemini Flash)
- Cost optimization - Compare $0.0001 vs $0.0150 and decide if quality difference justifies 150x price
- Prompt engineering - Test same prompt across providers to find best match
- Model evaluation - Compare 20+ models simultaneously without switching tabs
What Makes It Special:
- Compare up to 20 models at once (9 visible by default, 12 expandable)
- Success/failure tracking - see exactly which models succeeded vs failed
- First-time user tips - gentle onboarding for new users
- Better error messages - helpful setup instructions when API keys missing
Screenshot: Compare GPT-4o, Claude Sonnet 4.5, Gemini 2.0 Flash, Grok-4, and 17 more models simultaneously. See speed badges update in real-time, cost bars visualize price differences, and code blocks with syntax highlighting.
3️⃣ Provider Management & Model Configuration (v5.0.4 NEW!)
🎯 View All Providers and Models:
```bash
# Beautiful table showing all providers, their status, and default models
llmswap providers
```

Output example:
```
🤖 llmswap Provider Status Report
============================================================
| Provider   | Default Model              | Status            | Issue                    |
|------------|----------------------------|-------------------|--------------------------|
| ANTHROPIC  | claude-3-5-sonnet-20241022 | ✅ CONFIGURED     |                          |
| OPENAI     | gpt-4o                     | ❌ NOT CONFIGURED | OPENAI_API_KEY missing   |
| GEMINI     | gemini-1.5-pro             | ✅ CONFIGURED     |                          |
| COHERE     | command-r-plus-08-2024     | ❌ NOT CONFIGURED | COHERE_API_KEY missing   |
| PERPLEXITY | sonar-pro                  | ✅ CONFIGURED     |                          |
| WATSONX    | granite-13b-chat           | ✅ CONFIGURED     |                          |
| GROQ       | llama-3.3-70b-versatile    | ✅ CONFIGURED     |                          |
| OLLAMA     | llama3.1                   | ⚠️ NOT RUNNING    | Local server not running |

📊 Summary: 5/8 providers available
```
🔧 Model Configuration:
```bash
# Update any provider's default model
llmswap config set provider.models.openai gpt-4o-mini
llmswap config set provider.models.cohere command-r-plus-08-2024
llmswap config set provider.models.anthropic claude-3-5-haiku-20241022

# Set default provider
llmswap config set provider.default anthropic

# View current configuration
llmswap config list

# Export/import team configurations
llmswap config export team-config.yaml
llmswap config import team-config.yaml --merge
```
🚀 Handle Model Deprecations:
When providers deprecate models (like Cohere's command-r-plus → command-r-plus-08-2024):
```bash
# Simply update your config - no code changes needed!
llmswap config set provider.models.cohere command-r-plus-08-2024
llmswap providers  # Verify the change
```
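If you prefer to pin the replacement model in code rather than in config, here is a minimal sketch, assuming `LLMClient` accepts an explicit `model` argument alongside `provider`:

```python
from llmswap import LLMClient

# Sketch: pin the non-deprecated model explicitly for this client
# (assumes LLMClient takes a `model` argument alongside `provider`)
client = LLMClient(provider="cohere", model="command-r-plus-08-2024")
response = client.query("Sanity check: which model answered this?")
print(response.content)
```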
⚙️ Configuration File Location:
- User config: `~/.llmswap/config.yaml` (created automatically on first run)
- Custom location: Set the `LLMSWAP_CONFIG_HOME` environment variable (see the sketch below)
- Team sharing: Export/import YAML configs for team standardization
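As a rough illustration of the custom-location option, a hypothetical Python sketch; the path is made up, and it assumes `LLMSWAP_CONFIG_HOME` is read when the client loads its configuration:

```python
import os

# Hypothetical path - point llmswap at a project-local config directory.
# Assumes LLMSWAP_CONFIG_HOME is read when llmswap loads its configuration,
# so set it before creating the client.
os.environ["LLMSWAP_CONFIG_HOME"] = os.path.expanduser("~/projects/my-app/.llmswap-config")

from llmswap import LLMClient

client = LLMClient()  # Picks up provider and model defaults from the custom location
```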
💬 Interactive Chat Commands:
```
llmswap chat    # Start interactive conversation

# Available commands in chat:
/help       # Show all commands
/provider   # Show current provider and available providers
/switch     # Switch to different provider (privacy-compliant)
/clear      # Clear conversation history
/stats      # Show session statistics
/quit       # Exit chat

# Example session:
[0] > Hello, I'm working on a Python project
[anthropic] Hi! I'd be happy to help with your Python project...

[1] > /switch
📋 Available providers: anthropic, gemini, perplexity, watsonx, groq
Enter provider name: gemini

🔒 PRIVACY NOTICE: Switching to gemini
✅ NO conversation history will be shared with the new provider
✅ This protects your privacy and complies with provider Terms of Service
Continue? (y/n): y

✅ Switched to gemini
💬 Starting fresh conversation with gemini
```
4️⃣ Analytics & Cost Optimization (v4.0 NEW!)
```bash
# Compare provider costs before choosing
llmswap compare --input-tokens 1000 --output-tokens 500
# Output: Gemini $0.0005 | OpenAI $0.014 | Claude $0.011

# Track your actual usage and spending
llmswap usage --days 30 --format table
# Shows: queries, tokens, costs by provider, response times

# Get AI spend optimization recommendations
llmswap costs
# Suggests: Switch to Gemini, enable caching, use Ollama for dev
```
```python
# Python SDK - full analytics suite
client = LLMClient(analytics_enabled=True)

# Automatic conversation memory
response = client.chat("What is Python?")
response = client.chat("How is it different from Java?")  # Remembers context

# Real-time cost tracking
stats = client.get_usage_stats()
print(f"Total queries: {stats['totals']['queries']}")
print(f"Total cost: ${stats['totals']['cost']:.4f}")
print(f"Avg response time: {stats['avg_response_time_ms']}ms")

# Cost optimization insights
analysis = client.get_cost_breakdown()
print(f"Potential savings: ${analysis['optimization_opportunities']['potential_provider_savings']:.2f}")
print(f"Recommended provider: {analysis['recommendations'][0]}")

# Compare providers for your specific use case
comparison = client.get_provider_comparison(input_tokens=1500, output_tokens=500)
print(f"Cheapest: {comparison['cheapest']} (${comparison['cheapest_cost']:.6f})")
print(f"Savings vs current: {comparison['max_savings_percentage']:.1f}%")
```
5️⃣ Advanced Features
Async/Streaming Support
```python
import asyncio
from llmswap import AsyncLLMClient

async def main():
    client = AsyncLLMClient()

    # Async queries
    response = await client.query("Explain AI")

    # Streaming responses
    async for chunk in client.stream("Write a story"):
        print(chunk, end="")

asyncio.run(main())
```
Multi-User Security
```python
# Context-aware caching for multi-tenant apps
response = client.query(
    "Get user data",
    cache_context={"user_id": "user123"}  # Isolated cache
)
```
Provider Comparison
```python
# Compare responses from different models
comparison = client.compare_providers(
    "Solve this problem",
    providers=["anthropic", "openai", "gemini"]
)
```
📊 Real-World Use Cases & Examples
🏢 Enterprise: Content Generation at Scale
Netflix-style recommendation descriptions for millions of items:
```python
from llmswap import LLMClient

# Start with OpenAI, switch to Gemini for 96% cost savings
client = LLMClient(provider="gemini", cache_enabled=True)

def generate_descriptions(items):
    for item in items:
        # Cached responses save 90% on similar content
        description = client.query(
            f"Create engaging description for {item['title']}",
            cache_context={"category": item['category']}
        )
        yield description.content

# Cost: $0.0005 per description vs $0.015 with OpenAI
```
👨‍💻 Developers: AI-Powered Code Review
GitHub Copilot alternative for your team:
```
# CLI for instant code review
$ llmswap review api_handler.py --focus security

# Python SDK for CI/CD integration
from llmswap import LLMClient

client = LLMClient(analytics_enabled=True)
review = client.query(f"Review this PR for bugs: {pr_diff}")

# Track costs across your team
stats = client.get_usage_stats()
print(f"This month's AI costs: ${stats['totals']['cost']:.2f}")
```
🎓 Education: AI Tutoring Platform
Khan Academy-style personalized learning:
```python
from llmswap import LLMClient

client = LLMClient(provider="ollama")  # Free for schools!

def ai_tutor(student_question, subject, grade_level):
    # Use watsonx for STEM, Ollama for general subjects
    if subject in ["math", "science"]:
        client.set_provider("watsonx")

    response = client.query(
        f"Explain {student_question} for a {subject} student",
        cache_context={"grade_level": grade_level}
    )
    return response.content

# Zero cost with Ollama, enterprise-grade with watsonx
```
🚀 Startups: Multi-Modal Customer Support
Shopify-scale merchant assistance:
```python
from llmswap import LLMClient

# Start with Anthropic, fallback to others if rate-limited
client = LLMClient(fallback=True, cache_enabled=True)

async def handle_support_ticket(ticket):
    # 90% of questions are similar - cache saves thousands
    response = await client.aquery(
        f"Help with: {ticket.issue}",
        cache_context={"type": ticket.category}
    )

    # Auto-escalate complex issues
    if response.confidence < 0.8:
        client.set_provider("anthropic")  # Use best model
        response = await client.aquery(ticket.issue)

    return response.content
```
📱 Content Creators: Writing Assistant
Medium/Substack article generation:
```bash
# Quick blog post ideas
llmswap ask "10 trending topics in AI for developers"

# Full article draft
llmswap chat
> Write a 1000-word article on prompt engineering
> Make it more technical
> Add code examples
```
🔧 DevOps Engineers: Infrastructure as Code
Kubernetes and Docker automation:
```bash
# Generate Kubernetes deployment
llmswap generate "Kubernetes deployment for React app with 3 replicas" --save k8s-deploy.yaml

# Docker multi-stage build
llmswap generate "Docker multi-stage build for Node.js app with Alpine" --language dockerfile

# Terraform AWS infrastructure
llmswap generate "Terraform script for AWS VPC with public/private subnets" --save main.tf
```
🎯 Data Scientists: Analysis Workflows
Pandas, visualization, and ML pipeline generation:
```bash
# Data analysis scripts
llmswap generate "Pandas script to clean CSV and handle missing values" --language python

# Visualization code
llmswap generate "Matplotlib script for correlation heatmap" --save plot.py

# ML pipeline
llmswap generate "scikit-learn pipeline for text classification with TF-IDF" --language python
```
💬 App Developers: Full Applications
Complete app generation with modern frameworks:
```bash
# Streamlit chatbot
llmswap generate "Streamlit chatbot app with session state and file upload" --save chatbot.py

# FastAPI REST API
llmswap generate "FastAPI app with CRUD operations for user management" --save api.py

# React component
llmswap generate "React component for data table with sorting and filtering" --language javascript --save DataTable.jsx
```
🤖 AI/ML Engineers: Model Deployment
Production-ready ML workflows and deployments:
```bash
# LangChain RAG pipeline
llmswap generate "LangChain RAG system with ChromaDB and OpenAI embeddings" --language python --save rag_pipeline.py

# Hugging Face model fine-tuning
llmswap generate "Script to fine-tune BERT for sentiment analysis with Hugging Face" --save finetune.py

# Gradio ML demo app
llmswap generate "Gradio app for image classification with drag and drop" --save demo.py

# Vector database setup
llmswap generate "Pinecone vector database setup for semantic search" --language python
```
🔒 Security Engineers: Vulnerability Scanning
Security automation and compliance scripts:
```bash
# Security audit script
llmswap generate "Python script to scan for exposed API keys in codebase" --save security_scan.py

# OAuth2 implementation
llmswap generate "FastAPI OAuth2 with JWT tokens implementation" --language python

# Rate limiting middleware
llmswap generate "Redis-based rate limiting for Express.js" --language javascript
```
🛠️ AI Agent Development: Tool Creation
Build tools and functions for AI agents (inspired by Anthropic's writing tools):
```bash
# Create tool functions for agents
llmswap generate "Python function for web scraping with BeautifulSoup error handling" --save tools/scraper.py

# Database interaction tools
llmswap generate "SQLAlchemy functions for CRUD operations with type hints" --save tools/database.py

# File manipulation utilities
llmswap generate "Python class for safe file operations with context managers" --save tools/file_ops.py

# API integration tools
llmswap generate "Async Python functions for parallel API calls with rate limiting" --save tools/api_client.py

# Agent orchestration
llmswap generate "LangChain agent with custom tools for research tasks" --language python
```
🏆 Hackathon Power Kit: Win Your Next Hackathon
Build complete MVPs in minutes, not hours:
```bash
# RAG chatbot for document Q&A (most requested hackathon project)
llmswap generate "Complete RAG chatbot with OpenAI embeddings, Pinecone vector store, and Streamlit UI for PDF document Q&A" --save rag_chatbot.py

# Full-stack SaaS starter (0 to production in 5 minutes)
llmswap generate "Next.js 14 app with Clerk auth, Stripe payments, Prisma ORM, and PostgreSQL schema for SaaS platform" --save saas_mvp.js
```
🛠️ Installation & Setup
```bash
# Install package
pip install llmswap

# Set any API key (one is enough to get started)
export ANTHROPIC_API_KEY="sk-..."       # For Claude
export OPENAI_API_KEY="sk-..."          # For GPT-4
export GEMINI_API_KEY="..."             # For Google Gemini
export WATSONX_API_KEY="..."            # For IBM watsonx
export WATSONX_PROJECT_ID="..."         # watsonx project
export GROQ_API_KEY="gsk_..."           # For Groq ultra-fast inference
export COHERE_API_KEY="co_..."          # For Cohere Command models
export PERPLEXITY_API_KEY="pplx-..."    # For Perplexity web search

# Or run Ollama locally for 100% free usage
```
📈 Token Usage Guidelines
| Task Type | Input Tokens | Output Tokens | Estimated Cost |
|---|---|---|---|
| Simple Q&A | 100 | 50 | ~$0.001 |
| Code Review | 1000 | 300 | ~$0.010 |
| Document Analysis | 3000 | 800 | ~$0.025 |
| Creative Writing | 500 | 2000 | ~$0.020 |
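To turn these ballpark figures into a per-provider estimate for your own workload, you can plug the same token counts into the provider comparison helper from the analytics section above (a sketch; the dictionary keys follow the earlier `get_provider_comparison` example):

```python
from llmswap import LLMClient

client = LLMClient(analytics_enabled=True)

# Estimate a "Code Review" style workload (~1000 input / ~300 output tokens)
comparison = client.get_provider_comparison(input_tokens=1000, output_tokens=300)
print(f"Cheapest: {comparison['cheapest']} (${comparison['cheapest_cost']:.6f})")
print(f"Savings vs current provider: {comparison['max_savings_percentage']:.1f}%")
```

The CLI equivalent is `llmswap compare --input-tokens 1000 --output-tokens 300`.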
🔗 Quick Links
- GitHub: github.com/sreenathmmenon/llmswap
- Documentation: Full API Reference
- PyPI: pypi.org/project/llmswap
- Issues: Report bugs or request features
🚀 Get Started in 30 Seconds
```bash
# Install
pip install llmswap

# Set ANY API key (one is enough)
export ANTHROPIC_API_KEY="your-key"   # or OpenAI, or Gemini...

# Start building
llmswap chat "Hello!"
```
Or use the SDK:
```python
from llmswap import LLMClient

# Auto-detects available providers
client = LLMClient()

# Chat with AI
response = client.chat("Write a Python function to sort a list")
print(response.content)

# Add tool calling (v5.2.0)
from llmswap import Tool

weather = Tool(
    name="get_weather",
    description="Get weather data",
    parameters={"city": {"type": "string"}},
    required=["city"]
)
response = client.chat("What's the weather in Tokyo?", tools=[weather])
```
That's it! Automatic provider detection, failover, caching, and now tool calling - all in 10 lines.
🌐 Links & Resources
- Website: llmswap.org
- Documentation: llmswap.org/docs
- CLI Reference: llmswap.org/docs/cli.html
- SDK Guide: llmswap.org/docs/sdk.html
- GitHub Repository: github.com/sreenathmmenon/llmswap
- PyPI Package: pypi.org/project/llmswap
- Homebrew Tap: github.com/llmswap/homebrew-tap
- Issues & Support: github.com/sreenathmmenon/llmswap/issues
- Twitter/X: @llmswap
📊 Why Thousands Choose llmswap
- ⚡ 12,000+ Downloads - Trusted by developers worldwide
- 🚀 v5.2.0 - Now with universal tool calling
- 🔓 MIT License - 100% open source, free forever
- 🐍 Python 3.8+ - Works on all platforms
- 🌍 11 Providers - Anthropic, OpenAI, Gemini, Groq, xAI, Cohere, Perplexity, Sarvam, IBM watsonx, Ollama
- 📦 pip & Homebrew - Install in seconds
- 🎯 Production Ready - Used in real products
Compare with Alternatives
| Feature | llmswap | LiteLLM | LangChain |
|---|---|---|---|
| Setup Time | 30 seconds | 5 minutes | 1 hour |
| Lines of Code | 10 | 20 | 100+ |
| Tool Calling | ✅ Universal | ✅ Universal | ✅ Complex |
| Caching | ✅ Built-in | ✅ Optional | ❌ External |
| Workspace Memory | ✅ Unique | ❌ | ❌ |
| CLI + SDK | ✅ Both | SDK only | SDK only |
| Learning Curve | Easy | Medium | Hard |
| Best For | Shipping fast | Gateway | Agents |
Choose llmswap when you want to ship fast without the complexity.
Built with ❤️ for developers who ship. Star us on GitHub if llmswap helps you build faster!
Latest: v5.2.0 - Universal tool calling across all providers 🚀