oso95/Agentainer-lab: A lightweight, self-hosted infrastructure layer for deploying and managing LLM agents as resilient microservices. Features automatic request persistence, crash recovery, and state management: everything you need to run AI agents reliably in production without the complexity.


The Missing Infrastructure Layer for LLM Agents


Deploy, manage, and scale LLM agents as containerized microservices with built-in resilience

🚀 Quick Start • 📖 Documentation • 💡 Examples • 🔧 CLI Reference • 🔌 API


🎯 What is Agentainer?

Agentainer is a container runtime specifically designed for LLM agents. Just as Docker revolutionized application deployment, Agentainer makes it dead simple to deploy, manage, and scale AI agents with production-grade reliability.


๐Ÿ” How It Compares

| Feature | Agentainer | Raw Docker | Kubernetes | Serverless |
|---|---|---|---|---|
| Deployment Speed | ✅ < 30 seconds | ⚠️ Manual setup | ❌ Complex YAML | ✅ Fast |
| State Management | ✅ Built-in Redis | ❌ DIY | ⚠️ External | ❌ Stateless |
| Request Persistence | ✅ Automatic | ❌ Not included | ❌ Not included | ❌ Lost on timeout |
| Crash Recovery | ✅ With replay | ⚠️ Restart only | ⚠️ Restart only | ✅ Auto-retry |
| Local Development | ✅ Optimized | ✅ Native | ❌ Heavy | ❌ Cloud only |
| LLM-Specific | ✅ Purpose-built | ❌ Generic | ❌ Generic | ❌ Generic |

๐Ÿ—๏ธ Architecture

Agentainer provides a complete infrastructure layer between your agent code and the container runtime.

🎯 Why Choose Agentainer?

  • 🚀 Deploy in Seconds: from code to running agent with one command
  • 💪 Never Lose Data: built-in Redis + request queuing + auto-recovery
  • 🔒 Secure by Default: network isolation, no direct port exposure
  • 🎯 Purpose-Built: designed specifically for LLM agent workloads

โš ๏ธ Important Notice

PROOF-OF-CONCEPT SOFTWARE - LOCAL TESTING ONLY

This is experimental software designed for local development and concept validation.
🚨 DO NOT USE IN PRODUCTION OR EXPOSE TO EXTERNAL NETWORKS 🚨

  • Demo authentication (default tokens)
  • Minimal security controls
  • Not suitable for multi-user environments
  • Requires Docker socket access

🚀 Quick Start

Prerequisites

  • Docker (required)
  • Go 1.23+ (for building from source)
  • Git (for cloning)

Note for macOS users: When deploying from Dockerfiles, build the image first using docker build, then deploy the built image. This avoids Docker socket compatibility issues.

Installation (< 2 minutes)

# Clone and install
git clone https://github.com/oso95/Agentainer-lab.git
cd agentainer-lab
make setup    # Installs everything including prerequisites

# Start Agentainer
make run

Deploy an LLM Agent (< 1 minute)

# 1. Use the GPT example
cd examples/gpt-agent
cp .env.example .env
# Add your OpenAI API key to .env

# 2. Deploy from Dockerfile
# For macOS users: Build the image first, then deploy
docker build -t gpt-bot-image .
agentainer deploy --name gpt-bot --image gpt-bot-image

# For Linux users: direct Dockerfile deployment also works (or build the image first, as above):
# agentainer deploy --name gpt-bot --image ./Dockerfile

# 3. Start and test
agentainer start <agent-id>

# 4. Chat with your agent
curl -X POST http://localhost:8081/agent/<agent-id>/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello! What is Agentainer?"}'

💡 Examples

Example 1: Stateful Chat Agent with Memory

View Code
# app.py - A GPT agent that remembers conversations
import os

import redis
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)

# Connect to Agentainer's built-in Redis
redis_client = redis.Redis(
    host='host.docker.internal',
    port=6379,
    decode_responses=True  # return str instead of raw bytes
)

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def openai_chat_with_history(user_msg, history):
    """Call OpenAI with the recent conversation turns as context."""
    context = "\n".join(reversed(history))  # lpush stores newest first
    resp = openai_client.chat.completions.create(
        model=os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
        messages=[
            {"role": "system", "content": f"Recent conversation:\n{context}"},
            {"role": "user", "content": user_msg},
        ],
    )
    return resp.choices[0].message.content

@app.route('/chat', methods=['POST'])
def chat():
    user_msg = request.json['message']

    # Get the last few conversation turns from Redis
    history = redis_client.lrange('conversations', 0, 5)

    # Call OpenAI with context
    response = openai_chat_with_history(user_msg, history)

    # Save to Redis for next time
    redis_client.lpush('conversations', f"User: {user_msg}")
    redis_client.lpush('conversations', f"AI: {response}")

    return jsonify({'response': response})
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
RUN pip install flask redis openai gunicorn
COPY app.py .
COPY .env .
EXPOSE 8000
CMD ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]
# Deploy and use
agentainer deploy --name memory-bot --image ./Dockerfile
agentainer start <agent-id>

# First conversation
curl -X POST http://localhost:8081/agent/memory-bot/chat \
  -d '{"message": "My name is Alice"}'
# Response: "Nice to meet you, Alice!"

# Later conversation - it remembers!
curl -X POST http://localhost:8081/agent/memory-bot/chat \
  -d '{"message": "What is my name?"}'
# Response: "Your name is Alice."

Example 2: Multi-Agent Pipeline

View YAML Deployment
# agents.yaml - Deploy a complete LLM pipeline
apiVersion: v1
kind: AgentDeployment
metadata:
  name: llm-pipeline
spec:
  agents:
    # Agent 1: Data Collector
    - name: collector
      image: ./collector/Dockerfile
      env:
        COLLECT_INTERVAL: "60"
      volumes:
        - host: ./data
          container: /app/data
      
    # Agent 2: Processor with resource limits
    - name: processor  
      image: ./processor/Dockerfile
      resources:
        memory: 4G
        cpu: 2
      env:
        MODEL: "llama2"
        
    # Agent 3: API Gateway
    - name: gateway
      image: ./gateway/Dockerfile
      healthCheck:
        endpoint: /health
        interval: 30s
      autoRestart: true
# Deploy entire pipeline
agentainer deploy --config agents.yaml

# All agents start with crash recovery
# and request persistence enabled
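Agents in a pipeline like this can hand work to each other through their Agentainer proxy endpoints (`/agent/<name>/...`). Below is a minimal sketch of that chaining; the stage names and paths are illustrative, not part of Agentainer, and the HTTP transport is injected as a callable so the wiring is visible without network calls:

```python
# Chain a payload through a series of agents via their proxy endpoints.
# `post` is any callable (url, payload) -> payload, e.g. one built on
# requests.post(url, json=payload).json() in a real deployment.
BASE = "http://localhost:8081"

def run_pipeline(stages, post, payload):
    for agent_name, path in stages:
        url = f"{BASE}/agent/{agent_name}{path}"
        payload = post(url, payload)  # each agent's output feeds the next
    return payload

if __name__ == "__main__":
    # Stub transport that just records which proxy URLs were hit
    calls = []
    def fake_post(url, body):
        calls.append(url)
        return body

    run_pipeline(
        [("collector", "/collect"), ("processor", "/process")],
        fake_post,
        {"job": "daily"},
    )
    print(calls[0])  # http://localhost:8081/agent/collector/collect
```

Because the gateway agent in the YAML above has `autoRestart` and health checks, a transport built on real HTTP retries cleanly when a stage is mid-recovery.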

Example 3: Production-Ready Agent

View Production Pattern
# Resilient agent with state checkpointing
import json
import os
import signal
import sys

CHECKPOINT_PATH = "/app/checkpoints/state.json"

class ResilientAgent:
    def __init__(self):
        # Handle graceful shutdown
        signal.signal(signal.SIGTERM, self.shutdown)

        # Load previous state if it exists
        self.checkpoint = self.load_checkpoint()

    def load_checkpoint(self):
        if os.path.exists(CHECKPOINT_PATH):
            with open(CHECKPOINT_PATH) as f:
                return json.load(f)
        return {'last_processed': -1, 'results': []}

    def save_checkpoint(self):
        with open(CHECKPOINT_PATH, 'w') as f:
            json.dump(self.checkpoint, f)

    def process_batch(self, items):
        # Skip anything already handled before a crash/restart
        start = self.checkpoint['last_processed'] + 1
        for i, item in enumerate(items[start:], start):
            try:
                # Process item
                result = self.process_item(item)

                # Save progress after each item
                self.checkpoint['last_processed'] = i
                self.checkpoint['results'].append(result)
                self.save_checkpoint()

            except Exception as e:
                # On error, we can resume from the checkpoint
                self.handle_error(e, item)

    def shutdown(self, signum, frame):
        """Save state before the container stops"""
        self.save_checkpoint()
        sys.exit(0)
# Deploy with persistent volume
agentainer deploy \
  --name resilient-processor \
  --image ./Dockerfile \
  --volume /data/checkpoints:/app/checkpoints \
  --auto-restart

# Even if it crashes, it resumes from checkpoint
# Agentainer replays any missed requests

📖 Documentation

Quick Reference

| Command | Description | Example |
|---|---|---|
| `deploy` | Deploy a new agent | `agentainer deploy --name my-agent --image nginx` |
| `start` | Start an agent | `agentainer start <agent-id>` |
| `stop` | Stop an agent | `agentainer stop <agent-id>` |
| `resume` | Resume a crashed agent | `agentainer resume <agent-id>` |
| `list` | List all agents | `agentainer list` |
| `logs` | View agent logs | `agentainer logs <agent-id>` |

📖 Full Documentation → including:

📬 Request Persistence

When request persistence is enabled (default), Agentainer automatically:

  1. Queues requests sent to stopped/crashed agents
  2. Replays requests when agents become available
  3. Tracks status of all requests (pending/completed/failed)
  4. Preserves requests even if agents crash mid-processing
# View pending requests for an agent
agentainer requests agent-123

# Requests are automatically replayed when you start the agent
agentainer start <agent-id>
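Because queued requests are replayed after a crash, an agent may see the same request twice: once before the crash and once on replay. A simple guard is to make handlers idempotent by caching results under a stable request ID. The sketch below is ours, not an Agentainer API; the `result:` key scheme and the idea that the agent derives its own request ID are assumptions, and a plain dict stands in for the built-in Redis:

```python
def handle_once(store, request_id, payload, process):
    """Process a request at most once; replays return the cached result.

    `store` is any dict-like key/value store (in a deployed agent this
    would be Agentainer's built-in Redis); `process` does the real work.
    """
    key = f"result:{request_id}"          # hypothetical key scheme
    if key in store:
        return store[key], False          # replayed duplicate: skip reprocessing
    result = process(payload)
    store[key] = result                   # cache before acknowledging
    return result, True

if __name__ == "__main__":
    store = {}
    first, fresh = handle_once(store, "req-42", {"n": 2}, lambda p: p["n"] * 2)
    again, fresh2 = handle_once(store, "req-42", {"n": 2}, lambda p: p["n"] * 2)
    print(first, fresh, again, fresh2)  # 4 True 4 False
```

With a guard like this, a request replayed mid-processing (point 4 above) cannot double-charge an API call or duplicate a side effect.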

๐Ÿฅ Health Checks

Agentainer monitors agent health and automatically restarts unhealthy agents:

  1. Configurable Endpoints: Define custom health check paths
  2. Auto-Restart: Restart agents that fail health checks
  3. Failure Tracking: Monitor consecutive failures before restart
  4. Status Monitoring: View health status via CLI or API
# View health status for all agents
agentainer health

# View health status for a specific agent
agentainer health agent-123

# Deploy with health checks
agentainer deploy --name my-agent --image my-app:latest \
  --health-endpoint /health \
  --health-interval 30s \
  --health-retries 3 \
  --auto-restart
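On the agent side, the endpoint named by `--health-endpoint` can be as simple as a route that runs a few dependency checks and returns 200 or 503. A framework-agnostic sketch, assuming nothing about Agentainer's expectations beyond the status code (the check names and JSON shape are illustrative):

```python
import json
import time

START_TIME = time.time()

def health_status(checks):
    """Run each named check callable; 200 if all pass, else 503."""
    results = {name: bool(check()) for name, check in checks.items()}
    healthy = all(results.values())
    body = json.dumps({
        "status": "ok" if healthy else "unhealthy",
        "uptime_seconds": round(time.time() - START_TIME, 1),
        "checks": results,
    })
    return (200 if healthy else 503), body

# Wire into any framework, e.g. a Flask route:
# @app.route('/health')
# def health():
#     code, body = health_status({"redis": redis_client.ping})
#     return body, code
```

Returning 503 for a failed dependency (rather than crashing) lets the failure-tracking above count consecutive misses before triggering an auto-restart.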

📊 Resource Monitoring (Coming Soon)

Real-time resource monitoring for all agents with historical data:

  1. CPU & Memory: Track usage and limits
  2. Network I/O: Monitor bandwidth and packet counts
  3. Disk I/O: Track read/write operations
  4. History: View up to 24 hours of metrics data
# View current resource metrics
agentainer metrics agent-123

# View metrics history (last hour)
agentainer metrics agent-123 --history

# View metrics for specific duration
agentainer metrics agent-123 --history --duration 6h

# Get metrics via API
curl http://localhost:8081/agents/agent-123/metrics \
  -H "Authorization: Bearer agentainer-default-token"

💾 Backup & Restore (Coming Soon)

Complete backup solution for agent configurations and persistent data:

  1. Configuration Backup: Save agent settings, environment, and volumes
  2. Volume Data: Backup persistent volume data
  3. Selective Restore: Restore all or specific agents
  4. Export/Import: Share backups as tar.gz files
# Create a backup of all agents
agentainer backup create --name "production-backup" --description "Weekly backup"

# Backup specific agents
agentainer backup create --name "critical-agents" --agents agent-123,agent-456

# List available backups
agentainer backup list

# Restore all agents from backup
agentainer backup restore backup-1234567890

# Restore specific agents
agentainer backup restore backup-1234567890 --agents agent-123

# Export backup for archival
agentainer backup export backup-1234567890 production-backup.tar.gz

# Delete old backup
agentainer backup delete backup-1234567890

๐Ÿ“ Logging & Audit Trail (Coming Soon)

Comprehensive logging system with structured logs and audit trails:

  1. Structured Logs: JSON-formatted logs with metadata
  2. Audit Trail: Track all administrative actions
  3. Log Rotation: Automatic rotation and cleanup
  4. Real-time Access: Stream logs via Redis
  5. Filtering: Query logs by component, level, or time
# View audit logs for all actions
agentainer audit

# Filter audit logs
agentainer audit --user admin --action deploy_agent --duration 24h

# View audit logs for specific resource
agentainer audit --resource agent --duration 1h

# Export audit logs (limit results)
agentainer audit --limit 1000 > audit-export.log

Audit Events Tracked:

  • Agent deployment, start, stop, restart, removal
  • Configuration changes
  • Authentication attempts
  • API access with IP tracking
  • Resource modifications

🔌 API Reference

Two Endpoints, Two Purposes

🔧 API Endpoints (/agents/*)

  • Manage agent lifecycle
  • Requires authentication
  • Deploy, start, stop agents
# Deploy agent
curl -X POST http://localhost:8081/agents \
  -H "Authorization: Bearer token" \
  -d '{"name": "my-agent", "image": "nginx"}'

๐ŸŒ Proxy Endpoints (/agent/*)

  • Access your agents directly
  • No authentication needed
  • Call your agent's APIs
# Chat with agent
curl -X POST http://localhost:8081/agent/my-agent/chat \
  -d '{"message": "Hello!"}'

Quick tip: "agents" (plural) = API, "agent" (singular) = Proxy
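The plural/singular distinction is easy to trip over, so one option is to build each kind of URL from a single helper. A sketch (the `api`/`proxy` helper names are ours, not part of Agentainer; the token is the demo default from this README and must be replaced outside local testing):

```python
BASE = "http://localhost:8081"
TOKEN = "agentainer-default-token"  # demo token; local testing only

def api(path):
    """Management API: plural /agents, requires auth."""
    return f"{BASE}/agents{path}", {"Authorization": f"Bearer {TOKEN}"}

def proxy(agent_id, path):
    """Agent proxy: singular /agent/<id>, no auth header needed."""
    return f"{BASE}/agent/{agent_id}{path}", {}

if __name__ == "__main__":
    url, headers = api("/agent-123/metrics")
    print(url)  # http://localhost:8081/agents/agent-123/metrics
```

Pass the returned URL and headers straight to your HTTP client of choice, e.g. `requests.get(url, headers=headers)`.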

📖 Full API Documentation →


๐Ÿ› ๏ธ Development

Quick Start Development

# Clone the repo
git clone https://github.com/oso95/Agentainer-lab.git
cd agentainer-lab

# Build and run
make build
make run

# Run tests
make test

Key Commands

make help        # Show all available commands
make setup       # Complete setup for fresh VMs
make verify      # Verify installation
make test-all    # Run all tests including integration

๐Ÿ› Troubleshooting

Common Issues
| Issue | Solution |
|---|---|
| Docker daemon not running | Ensure Docker is running: `docker ps` |
| Redis connection failed | Verify Redis: `redis-cli ping` |
| Permission denied | Add user to docker group: `sudo usermod -aG docker $USER` |
| Agent not accessible | Check proxy endpoint: `http://localhost:8081/agent/<id>/` |
| Requests not replaying | Check persistence is enabled in `config.yaml` |
| Installation fails | Run `make verify` to check prerequisites |
| "Image not found" error | Build the Docker image first or use a Dockerfile path |
| Agent states out of sync | Wait 10 seconds for auto-sync or restart the server |

๐Ÿค Contributing

We welcome contributions! Agentainer is in active development and we'd love your help making it better.

How to Contribute

  1. ๐Ÿ› Report Bugs: Open an issue with reproduction steps
  2. ๐Ÿ’ก Suggest Features: Start a discussion about your idea
  3. ๐Ÿ“ฆ Submit PRs: Fork, branch, code, test, and submit!
  4. ๐Ÿ“– Improve Docs: Help us make the docs clearer
  5. ๐Ÿงช Share Examples: Add your agent examples to inspire others

Development Setup

# Fork and clone
git clone https://github.com/YOUR-USERNAME/Agentainer-lab.git
cd agentainer-lab

# Create feature branch  
git checkout -b feature/amazing-feature

# Make changes and test
make test
make test-integration

# Submit PR
git push origin feature/amazing-feature

👥 Community & Support


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.