oso95/Agentainer-lab: A lightweight, self-hosted infrastructure layer for deploying and managing LLM agents as resilient microservices. Features automatic request persistence, crash recovery, and state management: everything you need to run AI agents reliably in production without the complexity.


The Missing Infrastructure Layer for LLM Agents


Deploy, manage, and scale LLM agents as containerized microservices with built-in resilience

🚀 Quick Start • 📖 Documentation • 💡 Examples • 🔧 CLI Reference • 🔌 API


🎯 What is Agentainer?

Agentainer is a container runtime specifically designed for LLM agents. Just as Docker revolutionized application deployment, Agentainer makes it dead simple to deploy, manage, and scale AI agents with production-grade reliability.


๐Ÿ” How It Compares

| Feature | Agentainer | Raw Docker | Kubernetes | Serverless |
|---|---|---|---|---|
| Deployment Speed | ✅ < 30 seconds | ⚠️ Manual setup | ❌ Complex YAML | ✅ Fast |
| State Management | ✅ Built-in Redis | ❌ DIY | ⚠️ External | ❌ Stateless |
| Request Persistence | ✅ Automatic | ❌ Not included | ❌ Not included | ❌ Lost on timeout |
| Crash Recovery | ✅ With replay | ⚠️ Restart only | ⚠️ Restart only | ✅ Auto-retry |
| Local Development | ✅ Optimized | ✅ Native | ❌ Heavy | ❌ Cloud only |
| LLM-Specific | ✅ Purpose-built | ❌ Generic | ❌ Generic | ❌ Generic |

๐Ÿ—๏ธ Architecture

Agentainer provides a complete infrastructure layer between your agent code and the container runtime.

🎯 Why Choose Agentainer?

  • 🚀 Deploy in Seconds: from code to running agent with one command
  • 💪 Never Lose Data: built-in Redis + request queuing + auto-recovery
  • 🔒 Secure by Default: network isolation, no direct port exposure
  • 🎯 Purpose-Built: designed specifically for LLM agent workloads

โš ๏ธ Important Notice

PROOF-OF-CONCEPT SOFTWARE - LOCAL TESTING ONLY

This is experimental software designed for local development and concept validation.
🚨 DO NOT USE IN PRODUCTION OR EXPOSE TO EXTERNAL NETWORKS 🚨

  • Demo authentication (default tokens)
  • Minimal security controls
  • Not suitable for multi-user environments
  • Requires Docker socket access

🚀 Quick Start

Prerequisites

  • Docker (required)
  • Go 1.23+ (for building from source)
  • Git (for cloning)

Note for macOS users: When deploying from Dockerfiles, build the image first using docker build, then deploy the built image. This avoids Docker socket compatibility issues.

Installation (< 2 minutes)

# Clone and install
git clone https://github.com/oso95/Agentainer-lab.git
cd agentainer-lab
make setup    # Installs everything including prerequisites

# Start Agentainer
make run

Deploy an LLM Agent (< 1 minute)

# 1. Use the GPT example
cd examples/gpt-agent
cp .env.example .env
# Add your OpenAI API key to .env

# 2. Deploy from Dockerfile
# For macOS users: Build the image first, then deploy
docker build -t gpt-bot-image .
agentainer deploy --name gpt-bot --image gpt-bot-image

# For Linux users: direct Dockerfile deployment also works (or build the image first, as above):
# agentainer deploy --name gpt-bot --image ./Dockerfile

# 3. Start and test
agentainer start <agent-id>

# 4. Chat with your agent
curl -X POST http://localhost:8081/agent/<agent-id>/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello! What is Agentainer?"}'

💡 Examples

Example 1: Stateful Chat Agent with Memory

View Code
# app.py - A GPT agent that remembers conversations
import os

import redis
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)

# Connect to Agentainer's built-in Redis
redis_client = redis.Redis(
    host='host.docker.internal',
    port=6379,
    decode_responses=True  # return str instead of raw bytes
)

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def openai_chat_with_history(user_msg, history):
    """Call OpenAI with the recent conversation turns as context."""
    context = "\n".join(reversed(history))  # lpush stores newest first
    resp = openai_client.chat.completions.create(
        model=os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
        messages=[
            {"role": "system", "content": f"Recent conversation:\n{context}"},
            {"role": "user", "content": user_msg},
        ],
    )
    return resp.choices[0].message.content

@app.route('/chat', methods=['POST'])
def chat():
    user_msg = request.json['message']

    # Get the last few conversation turns from Redis
    history = redis_client.lrange('conversations', 0, 5)

    # Call OpenAI with context
    response = openai_chat_with_history(user_msg, history)

    # Save to Redis for next time
    redis_client.lpush('conversations', f"User: {user_msg}")
    redis_client.lpush('conversations', f"AI: {response}")

    return jsonify({'response': response})
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
RUN pip install flask redis openai gunicorn
COPY app.py .
COPY .env .
EXPOSE 8000
CMD ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]
# Deploy and use
agentainer deploy --name memory-bot --image ./Dockerfile
agentainer start <agent-id>

# First conversation
curl -X POST http://localhost:8081/agent/memory-bot/chat \
  -d '{"message": "My name is Alice"}'
# Response: "Nice to meet you, Alice!"

# Later conversation - it remembers!
curl -X POST http://localhost:8081/agent/memory-bot/chat \
  -d '{"message": "What is my name?"}'
# Response: "Your name is Alice."

Example 2: Multi-Agent Pipeline

View YAML Deployment
# agents.yaml - Deploy a complete LLM pipeline
apiVersion: v1
kind: AgentDeployment
metadata:
  name: llm-pipeline
spec:
  agents:
    # Agent 1: Data Collector
    - name: collector
      image: ./collector/Dockerfile
      env:
        COLLECT_INTERVAL: "60"
      volumes:
        - host: ./data
          container: /app/data
      
    # Agent 2: Processor with resource limits
    - name: processor  
      image: ./processor/Dockerfile
      resources:
        memory: 4G
        cpu: 2
      env:
        MODEL: "llama2"
        
    # Agent 3: API Gateway
    - name: gateway
      image: ./gateway/Dockerfile
      healthCheck:
        endpoint: /health
        interval: 30s
      autoRestart: true
# Deploy entire pipeline
agentainer deploy --config agents.yaml

# All agents start with crash recovery
# and request persistence enabled
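Agents in a pipeline like this can hand work to each other through their Agentainer proxy endpoints (`/agent/<name>/...`). Below is a minimal sketch of that chaining; the stage names and paths are illustrative, not part of Agentainer, and the HTTP transport is injected as a callable so the wiring is visible without network calls:

```python
# Chain a payload through a series of agents via their proxy endpoints.
# `post` is any callable (url, payload) -> payload, e.g. one built on
# requests.post(url, json=payload).json() in a real deployment.
BASE = "http://localhost:8081"

def run_pipeline(stages, post, payload):
    for agent_name, path in stages:
        url = f"{BASE}/agent/{agent_name}{path}"
        payload = post(url, payload)  # each agent's output feeds the next
    return payload

if __name__ == "__main__":
    # Stub transport that just records which proxy URLs were hit
    calls = []
    def fake_post(url, body):
        calls.append(url)
        return body

    run_pipeline(
        [("collector", "/collect"), ("processor", "/process")],
        fake_post,
        {"job": "daily"},
    )
    print(calls[0])  # http://localhost:8081/agent/collector/collect
```

Because the gateway agent in the YAML above has `autoRestart` and health checks, a transport built on real HTTP retries cleanly when a stage is mid-recovery.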

Example 3: Production-Ready Agent

View Production Pattern
# Resilient agent with state checkpointing
import json
import os
import signal
import sys

CHECKPOINT_PATH = "/app/checkpoints/state.json"

class ResilientAgent:
    def __init__(self):
        # Handle graceful shutdown
        signal.signal(signal.SIGTERM, self.shutdown)

        # Load previous state if it exists
        self.checkpoint = self.load_checkpoint()

    def load_checkpoint(self):
        if os.path.exists(CHECKPOINT_PATH):
            with open(CHECKPOINT_PATH) as f:
                return json.load(f)
        return {'last_processed': -1, 'results': []}

    def save_checkpoint(self):
        with open(CHECKPOINT_PATH, 'w') as f:
            json.dump(self.checkpoint, f)

    def process_batch(self, items):
        # Skip anything already handled before a crash/restart
        start = self.checkpoint['last_processed'] + 1
        for i, item in enumerate(items[start:], start):
            try:
                # Process item
                result = self.process_item(item)

                # Save progress after each item
                self.checkpoint['last_processed'] = i
                self.checkpoint['results'].append(result)
                self.save_checkpoint()

            except Exception as e:
                # On error, we can resume from the checkpoint
                self.handle_error(e, item)

    def shutdown(self, signum, frame):
        """Save state before the container stops"""
        self.save_checkpoint()
        sys.exit(0)
# Deploy with persistent volume
agentainer deploy \
  --name resilient-processor \
  --image ./Dockerfile \
  --volume /data/checkpoints:/app/checkpoints \
  --auto-restart

# Even if it crashes, it resumes from checkpoint
# Agentainer replays any missed requests

📖 Documentation

Quick Reference

| Command | Description | Example |
|---|---|---|
| `deploy` | Deploy a new agent | `agentainer deploy --name my-agent --image nginx` |
| `start` | Start an agent | `agentainer start <agent-id>` |
| `stop` | Stop an agent | `agentainer stop <agent-id>` |
| `resume` | Resume a crashed agent | `agentainer resume <agent-id>` |
| `list` | List all agents | `agentainer list` |
| `logs` | View agent logs | `agentainer logs <agent-id>` |

📖 Full Documentation → including:

📬 Request Persistence

When request persistence is enabled (default), Agentainer automatically:

  1. Queues requests sent to stopped/crashed agents
  2. Replays requests when agents become available
  3. Tracks status of all requests (pending/completed/failed)
  4. Preserves requests even if agents crash mid-processing
# View pending requests for an agent
agentainer requests agent-123

# Requests are automatically replayed when you start the agent
agentainer start <agent-id>
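Because queued requests are replayed after a crash, an agent may see the same request twice: once before the crash and once on replay. A simple guard is to make handlers idempotent by caching results under a stable request ID. The sketch below is ours, not an Agentainer API; the `result:` key scheme and the idea that the agent derives its own request ID are assumptions, and a plain dict stands in for the built-in Redis:

```python
def handle_once(store, request_id, payload, process):
    """Process a request at most once; replays return the cached result.

    `store` is any dict-like key/value store (in a deployed agent this
    would be Agentainer's built-in Redis); `process` does the real work.
    """
    key = f"result:{request_id}"          # hypothetical key scheme
    if key in store:
        return store[key], False          # replayed duplicate: skip reprocessing
    result = process(payload)
    store[key] = result                   # cache before acknowledging
    return result, True

if __name__ == "__main__":
    store = {}
    first, fresh = handle_once(store, "req-42", {"n": 2}, lambda p: p["n"] * 2)
    again, fresh2 = handle_once(store, "req-42", {"n": 2}, lambda p: p["n"] * 2)
    print(first, fresh, again, fresh2)  # 4 True 4 False
```

With a guard like this, a request replayed mid-processing (point 4 above) cannot double-charge an API call or duplicate a side effect.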

๐Ÿฅ Health Checks

Agentainer monitors agent health and automatically restarts unhealthy agents:

  1. Configurable Endpoints: Define custom health check paths
  2. Auto-Restart: Restart agents that fail health checks
  3. Failure Tracking: Monitor consecutive failures before restart
  4. Status Monitoring: View health status via CLI or API
# View health status for all agents
agentainer health

# View health status for a specific agent
agentainer health agent-123

# Deploy with health checks
agentainer deploy --name my-agent --image my-app:latest \
  --health-endpoint /health \
  --health-interval 30s \
  --health-retries 3 \
  --auto-restart
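On the agent side, the endpoint named by `--health-endpoint` can be as simple as a route that runs a few dependency checks and returns 200 or 503. A framework-agnostic sketch, assuming nothing about Agentainer's expectations beyond the status code (the check names and JSON shape are illustrative):

```python
import json
import time

START_TIME = time.time()

def health_status(checks):
    """Run each named check callable; 200 if all pass, else 503."""
    results = {name: bool(check()) for name, check in checks.items()}
    healthy = all(results.values())
    body = json.dumps({
        "status": "ok" if healthy else "unhealthy",
        "uptime_seconds": round(time.time() - START_TIME, 1),
        "checks": results,
    })
    return (200 if healthy else 503), body

# Wire into any framework, e.g. a Flask route:
# @app.route('/health')
# def health():
#     code, body = health_status({"redis": redis_client.ping})
#     return body, code
```

Returning 503 for a failed dependency (rather than crashing) lets the failure-tracking above count consecutive misses before triggering an auto-restart.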

📊 Resource Monitoring (Coming Soon)

Real-time resource monitoring for all agents with historical data:

  1. CPU & Memory: Track usage and limits
  2. Network I/O: Monitor bandwidth and packet counts
  3. Disk I/O: Track read/write operations
  4. History: View up to 24 hours of metrics data
# View current resource metrics
agentainer metrics agent-123

# View metrics history (last hour)
agentainer metrics agent-123 --history

# View metrics for specific duration
agentainer metrics agent-123 --history --duration 6h

# Get metrics via API
curl http://localhost:8081/agents/agent-123/metrics \
  -H "Authorization: Bearer agentainer-default-token"

💾 Backup & Restore (Coming Soon)

Complete backup solution for agent configurations and persistent data:

  1. Configuration Backup: Save agent settings, environment, and volumes
  2. Volume Data: Backup persistent volume data
  3. Selective Restore: Restore all or specific agents
  4. Export/Import: Share backups as tar.gz files
# Create a backup of all agents
agentainer backup create --name "production-backup" --description "Weekly backup"

# Backup specific agents
agentainer backup create --name "critical-agents" --agents agent-123,agent-456

# List available backups
agentainer backup list

# Restore all agents from backup
agentainer backup restore backup-1234567890

# Restore specific agents
agentainer backup restore backup-1234567890 --agents agent-123

# Export backup for archival
agentainer backup export backup-1234567890 production-backup.tar.gz

# Delete old backup
agentainer backup delete backup-1234567890

๐Ÿ“ Logging & Audit Trail (Coming Soon)

Comprehensive logging system with structured logs and audit trails:

  1. Structured Logs: JSON-formatted logs with metadata
  2. Audit Trail: Track all administrative actions
  3. Log Rotation: Automatic rotation and cleanup
  4. Real-time Access: Stream logs via Redis
  5. Filtering: Query logs by component, level, or time
# View audit logs for all actions
agentainer audit

# Filter audit logs
agentainer audit --user admin --action deploy_agent --duration 24h

# View audit logs for specific resource
agentainer audit --resource agent --duration 1h

# Export audit logs (limit results)
agentainer audit --limit 1000 > audit-export.log

Audit Events Tracked:

  • Agent deployment, start, stop, restart, removal
  • Configuration changes
  • Authentication attempts
  • API access with IP tracking
  • Resource modifications

🔌 API Reference

Two Endpoints, Two Purposes

🔧 API Endpoints (/agents/*)

  • Manage agent lifecycle
  • Requires authentication
  • Deploy, start, stop agents
# Deploy agent
curl -X POST http://localhost:8081/agents \
  -H "Authorization: Bearer token" \
  -d '{"name": "my-agent", "image": "nginx"}'

๐ŸŒ Proxy Endpoints (/agent/*)

  • Access your agents directly
  • No authentication needed
  • Call your agent's APIs
# Chat with agent
curl -X POST http://localhost:8081/agent/my-agent/chat \
  -d '{"message": "Hello!"}'

Quick tip: "agents" (plural) = API, "agent" (singular) = Proxy
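The plural/singular distinction is easy to trip over, so one option is to build each kind of URL from a single helper. A sketch (the `api`/`proxy` helper names are ours, not part of Agentainer; the token is the demo default from this README and must be replaced outside local testing):

```python
BASE = "http://localhost:8081"
TOKEN = "agentainer-default-token"  # demo token; local testing only

def api(path):
    """Management API: plural /agents, requires auth."""
    return f"{BASE}/agents{path}", {"Authorization": f"Bearer {TOKEN}"}

def proxy(agent_id, path):
    """Agent proxy: singular /agent/<id>, no auth header needed."""
    return f"{BASE}/agent/{agent_id}{path}", {}

if __name__ == "__main__":
    url, headers = api("/agent-123/metrics")
    print(url)  # http://localhost:8081/agents/agent-123/metrics
```

Pass the returned URL and headers straight to your HTTP client of choice, e.g. `requests.get(url, headers=headers)`.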

📖 Full API Documentation →


๐Ÿ› ๏ธ Development

Quick Start Development

# Clone the repo
git clone https://github.com/oso95/Agentainer-lab.git
cd agentainer-lab

# Build and run
make build
make run

# Run tests
make test

Key Commands

make help        # Show all available commands
make setup       # Complete setup for fresh VMs
make verify      # Verify installation
make test-all    # Run all tests including integration

๐Ÿ› Troubleshooting

Common Issues
| Issue | Solution |
|---|---|
| Docker daemon not running | Ensure Docker is running: `docker ps` |
| Redis connection failed | Verify Redis: `redis-cli ping` |
| Permission denied | Add user to docker group: `sudo usermod -aG docker $USER` |
| Agent not accessible | Check proxy endpoint: `http://localhost:8081/agent/<id>/` |
| Requests not replaying | Check persistence is enabled in `config.yaml` |
| Installation fails | Run `make verify` to check prerequisites |
| "Image not found" error | Build the Docker image first or use a Dockerfile path |
| Agent states out of sync | Wait 10 seconds for auto-sync or restart the server |

๐Ÿค Contributing

We welcome contributions! Agentainer is in active development and we'd love your help making it better.

How to Contribute

  1. ๐Ÿ› Report Bugs: Open an issue with reproduction steps
  2. ๐Ÿ’ก Suggest Features: Start a discussion about your idea
  3. ๐Ÿ“ฆ Submit PRs: Fork, branch, code, test, and submit!
  4. ๐Ÿ“– Improve Docs: Help us make the docs clearer
  5. ๐Ÿงช Share Examples: Add your agent examples to inspire others

Development Setup

# Fork and clone
git clone https://github.com/YOUR-USERNAME/Agentainer-lab.git
cd agentainer-lab

# Create feature branch  
git checkout -b feature/amazing-feature

# Make changes and test
make test
make test-integration

# Submit PR
git push origin feature/amazing-feature

👥 Community & Support


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.