How to Deploy OpenClaw on Kubernetes

Security researchers have found over 135,000 OpenClaw instances sitting wide open on the internet. Many of them were vulnerable to remote code execution. The OpenClaw security crisis is real: critical CVEs, malicious skills, and a fundamental problem with how most deployments handle authentication. Running OpenClaw on a VPS with docker run is easy. Running it securely is a different problem.

Kubernetes solves that problem. You get network isolation, resource limits, automated restarts, and security defaults that would take hours to configure by hand. And with the OpenClaw Kubernetes Operator, you get all of it from a single YAML file.

This guide takes you from zero to a production-ready OpenClaw agent on Kubernetes. Every YAML block is copy-paste ready.

Why an operator

Running OpenClaw on Kubernetes is more than a Deployment and a Service. You need network isolation, secret management, persistent storage, health monitoring, config rollouts, and optionally browser automation. Wiring all of that correctly by hand is tedious and error-prone.

A Kubernetes operator encodes these concerns into a single custom resource. You declare what you want, and the operator continuously reconciles it into the right set of Kubernetes objects. That gives you:

  • Security by default. Every agent runs as UID 1000 with all Linux capabilities dropped, a seccomp profile enabled, a read-only root filesystem, and a default-deny NetworkPolicy that only allows DNS and HTTPS egress. No manual hardening needed.
  • Auto-updates with rollback. The operator polls the OCI registry for new versions, backs up the workspace, rolls out the update, and automatically rolls back if the new pod fails health checks.
  • Config rollouts. Change spec.config.raw and the operator detects the content-hash change and triggers a rolling update. The same applies to secret rotation.
  • Backup and restore. Automatic workspace backup to S3-compatible storage on instance deletion. Restore into a new instance from any snapshot.
  • Gateway auth. Auto-generates a gateway token per instance. No manual pairing, no mDNS (which does not work in Kubernetes anyway).
  • Drift detection. Every 5 minutes, the operator checks that every managed resource matches the desired state. If someone manually edits a NetworkPolicy or deletes a PDB, it gets reconciled back.

Prerequisites

You need:

  • A Kubernetes cluster (1.28+). Any conformant distribution works: EKS, GKE, AKS, k3s, or a local Kind cluster for testing.
  • kubectl configured to talk to your cluster.
  • helm v3 installed.
  • An API key for your AI provider (Anthropic, OpenAI, or any OpenAI-compatible endpoint).

Step 1: Install the operator

The operator ships as an OCI Helm chart. One command installs it:

helm install openclaw-operator \
  oci://ghcr.io/openclaw-rocks/charts/openclaw-operator \
  --namespace openclaw-operator-system \
  --create-namespace

Verify it is running:

kubectl get pods -n openclaw-operator-system

You should see the operator pod in the Running state. The operator also installs a validating webhook that prevents insecure configurations (like running as root).
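
To confirm the webhook is registered (the exact configuration name may vary between chart versions):

kubectl get validatingwebhookconfigurations | grep openclaw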

Step 2: Create your API key secret

Store your AI provider API key in a Kubernetes Secret. The operator will inject it into the agent container:

kubectl create namespace openclaw

kubectl create secret generic openclaw-api-keys \
  --namespace openclaw \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-your-key-here

For OpenAI or other providers, use the appropriate environment variable name (OPENAI_API_KEY, OPENROUTER_API_KEY, etc.). You can include multiple providers in the same Secret.
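
For example, a single Secret carrying both Anthropic and OpenAI keys (substitute your own values):

kubectl create secret generic openclaw-api-keys \
  --namespace openclaw \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-your-key-here \
  --from-literal=OPENAI_API_KEY=sk-your-openai-key-here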

Tip: For production, consider using External Secrets Operator to sync keys from AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, or Azure Key Vault. The operator’s docs have detailed examples.
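
As a sketch, here is an ExternalSecret that syncs the Anthropic key from AWS Secrets Manager. It assumes you have already installed External Secrets Operator, created a ClusterSecretStore named aws-secrets-manager, and stored the key under openclaw/anthropic (all of these names are illustrative):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: openclaw-api-keys
  namespace: openclaw
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager    # illustrative store name
  target:
    name: openclaw-api-keys      # the Secret the instance references
  data:
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: openclaw/anthropic  # illustrative remote path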

Step 3: Deploy your first agent

Create a file called my-agent.yaml:

apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
  namespace: openclaw
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  config:
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"
  storage:
    persistence:
      enabled: true
      size: 10Gi

Apply it:

kubectl apply -f my-agent.yaml

That single resource creates a StatefulSet, Service, ServiceAccount, Role, RoleBinding, ConfigMap, PVC, PDB, NetworkPolicy, and a gateway token Secret. The operator reconciles all of it.
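
You can inspect the managed objects directly:

kubectl get statefulset,service,configmap,secret,pvc,networkpolicy,pdb -n openclaw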

Step 4: Verify it is running

Watch the instance come up:

kubectl get openclawinstances -n openclaw -w
NAME       PHASE          READY   AGE
my-agent   Provisioning   False   10s
my-agent   Running        True    45s

Once the phase shows Running and Ready is True, your agent is live. Check the logs:

kubectl logs -n openclaw statefulset/my-agent -f

To interact with your agent, port-forward the gateway:

kubectl port-forward -n openclaw svc/my-agent 18789:18789

Then open http://localhost:18789 in your browser.
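
The gateway prompts for the auto-generated token. It lives in the per-instance Secret the operator created; the Secret name and key below assume a <instance>-gateway-token / token convention, so list the Secrets first if yours differ:

kubectl get secrets -n openclaw

# Assuming the Secret is my-agent-gateway-token with key "token":
kubectl get secret -n openclaw my-agent-gateway-token \
  -o jsonpath='{.data.token}' | base64 -d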

Step 5: Connect a channel

OpenClaw supports Telegram, Discord, WhatsApp, Signal, and other messaging channels. Each channel is configured through environment variables. Add the relevant token to your Secret:

kubectl create secret generic openclaw-channel-keys \
  --namespace openclaw \
  --from-literal=TELEGRAM_BOT_TOKEN=your-bot-token-here

Then reference it in your instance:

spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
    - secretRef:
        name: openclaw-channel-keys

OpenClaw auto-detects the token and enables the channel. No additional config needed.
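
Re-apply the manifest, and the operator rolls the pod to pick up the new environment:

kubectl apply -f my-agent.yaml
kubectl rollout status -n openclaw statefulset/my-agent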


That covers the basics. Your agent is running, secured, and reachable. The rest of this guide covers optional features you can enable when you are ready.

Browser automation

OpenClaw can browse the web, take screenshots, and interact with pages. The operator makes this a one-line addition. It runs a hardened Chromium sidecar in the same pod, connected over localhost:

spec:
  chromium:
    enabled: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi

The operator automatically injects a CHROMIUM_URL environment variable into the main container. The sidecar runs as UID 1001 with a read-only root filesystem and its own security context.
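
To confirm the wiring, check the injected variable from inside the main container. This assumes the StatefulSet's first pod is my-agent-0 and the main container is named openclaw; adjust both if yours differ:

kubectl exec -n openclaw my-agent-0 -c openclaw -- env | grep CHROMIUM_URL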

Skills and runtime dependencies

OpenClaw skills from ClawHub can be installed declaratively. The operator runs an init container that fetches each skill before the agent starts:

spec:
  skills:
    - "@anthropic/mcp-server-fetch"
    - "@anthropic/mcp-server-filesystem"

If your skills or MCP servers need pnpm or Python, enable the built-in runtime dependency init containers:

spec:
  runtimeDeps:
    pnpm: true    # Installs pnpm via corepack
    python: true  # Installs Python 3.12 + uv

The init containers install these tools to the data PVC, so they persist across restarts without bloating the container image.

Auto-updates

OpenClaw releases new versions frequently. The operator can track these automatically, back up before updating, and roll back if something goes wrong:

spec:
  autoUpdate:
    enabled: true
    checkInterval: "12h"
    backupBeforeUpdate: true
    rollbackOnFailure: true
    healthCheckTimeout: "10m"

When a new version appears in the registry, the operator:

  1. Creates a backup of the workspace PVC to S3-compatible storage
  2. Updates the image tag on the StatefulSet
  3. Waits up to healthCheckTimeout for the pod to pass readiness checks
  4. If the pod fails to become ready, restores the previous image tag and the backup

After 3 consecutive failed rollbacks, the operator pauses auto-update and sets a condition so you can investigate.
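
The pause condition is visible on the instance status (the exact condition type is operator-specific):

kubectl describe openclawinstance my-agent -n openclaw

# Or pull the raw conditions directly:
kubectl get openclawinstance my-agent -n openclaw \
  -o jsonpath='{.status.conditions}'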

Note: Auto-update is a no-op for digest-pinned images (spec.image.digest). If you pin by digest, you control updates manually.

Production hardening

The operator ships secure by default. Here are the additional knobs for production deployments.

Monitor with Prometheus

Enable the ServiceMonitor to scrape operator and instance metrics:

spec:
  observability:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        interval: "30s"

The operator exposes openclaw_reconcile_total, openclaw_reconcile_duration_seconds, openclaw_instance_phase, and auto-update counters.
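
For a quick look at the raw metrics without Prometheus, port-forward the operator's metrics endpoint and curl it. The Deployment name and port here are assumptions (8080 is a common controller-runtime default); check the resources in openclaw-operator-system if yours differ:

kubectl port-forward -n openclaw-operator-system \
  deploy/openclaw-operator 8080:8080
curl -s http://localhost:8080/metrics | grep '^openclaw_'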

Schedule on dedicated nodes

If you run a mixed cluster, use nodeSelector and tolerations to pin agents to dedicated nodes:

spec:
  availability:
    nodeSelector:
      openclaw.rocks/nodepool: openclaw
    tolerations:
      - key: openclaw.rocks/dedicated
        value: openclaw
        effect: NoSchedule

Add custom egress rules

The default NetworkPolicy only allows DNS (port 53) and HTTPS (port 443). If your agent needs to reach other services (a database, a message queue, an internal API), add egress rules:

spec:
  security:
    networkPolicy:
      additionalEgress:
        - to:
            - ipBlock:
                cidr: 10.0.0.0/8
          ports:
            - port: 5432
              protocol: TCP
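
For in-cluster destinations, a namespaceSelector is usually a better fit than an ipBlock, because pod IPs change. Here is a sketch that allows egress to PostgreSQL pods in a database namespace (the namespace and pod labels are illustrative):

spec:
  security:
    networkPolicy:
      additionalEgress:
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: database
              podSelector:
                matchLabels:
                  app: postgres
          ports:
            - port: 5432
              protocol: TCP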

Cloud provider identity

For AWS IRSA or GCP Workload Identity, annotate the managed ServiceAccount:

spec:
  security:
    rbac:
      serviceAccountAnnotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/openclaw"

Corporate proxies and private CAs

If your cluster uses a TLS-intercepting proxy, inject a CA bundle:

spec:
  security:
    caBundle:
      configMapName: corporate-ca-bundle
      key: ca-bundle.crt

The operator mounts the CA bundle into all containers and sets NODE_EXTRA_CA_CERTS automatically.
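
Create the referenced ConfigMap from your CA certificate file first (adjust the local file path):

kubectl create configmap corporate-ca-bundle \
  --namespace openclaw \
  --from-file=ca-bundle.crt=./corporate-ca.crt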

GitOps

The OpenClawInstance CRD is a plain YAML file. That means it fits directly into a GitOps workflow. Store your agent manifests in a git repo, and let ArgoCD or Flux sync them to your cluster.

A typical repo structure:

gitops/
└── agents/
    ├── kustomization.yaml
    ├── namespace.yaml
    ├── agent-a.yaml
    └── agent-b.yaml
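
The kustomization.yaml just lists the manifests, so adding an agent is a one-line diff:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - agent-a.yaml
  - agent-b.yaml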

Every change goes through a pull request. Your team reviews the diff. Merge to main, and ArgoCD applies it. No kubectl apply from laptops, no configuration drift, full audit trail.

The operator’s config hashing makes this especially smooth. When ArgoCD syncs a changed spec.config.raw, the operator detects the content-hash change and triggers a rolling update automatically. The same goes for secret rotation: the operator watches referenced Secrets and rolls pods when they change.

Backup and restore

The operator supports S3-compatible backups. When you delete an instance, the operator automatically creates a backup of the workspace PVC before teardown.

To restore an agent from a backup into a new instance:

apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent-restored
  namespace: openclaw
spec:
  restoreFrom: "s3://bucket/path/to/backup.tar.gz"
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  storage:
    persistence:
      enabled: true
      size: 10Gi

The operator downloads the snapshot, unpacks it into the PVC, and starts the agent with all previous workspace data, skills, and conversation history intact.

Local inference with Ollama

If you want agents to use local models (for privacy, latency, or cost), the operator has first-class Ollama support. No need to wire up a sidecar by hand — spec.ollama handles the container, model pre-pulling, storage, and GPU allocation:

spec:
  ollama:
    enabled: true
    models:
      - "llama3.2"
      - "nomic-embed-text"
    gpu: 1
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "4"
        memory: 8Gi
    storage:
      sizeLimit: 30Gi

When enabled, the operator:

  1. Adds an Ollama sidecar container to the pod
  2. Runs an init container that pre-pulls the listed models before the agent starts
  3. Injects OLLAMA_HOST=http://localhost:11434 into the main container
  4. Allocates the requested NVIDIA GPU via nvidia.com/gpu resource limits

By default, models are stored in an emptyDir volume with a configurable size limit. For persistent model storage across restarts (so models are not re-pulled every time), use an existing PVC:

spec:
  ollama:
    enabled: true
    models: ["llama3.2"]
    storage:
      existingClaim: ollama-models-pvc
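
The existing claim is a plain PersistentVolumeClaim you create ahead of time; size it for your model set (30Gi is just an example):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models-pvc
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi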

Tailscale integration

Expose your agent to your tailnet without Ingress, load balancers, or public IPs. The operator’s spec.tailscale field handles auth key injection, config enrichment, and NetworkPolicy rules:

spec:
  tailscale:
    enabled: true
    mode: serve       # or "funnel" for public internet access
    authKeySecretRef:
      name: tailscale-authkey
    hostname: my-agent

Create the auth key Secret with an ephemeral, reusable key from the Tailscale admin console:

kubectl create secret generic tailscale-authkey \
  --namespace openclaw \
  --from-literal=authkey=tskey-auth-...

In serve mode (the default), only members of your tailnet can reach the agent. In funnel mode, Tailscale exposes it to the public internet with automatic HTTPS.

The operator automatically:

  • Injects TS_AUTHKEY and TS_HOSTNAME environment variables
  • Merges gateway.tailscale.mode and gateway.tailscale.resetOnExit into the OpenClaw config
  • Adds STUN and WireGuard egress rules to the NetworkPolicy

For passwordless SSO login for tailnet members, enable authSSO:

spec:
  tailscale:
    enabled: true
    mode: serve
    authKeySecretRef:
      name: tailscale-authkey
    authSSO: true

This sets gateway.auth.allowTailscale=true in the OpenClaw config, so tailnet members can access the agent without a separate gateway token.

Custom sidecars and init containers

For use cases beyond the built-in Ollama and Tailscale support, the operator accepts arbitrary sidecars and init containers. Run a Cloud SQL Proxy for database access, a log forwarder, or any other helper alongside your agent:

spec:
  sidecars:
    - name: cloudsql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2
      args: ["--structured-logs", "project:region:instance"]
      resources:
        requests:
          cpu: 100m
          memory: 128Mi

Custom init containers run after the operator’s own init pipeline (config seeding, pnpm, Python, skills):

spec:
  initContainers:
    - name: fetch-data
      image: curlimages/curl:8.5.0
      command: ["sh", "-c", "curl -o /data/dataset.json https://..."]
      volumeMounts:
        - name: data
          mountPath: /data

Config merge mode

By default, the operator overwrites the config file on every pod restart. If your agent modifies its own config at runtime (through skills or self-modification), set mergeMode: merge to deep-merge operator config with the existing PVC config:

spec:
  config:
    mergeMode: merge
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"

In merge mode, operator-specified keys win, but keys the agent added on its own survive restarts.
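
An illustration of the semantics: suppose the operator config pins model.primary while the agent wrote an extra key into the on-disk config at runtime (the memory key is purely illustrative). After a restart, the merged config keeps both, with the operator's value winning on overlap:

# Operator config (spec.config.raw)
agents:
  defaults:
    model:
      primary: "anthropic/claude-sonnet-4-20250514"

# On-PVC config, modified by the agent at runtime (illustrative)
agents:
  defaults:
    model:
      primary: "anthropic/claude-3-5-haiku-20241022"
    memory:
      enabled: true

# Deep-merged result at pod start
agents:
  defaults:
    model:
      primary: "anthropic/claude-sonnet-4-20250514"  # operator wins
    memory:
      enabled: true  # agent-added key survives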

The complete example

Here is a production-ready manifest that combines everything from this guide:

apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: production-agent
  namespace: openclaw
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys

  config:
    mergeMode: merge
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"

  skills:
    - "@anthropic/mcp-server-fetch"

  runtimeDeps:
    pnpm: true

  chromium:
    enabled: true
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi

  ollama:
    enabled: true
    models: ["llama3.2"]
    gpu: 1
    resources:
      requests:
        cpu: "2"
        memory: 4Gi

  tailscale:
    enabled: true
    mode: serve
    authKeySecretRef:
      name: tailscale-authkey
    authSSO: true

  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi

  storage:
    persistence:
      enabled: true
      size: 10Gi

  autoUpdate:
    enabled: true
    checkInterval: "24h"
    backupBeforeUpdate: true
    rollbackOnFailure: true

  observability:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true

  # Remove or adjust these if you don't use dedicated nodes
  availability:
    nodeSelector:
      openclaw.rocks/nodepool: openclaw
    tolerations:
      - key: openclaw.rocks/dedicated
        value: openclaw
        effect: NoSchedule
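
Save it as production-agent.yaml and apply it:

kubectl apply -f production-agent.yaml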

That one apply gives you a hardened, auto-updating, browser-capable AI agent with local inference, tailnet access, monitoring, backup, and network isolation.

What you get out of the box

Without touching a single security setting, every agent deployed by the operator ships with:

  • Non-root execution (UID 1000)
  • Read-only root filesystem
  • All Linux capabilities dropped
  • Seccomp RuntimeDefault profile
  • Default-deny NetworkPolicy (DNS + HTTPS egress only)
  • Per-instance ServiceAccount with no token auto-mounting
  • PodDisruptionBudget
  • Liveness, readiness, and startup probes
  • Auto-generated gateway authentication token
  • 5-minute drift reconciliation

A validating webhook blocks attempts to run as root, and warns about disabled NetworkPolicies, missing TLS on Ingress, and missing AI provider keys.

Next steps

If you run into issues or have feedback, open an issue on GitHub. PRs are welcome too.

If you do not want to operate Kubernetes yourself, OpenClaw.rocks handles all of this for you. Pick a plan, connect a channel, and your agent is live in seconds.