run-house/kubetorch: Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.

A Fast, Pythonic, "Serverless" Interface for Running ML Workloads on Kubernetes

Kubetorch lets you programmatically build, iterate, and deploy ML applications on Kubernetes at any scale - directly from Python.

It brings your cluster's compute power into your local development environment, enabling extremely fast iteration (1-2 seconds). Logs, exceptions, and hardware faults are automatically propagated back to you in real time.

Since Kubetorch has no local runtime or code serialization, you can access large-scale cluster compute from any Python environment - your IDE, notebooks, CI pipelines, or production code - just like you would use a local process pool.

Hello World

import kubetorch as kt

def hello_world():
    return "Hello from Kubetorch!"

if __name__ == "__main__":
    # Define your compute
    compute = kt.Compute(cpus=".1")

    # Send local function to freshly launched remote compute
    remote_hello = kt.fn(hello_world).to(compute)

    # Runs remotely on your Kubernetes cluster
    result = remote_hello()
    print(result)  # "Hello from Kubetorch!"
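
The same pattern should extend to functions that take arguments. The following is a minimal sketch that reuses only the calls shown above (`kt.Compute`, `kt.fn`, `.to`). `word_count` and `run_remote` are hypothetical names introduced for illustration, and the sketch assumes that arguments and return values pass through the remote callable as they would in a local call.

```python
def word_count(text: str) -> dict:
    """Count occurrences of each whitespace-separated word."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def run_remote(text: str) -> dict:
    """Send word_count to remote compute and call it there."""
    # Imported here so word_count can be tested without the client installed.
    import kubetorch as kt

    compute = kt.Compute(cpus="1")
    remote_word_count = kt.fn(word_count).to(compute)

    # Assumption: arguments and the return value are passed through
    # in the same way as for a local call.
    return remote_word_count(text)
```

Because `word_count` is plain Python, it can be checked locally first (`word_count("to be or not to be")`) before calling `run_remote(...)` from a machine with access to the cluster.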

What Kubetorch Enables

100x faster iteration, from 10+ minutes down to 1-3 seconds, for complex ML applications such as RL and distributed training
50%+ compute cost savings through intelligent resource allocation, bin-packing, and dynamic scaling
95% fewer production faults through built-in fault handling, with programmatic error recovery and resource adjustment
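
This README does not show Kubetorch's own fault-handling API, so the following is only a sketch of what programmatic error recovery with resource adjustment could look like, using ordinary `try`/`except` around the Hello World calls. `run_with_escalation`, `deploy_train`, and `train` are hypothetical names, and re-sending the function to a larger `kt.Compute` is an assumption about how resources might be adjusted.

```python
from typing import Callable, Sequence


def run_with_escalation(
    deploy: Callable[[str], Callable[[], object]],
    cpu_tiers: Sequence[str],
) -> object:
    """Try each CPU tier in order until the workload succeeds.

    `deploy` is a hypothetical helper: given a CPU request string, it
    returns a callable that runs the workload on compute of that size.
    """
    last_error = None
    for cpus in cpu_tiers:
        try:
            return deploy(cpus)()
        except Exception as err:  # broad on purpose; narrow this in real code
            print(f"Run with cpus={cpus} failed: {err!r}; trying next tier")
            last_error = err
    raise RuntimeError("all compute tiers failed") from last_error


def train():
    return "done"


def deploy_train(cpus: str):
    # Assumption: sending the function to a new kt.Compute is an acceptable
    # way to adjust resources; the README does not spell this out.
    import kubetorch as kt  # lazy import keeps the helper above dependency-free

    return kt.fn(train).to(kt.Compute(cpus=cpus))
```

With a cluster available, a call might look like `run_with_escalation(deploy_train, ["1", "2", "4"])`. Keeping `deploy` as a parameter means the retry logic can be exercised locally with a stub before any compute is launched.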

Installation

1. Python Client

pip install "kubetorch[client]"

2. Kubernetes Deployment (Helm)

# Option 1: Install directly from OCI registry
helm upgrade --install kubetorch oci://ghcr.io/run-house/charts/kubetorch \
  --version 0.5.0 -n kubetorch --create-namespace

# Option 2: Download chart locally first
helm pull oci://ghcr.io/run-house/charts/kubetorch --version 0.5.0 --untar
helm upgrade --install kubetorch ./kubetorch -n kubetorch --create-namespace

For detailed setup instructions, see our Installation Guide.

Source Layout

This repo now includes the customer-facing OSS deployment components that were previously split across internal and OSS repos:

python_client/ for the SDK
charts/kubetorch/ for the Helm chart
services/ for the controller and data store sources
release/default_images/ for the workload base images
release/ for release scripts and version sync

Kubetorch Serverless

Learn More

Documentation - API Reference, concepts, and guides
Examples - Real-world usage patterns and tutorials
Join our Slack - Connect with the community and get support

Apache 2.0 License

🏃‍♀️ Built by Runhouse 🏠