Local AI models are great: they offer the strict data privacy guarantees some industries require and are cheap to run. But they can also fry your laptop, and smaller local models often mean lower-quality responses, issues with tool calling, and so on. Docker just launched Docker Offload, a frictionless way to spin up your containers and models in a secure cloud environment at very affordable prices.
Docker Offload lets you spin up containers and models in the cloud instead of on your machine. You keep your Docker workflow. You keep your Compose files. But you offload the compute, GPUs included, for as little as $0.01/min ($0.015/min with a GPU). This lets you run bigger open-source models (better quality) while staying privacy-compliant, or run compute-heavy ML pipelines, all without investing $100K in hardware.
How it works under the hood
- Once you toggle Offload mode in Docker Desktop, your containers run in the cloud, no longer on your device.
- A secure SSH tunnel connects your Docker Desktop to the cloud daemon.
- You pay for container uptime, not per inference call like with API services.
- If you enable GPU acceleration, you get access to an NVIDIA L4.
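As a quick way to convince yourself that your commands really hit a remote daemon, you can inspect the endpoint the CLI is talking to. This is a minimal sketch using standard Docker CLI commands, nothing Offload-specific; exactly how Offload surfaces the remote endpoint (separate context vs. redirected default) may look different on your setup.

# Show the name and OS of the daemon the CLI currently targets
docker info --format '{{.Name}} ({{.OperatingSystem}})'

# List the configured Docker endpoints (contexts)
docker context ls

With Offload enabled, the reported daemon should correspond to the cloud environment rather than your local Docker Desktop VM.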
Getting Started
You need Docker Desktop 4.43 or newer, and you need to sign up for Docker Offload.
You can activate or toggle Offload mode, either via the Docker Desktop UI or via terminal:
In the Docker Desktop UI
Go to Settings > Beta Features > Enable Docker Offload
Then, to actually activate Offload (this is when billing starts!), toggle the Offload slider in the top bar from local mode to cloud mode.
From the CLI
docker offload start

You’ll see a welcome screen asking you to:
- Pick your account (if you have more than one, this is the account your credits will be counted toward)
- Choose whether you need GPU support (NVIDIA L4 available)
Stopping Docker Offload
To see current status (basically: on or off):
docker offload status

You can see the actual containers you have running in the UI, as usual.
To stop Docker Offload (this stops all your containers) and stop the billed minutes:
docker offload stop

Afterwards, docker offload status should report that Offload is no longer running.
Your cloud session stays up (and keeps billing) until you stop Offload mode or until it shuts down automatically after ~30 minutes of inactivity (no API calls, no port activity).
Costs (Beta Pricing)
- You get 300 minutes for free; afterwards:
- $0.015/minute with GPU
- $0.01/minute without GPU
⚠️ Note: Time starts counting as soon as Offload is enabled, not just when you run a model.
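To put that in perspective: leaving a GPU-backed session enabled for two hours costs 120 × $0.015 ≈ $1.80, whether or not you sent any requests during that time.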
Hello World
To check everything’s working:
docker run --rm hello-world

If Offload is enabled, you’ll see “Hello from Docker!” printed, but from a container running entirely in the cloud.
If you like, check your remote GPU details as follows:

docker run --rm --gpus all nvidia/cuda:12.4.0-runtime-ubuntu22.04 nvidia-smi

The output should show the familiar nvidia-smi table for the remote NVIDIA L4 GPU.
Run a TensorFlow Notebook
Now let’s do something useful and very cool: run a TensorFlow GPU-enabled Jupyter Notebook:
The command below will automatically pull the image in the cloud and run the container there. For this application, make sure you enabled GPU support when starting Docker Offload.
docker run --rm -p 8888:8888 --gpus all tensorflow/tensorflow:latest-gpu-jupyter

Somewhere in the output you should see the Jupyter access URLs. Open the second one (starting with 127.0.0.1), and you’re ready to start training ML models, in the cloud, on a GPU. Here I launched a pre-configured clothing classifier, have fun!
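If you just want to confirm that TensorFlow inside a container actually sees the GPU, here is a minimal sketch using the non-Jupyter GPU image (adjust the image tag to whatever you are running):

# Ask TensorFlow which GPU devices it can see inside the container
docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

One GPU device in the list means the remote L4 is visible to TensorFlow; an empty list means the container didn’t get GPU access.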
Offload and AI Models
Here’s where it gets extra cool for AI workflows. Docker Offload ties right into Docker’s Model Runner and new Agentic Compose integration.
Agentic Compose deserves an article on its own, but here is a sneak peek if you want to get an idea of how smooth it becomes to spin up models and run them locally or in the cloud via the usual `compose.yml` as part of a bigger app.
services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"
A few quick ending notes:
- Offload is either on or off. You can’t mix local and cloud containers simultaneously.
- Container images must still be uploaded to the cloud before they can run there, so large images and models may mean longer startup times, which count toward your cloud usage.
- You pay for container runtime, even if you’re not actively sending inference requests.
But in return, your laptop gets a break, and you unlock the power to run bigger models, heavier ML pipelines, and modern agentic apps, all without big hardware investments or building custom cloud infrastructure.
Docker Offload opens up new possibilities for developers who want privacy, scalability, and serious AI capabilities, in a very frictionless way.
Curious to see what you’ll build with it!