Docker Offload: Local AI Without the Laptop Meltdown


Lize Raes

Local AI models are great. They come with the absolute data-privacy guarantees required in some industries and are cheap to run. But they may also fry your laptop, and they can come with lower-quality responses, issues with tool calling, and so on. Docker just launched Docker Offload, a frictionless way to spin up your local models in a secure cloud environment at a very affordable price.


Docker Offload lets you spin up containers and models in the cloud instead of on your machine. You keep your Docker workflow and your Compose files, but you offload the compute, including GPUs, for as little as $0.015/min. This lets you run bigger open-source models (better quality) while staying privacy-compliant and without investing $100K in hardware, or run compute-heavy ML pipelines at a very affordable price.

How it works under the hood

  • Once you toggle Offload mode in Docker Desktop, your containers run in the cloud, no longer on your device.
  • A secure SSH tunnel connects your Docker Desktop to the cloud daemon.
  • You pay for container uptime, not per inference call like with API services.
  • If you enable GPU acceleration, you get access to an NVIDIA L4.

Getting Started

You need Docker Desktop 4.43 or newer.
You also need to sign up for Docker Offload.

You can activate or toggle Offload mode either via the Docker Desktop UI or from the terminal:

In Docker Desktop UI

Go to Settings > Beta Features > Enable Docker Offload


Then, to actually activate Offload (and start paying!), flip the Offload toggle in the top bar from off to on.

From CLI

docker offload start

You’ll see a welcome screen asking you to:

  • Pick your account (if you have more than one, this is the account the credits will be counted toward)
  • Choose whether you need GPU support (NVIDIA L4 available)

Stopping Docker Offload

To see current status (basically: on or off):

docker offload status

You can see the actual containers you have running in the UI, as usual.

To stop Docker Offload (this stops all your running containers) and stop the billed minutes:

docker offload stop

When it has stopped, the status command should report that Offload is no longer running.

Your cloud session stays up (and keeps billing) until you stop Offload mode or until it shuts down automatically after ~30 minutes of inactivity (no API calls, no port activity).

Costs (Beta Pricing)

  • You get 300 minutes for free; afterwards:
  • $0.015/minute with GPU
  • $0.01/minute without GPU

⚠️ Note: Time starts counting as soon as Offload is enabled, not just when you run a model.
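To put that in perspective: a full 8-hour workday with GPU enabled comes to 480 × $0.015 = $7.20, and the same day without GPU to 480 × $0.01 = $4.80.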

Hello World

To check everything’s working:

docker run --rm hello-world

If Offload is enabled, you’ll see “Hello from Docker!” printed, but from a container running entirely in the cloud.

If you like, check your remote GPU details as follows:

docker run --rm --gpus all nvidia/cuda:12.4.0-runtime-ubuntu22.04 nvidia-smi

The output should be the familiar nvidia-smi table, listing the NVIDIA L4 GPU.

Run a TensorFlow Notebook

Now let’s do something useful and very cool: run a GPU-enabled TensorFlow Jupyter notebook.

The command below will automatically pull the container in the cloud and run it there. For this application, make sure you enable GPU support when starting Docker Offload.

docker run --rm -p 8888:8888 --gpus all tensorflow/tensorflow:latest-gpu-jupyter

Somewhere in the output you should see two URLs containing a login token, the second one starting with http://127.0.0.1:8888.

Open the second URL (starting with 127.0.0.1), and you’re ready to start training ML models, in the cloud, on a GPU. Here I launched a pre-configured clothing classifier, have fun!
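If you want to double-check from inside the notebook that TensorFlow really sees the remote GPU, a quick cell like the one below works. It uses only standard TensorFlow calls, nothing Offload-specific:

# Run this in a notebook cell to confirm TensorFlow sees the remote GPU
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)  # should list one device, the NVIDIA L4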


Offload and AI Models

Here’s where it gets extra cool for AI workflows. Docker Offload ties right into Docker’s Model Runner and new Agentic Compose integration.

Agentic Compose deserves an article of its own, but here is a sneak peek to give you an idea of how smooth it becomes to spin up models, locally or in the cloud, via the usual `compose.yml` as part of a bigger app.

services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"

A few quick ending notes:

  • Offload is either on or off. You can’t mix local and cloud containers simultaneously.
  • Container images must still be uploaded to the cloud before they can run there, so large models may mean longer upload times, which count toward your cloud usage.
  • You pay for container runtime, even if you’re not actively sending inference requests.

But in return, your laptop gets a break, and you unlock the power to run bigger models, heavier ML pipelines, and modern agentic apps, all without big hardware investments or building custom cloud infrastructure.

Docker Offload opens up new possibilities for developers who want privacy, scalability, and serious AI capabilities, in a very frictionless way.

Curious to see what you’ll build with it!