utensils/mold: Local AI image generation CLI — FLUX, SD 1.5, SDXL & Z-Image diffusion models on your GPU



Generate images and short video clips on your own GPU. No cloud, no Python, no fuss.

Documentation | Getting Started | Models | API

mold run "a cat riding a motorcycle through neon-lit streets"

That's it. Mold auto-downloads the model on first run and saves the image to your current directory.

Install

curl -fsSL https://raw.githubusercontent.com/utensils/mold/main/install.sh | sh

This downloads the latest pre-built binary to ~/.local/bin/mold. On Linux, the installer auto-detects your NVIDIA GPU and picks the right binary (RTX 40-series or RTX 50-series). macOS builds include Metal support.
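The install script is the authoritative source for this detection; as a rough illustration, the Linux branch amounts to mapping the GPU's CUDA compute capability to an artifact name. The sketch below assumes nvidia-smi is on PATH and that artifact naming follows the mold-sm120 pattern visible in the Nix flake outputs — both assumptions, not a copy of install.sh.

```shell
# Sketch of the kind of detection install.sh performs. Assumptions:
# nvidia-smi is available, artifact names follow the mold-sm120
# pattern from the Nix flake, and the default build targets Ada.
pick_binary() {
  case "$1" in
    12.*) echo "mold-sm120" ;;  # Blackwell / RTX 50-series
    8.9)  echo "mold" ;;        # Ada / RTX 40-series (default build)
    *)    echo "mold" ;;        # fall back to the default binary
  esac
}

# Compute capability of GPU 0, e.g. "8.9" or "12.0"; default to Ada
# when nvidia-smi is absent so the sketch still runs.
cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1)
pick_binary "${cap:-8.9}"
```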

Other install methods

Nix

nix run github:utensils/mold -- run "a cat"                   # Ada / RTX 40-series
nix run github:utensils/mold#mold-sm120 -- run "a cat"        # Blackwell / RTX 50-series

From source

cargo build --release -p mold-ai --features cuda    # Linux (NVIDIA)
cargo build --release -p mold-ai --features metal   # macOS (Apple Silicon)

Add preview, expand, discord, or tui to the features list as needed.

Manual download

Pre-built binaries on the releases page.

Usage

mold run "a sunset over mountains"                    # Generate with default model
mold run flux-dev:q4 "a turtle in the desert"         # Pick a model
mold run "a portrait" --width 768 --height 1024       # Custom size
mold run "a sunset" --batch 4 --seed 42               # Batch with reproducible seeds
mold run "oil painting" --image photo.png              # img2img
mold run qwen-image-edit-2511:q4 "make the chair red leather" --image chair.png --image swatch.png
mold run ltx-video-0.9.6-distilled:bf16 "a fox in the snow" --frames 25
mold run "a cat" --expand                              # LLM prompt expansion
mold run qwen-image:q2 "a poster" --qwen2-variant q6  # Qwen-Image quantized text encoder
mold run flux-dev:bf16 "portrait" --lora style.safetensors  # LoRA adapter
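Because run is an ordinary CLI command, it composes with shell scripting. A small illustrative batch driver — the prompts.txt file and the seed-per-line policy are this sketch's invention, not a mold convention:

```shell
# Illustrative batch driver: one generation per line of prompts.txt,
# each with its own reproducible seed. prompts.txt is hypothetical.
cat > prompts.txt <<'EOF'
a sunset over mountains
a fox curled up in the snow
EOF

seed=42
while IFS= read -r prompt; do
  # Guarded so the sketch is a no-op when mold is not installed.
  if command -v mold >/dev/null 2>&1; then
    mold run "$prompt" --seed "$seed"
  fi
  seed=$((seed + 1))
done < prompts.txt
```

Redirecting the file into the loop (rather than piping) keeps the seed counter in the current shell, so it is easy to resume a partial batch.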

Inline preview

Display generated images directly in the terminal (requires the preview feature):

mold run "a cat" --preview

Generating the mold logo with --preview in Ghostty

Piping

mold run "neon cityscape" | viu -                     # Pipe to image viewer
echo "a cat" | mold run flux-schnell                  # Pipe prompt from stdin

Terminal UI (beta)

The TUI Generate view with Kitty graphics protocol image preview in Ghostty

Model management

mold list                    # See what you have
mold pull flux-dev:q4        # Download a model
mold rm dreamshaper-v8       # Remove a model
mold stats                   # Disk usage overview
mold clean                   # Clean orphaned files (dry-run)
mold clean --force           # Actually delete

Remote rendering

# On your GPU server
mold serve

# From your laptop
MOLD_HOST=http://gpu-server:7680 mold run "a cat"
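The same binary thus acts as a thin client or a standalone renderer depending on whether a server is reachable. The dispatch decision can be pictured as a one-liner; the function below is an illustration of that rule, not mold's actual code:

```shell
# Mirror of mold's dispatch rule: render remotely when MOLD_HOST is
# set, otherwise load the model locally on this machine's GPU.
mold_target() {
  if [ -n "${MOLD_HOST:-}" ]; then
    echo "remote:${MOLD_HOST}"
  else
    echo "local"
  fi
}

MOLD_HOST=http://gpu-server:7680
mold_target    # → remote:http://gpu-server:7680
unset MOLD_HOST
mold_target    # → local
```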

See the full CLI reference, configuration guide, and model catalog in the documentation.

Models

Supports 10 model families with 80+ variants:

Family          | Models                     | Highlights
FLUX.1          | schnell, dev, + fine-tunes | Best quality, 4-25 steps, LoRA support
Flux.2 Klein    | 4B and 9B                  | Fast 4-step, low VRAM, default model
SDXL            | base, turbo, + fine-tunes  | Fast, flexible, negative prompts
SD 1.5          | base + fine-tunes          | Lightweight, ControlNet support
SD 3.5          | large, medium, turbo       | Triple encoder, high quality
Z-Image         | turbo                      | Fast 9-step, Qwen3 encoder
Qwen-Image      | base + 2512                | High resolution, CFG guidance, GGUF quant support
Qwen-Image-Edit | 2511                       | Multimodal image editing, repeatable --image, negative prompts
Wuerstchen      | v2                         | 42x latent compression
LTX Video       | 0.9.6, 0.9.8               | Text-to-video with APNG/GIF/WebP/MP4 output

Bare names auto-resolve: mold run flux-schnell "a cat" picks the best available variant.
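That resolution can be pictured as a preference walk over whatever variants are already on disk. The sketch below illustrates the idea; the specific ranking (bf16 ahead of the quantized tiers) is an assumption for illustration, not mold's documented policy:

```shell
# Illustrative resolver: given a bare model name plus the variants
# available locally, return the first match in a preference order.
# The order bf16 > q8 > q4 > q2 is an assumption, not mold's policy.
resolve_variant() {
  name="$1"; shift
  for v in bf16 q8 q4 q2; do
    for have in "$@"; do
      if [ "$have" = "$name:$v" ]; then
        echo "$name:$v"
        return 0
      fi
    done
  done
  return 1   # no variant of this model is available
}

resolve_variant flux-schnell flux-schnell:q4 flux-schnell:bf16
# → flux-schnell:bf16
```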

See the full model catalog for sizes, VRAM requirements, and recommended settings.

LTX Video

Currently supported LTX checkpoints:

  • ltx-video-0.9.6:bf16
  • ltx-video-0.9.6-distilled:bf16
  • ltx-video-0.9.8-2b-distilled:bf16
  • ltx-video-0.9.8-13b-dev:bf16
  • ltx-video-0.9.8-13b-distilled:bf16

Recommended default today: ltx-video-0.9.6-distilled:bf16.

The 0.9.8 models pull the required spatial-upscaler asset automatically and now run the full multiscale refinement path. mold keeps the shared T5 assets under shared/flux/..., stores the 0.9.8 spatial upscaler under shared/LTX-Video/..., and intentionally continues using the compatible LTX-Video-0.9.5 VAE source until the newer VAE layout is ported.

Features

  • txt2img, img2img, multimodal edit, inpainting — full generation pipeline
  • Image upscaling — Real-ESRGAN super-resolution (2x/4x) via mold upscale, server API, or TUI
  • LoRA adapters — FLUX BF16 and GGUF quantized
  • ControlNet — canny, depth, openpose (SD1.5)
  • Prompt expansion — local LLM (Qwen3-1.7B) enriches short prompts
  • Negative prompts — CFG-based models (SD1.5, SDXL, SD3, Wuerstchen)
  • Pipe-friendly — echo "a cat" | mold run | viu -
  • PNG metadata — embedded prompt, seed, model info
  • Terminal preview — Kitty, Sixel, iTerm2, halfblock
  • Smart VRAM — quantized encoders, block offloading, drop-and-reload
  • Qwen family encoder control — selectable Qwen2.5-VL variants for Qwen-Image and Qwen-Image-Edit, with quantized auto-fallback when BF16 would be too heavy
  • Shell completions — bash, zsh, fish, elvish, powershell
  • REST API — mold serve with SSE streaming, auth, rate limiting
  • Discord bot — slash commands with role permissions and quotas
  • Interactive TUI — generate, gallery, models, settings

Deployment

Method          | Guide
NixOS module    | Deployment: NixOS
Docker / RunPod | Deployment: Docker
Systemd         | Deployment: Overview

How it works

Single Rust binary built on candle — pure Rust ML, no Python, no libtorch.

mold run "a cat"
  │
  ├─ Server running? → send request over HTTP
  │
  └─ No server? → load model locally on GPU
       ├─ Encode prompt (T5/CLIP text encoders)
       ├─ Denoise latent (transformer/UNet)
       ├─ Decode pixels (VAE)
       └─ Save PNG

Requirements

  • NVIDIA GPU with CUDA or Apple Silicon with Metal
  • Models auto-download on first use (~2-30GB depending on model)