Local AI music generation server with a browser UI, powered by GGML. Describe a song, get stereo 48 kHz audio. Runs on CPU, CUDA, Metal, and Vulkan.
Download models
Grab one GGUF of each type from Hugging Face and drop them in the models/ folder:
https://huggingface.co/Serveurperso/ACE-Step-1.5-GGUF/tree/main
| Type | Pick one | Size |
|---|---|---|
| LM | acestep-5Hz-lm-4B-Q8_0.gguf | 4.2 GB |
| Text encoder | Qwen3-Embedding-0.6B-Q8_0.gguf | 748 MB |
| DiT | acestep-v15-turbo-Q8_0.gguf | 2.4 GB |
| VAE | vae-BF16.gguf (always this one) | 322 MB |
Three LM sizes available: 0.6B (fast), 1.7B, 4B (best quality). Multiple DiT variants: turbo (8 steps), sft (50 steps, higher quality), base, shift1, shift3, continuous.
Alternative: ./models.sh downloads the default set automatically (needs pip install hf).
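Before starting the server it can help to confirm that models/ holds at least one file of each type. The sketch below is only a convenience check: the filename patterns are guessed from the example names in the table above, and `classify_gguf` and `missing_types` are hypothetical helpers, not part of acestep.cpp.

```python
from pathlib import Path

REQUIRED_TYPES = ("LM", "Text encoder", "DiT", "VAE")


def classify_gguf(name: str) -> str:
    """Guess the model type from a filename.

    Assumption: the patterns mirror the example filenames in the table
    (acestep-5Hz-lm-*, Qwen3-Embedding-*, acestep-v15-*, vae-*). The
    server itself may identify models differently.
    """
    lower = name.lower()
    if not lower.endswith(".gguf"):
        return "unknown"
    if "-lm-" in lower:
        return "LM"
    if "embedding" in lower:
        return "Text encoder"
    if lower.startswith("vae"):
        return "VAE"
    if lower.startswith("acestep-v"):
        return "DiT"
    return "unknown"


def missing_types(models_dir: str) -> list[str]:
    """Return the required model types with no matching file in models_dir."""
    root = Path(models_dir)
    if not root.is_dir():
        return list(REQUIRED_TYPES)
    found = {classify_gguf(p.name) for p in root.glob("*.gguf")}
    return [t for t in REQUIRED_TYPES if t not in found]
```

Calling `missing_types("models")` returns an empty list when all four types are present.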
Build
git clone --recurse-submodules https://github.com/Serveurperso/acestep.cpp
cd acestep.cpp
Windows
Pre-built binaries (until CI is set up): https://www.serveurperso.com/temp/acestep.cpp-win64/
To build from source, install Visual C++ Build Tools (select "Desktop development with C++" workload) and optionally the CUDA Toolkit and/or the Vulkan SDK.
buildcuda.cmd      # NVIDIA GPU
buildvulkan.cmd    # AMD/Intel GPU (Vulkan)
buildall.cmd       # all backends (CUDA + Vulkan + CPU, runtime loading)
Linux / macOS
./buildcuda.sh     # NVIDIA GPU
./buildvulkan.sh   # AMD/Intel GPU (Vulkan)
./buildcpu.sh      # CPU only (with BLAS)
./buildall.sh      # all backends (CUDA + Vulkan + CPU, runtime loading)
macOS auto-enables Metal and Accelerate BLAS with any of the above.
Run
./server.sh        # Linux / macOS
server.cmd         # Windows
Open http://localhost:8085 in your browser. The WebUI handles everything: write a caption, set lyrics and metadata, generate, play, and download tracks.
Models are loaded on first request (zero GPU at startup) and swapped automatically when you pick a different one in the UI.
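For scripted setups, a small readiness probe can wait until the server answers before sending work. This is a minimal sketch using only the Python standard library; it assumes the server is reachable at the URL given above and relies on the GET /health endpoint, which returns {"status":"ok"}. The helper names `is_healthy` and `wait_for_server` are illustrative.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True if the response body is JSON with "status" equal to "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def wait_for_server(base_url: str = "http://localhost:8085",
                    timeout_s: float = 30.0,
                    poll_s: float = 0.5) -> bool:
    """Poll GET /health until it reports ok or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if is_healthy(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry after a short sleep
        time.sleep(poll_s)
    return False
```

Since models are loaded on first request, a healthy response only means the HTTP server is up, not that any model is already in memory.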
LoRA
Drop LoRA adapters in the loras/ folder and restart the server.
Supports PEFT directories and ComfyUI single .safetensors files.
Select the active LoRA from the WebUI.
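To see what a loras/ folder contains before restarting, the two supported layouts can be told apart on disk. The sketch below assumes a PEFT directory is recognizable by its adapter_config.json file (the standard PEFT layout) and treats any top-level .safetensors file as a single-file adapter. How the server itself detects adapters is not documented here, and `list_loras` is a hypothetical helper.

```python
from pathlib import Path


def list_loras(loras_dir: str) -> list[tuple[str, str]]:
    """Return (name, kind) pairs for adapters found in loras_dir.

    kind is "peft" for a directory containing adapter_config.json and
    "single-file" for a top-level .safetensors file. Other entries are
    ignored. This mirrors the two formats named above; it is not the
    server's own detection logic.
    """
    root = Path(loras_dir)
    if not root.is_dir():
        return []
    adapters = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and (entry / "adapter_config.json").is_file():
            adapters.append((entry.name, "peft"))
        elif entry.is_file() and entry.suffix == ".safetensors":
            adapters.append((entry.name, "single-file"))
    return adapters
```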
Server options
--models <dir> Model directory (required)
--loras <dir> LoRA adapters directory
--host <addr> Listen address (default: 127.0.0.1)
--port <N> Listen port (default: 8080)
--max-batch <N> LM batch limit 1-9 (default: 1)
--vae-chunk <N> VAE tile size (default: 256, lower = less VRAM)
--mp3-bitrate <N> MP3 kbps (default: 128)
API endpoints
The server exposes three POST endpoints and two GET endpoints:
POST /lm - Generate lyrics and audio codes from a caption. Returns JSON.
POST /synth - Render audio codes into MP3 or WAV (?wav=1). Accepts JSON or multipart (with source audio for cover/repaint modes).
POST /understand - Reverse pipeline: audio in, metadata + lyrics + codes out. Accepts multipart (audio file) or JSON (codes-only).
GET /health - Returns {"status":"ok"}.
GET /props - Available models, server config, default parameters.
See docs/ARCHITECTURE.md for the full API reference and AceRequest JSON specification.
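The two-step flow (POST /lm, then POST /synth) can be sketched with the Python standard library. Several things here are assumptions rather than documented behavior: that the JSON returned by /lm can be forwarded to /synth unchanged (mirroring how the CLI tools pass a request file from one stage to the next), that /lm may return either one object or a list of them, and the `"caption"` field name in the usage comment. Check docs/ARCHITECTURE.md for the actual AceRequest fields.

```python
import json
import urllib.request


def synth_url(base_url: str, wav: bool = False) -> str:
    """URL for POST /synth; ?wav=1 selects WAV instead of MP3."""
    url = base_url.rstrip("/") + "/synth"
    return (url + "?wav=1") if wav else url


def post_json(url: str, payload) -> bytes:
    """POST a JSON body and return the raw response bytes."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def first_request(lm_output):
    """Assumption: if /lm returns a list, use its first element."""
    return lm_output[0] if isinstance(lm_output, list) else lm_output


def caption_to_audio(base_url: str, request: dict, wav: bool = False) -> bytes:
    """Run /lm, then feed its JSON output to /synth; returns audio bytes."""
    lm_output = json.loads(post_json(base_url.rstrip("/") + "/lm", request))
    return post_json(synth_url(base_url, wav), first_request(lm_output))


# Usage (field name "caption" is an assumption, not taken from the spec):
# audio = caption_to_audio("http://localhost:8085", {"caption": "slow jazz trio"})
# open("out.mp3", "wb").write(audio)
```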
CLI tools (advanced)
For scripting without the server, ace-lm and ace-synth work as a pipe:
# LM generates lyrics + codes
./build/ace-lm \
    --request /tmp/request.json \
    --lm models/acestep-5Hz-lm-4B-Q8_0.gguf

# DiT + VAE render to audio
./build/ace-synth \
    --request /tmp/request0.json \
    --embedding models/Qwen3-Embedding-0.6B-Q8_0.gguf \
    --dit models/acestep-v15-turbo-Q8_0.gguf \
    --vae models/vae-BF16.gguf
See docs/ARCHITECTURE.md for the full JSON reference, task types, batching, and understand pipeline.
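The same two-stage pipe can be driven from a script. The sketch below builds the two command lines from the flags shown above. The output naming (request.json becoming request0.json) is inferred from that example only, so `lm_output_path` is a guess at the convention, and all function names are hypothetical.

```python
import os


def lm_output_path(request_path: str, index: int = 0) -> str:
    """Assumed naming: ace-lm writes <name><index>.json next to the input."""
    root, ext = os.path.splitext(request_path)
    return f"{root}{index}{ext}"


def lm_command(request: str, lm: str, build_dir: str = "./build") -> list[str]:
    """Command line for the LM stage (lyrics + codes)."""
    return [f"{build_dir}/ace-lm", "--request", request, "--lm", lm]


def synth_command(request: str, embedding: str, dit: str, vae: str,
                  build_dir: str = "./build") -> list[str]:
    """Command line for the DiT + VAE stage (audio rendering)."""
    return [
        f"{build_dir}/ace-synth",
        "--request", request,
        "--embedding", embedding,
        "--dit", dit,
        "--vae", vae,
    ]


def pipeline(request: str, lm: str, embedding: str, dit: str,
             vae: str) -> list[list[str]]:
    """Both stages in order; the second reads the first stage's output."""
    return [
        lm_command(request, lm),
        synth_command(lm_output_path(request), embedding, dit, vae),
    ]
```

Each command can then be executed in order with `subprocess.run(cmd, check=True)`, which stops the pipe if the LM stage fails.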
Technical documentation
docs/ARCHITECTURE.md covers the complete AceRequest JSON reference, all task types (text2music, cover, repaint, lego, extract, complete), FSM constrained decoding, custom GGML operators, quantization, and architecture internals.
Community
ACE-Step official documentation
- A Musician's Guide - non-technical guide for music makers
- Tutorial - design philosophy, model architecture, input control, inference hyperparameters
Third-party UIs for acestep.cpp
Samples
GGML.mp4
DiT-Only-SFT.mp4
ProcessJellyfin.mp4
Instrumental.mp4
House-IA.mp4
Acknowledgements
Independent C++ implementation based on ACE-Step 1.5 by ACE Studio and StepFun. All model weights are theirs; this is just a native backend.
@misc{gong2026acestep,
  title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
  author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
  year={2026},
  note={GitHub repository}
}