GitHub - ggml-org/LlamaBarn: A cosy home for your LLMs.

LlamaBarn

LlamaBarn is a macOS menu bar app for running local LLMs.

Watch a 2-minute intro 📽️

Install

Install with Homebrew by running brew install --cask llamabarn, or download the app from Releases.

How it works

LlamaBarn runs a local server at http://localhost:2276/v1.

Install models — from the built-in catalog
Connect any app — chat UIs, editors, CLI tools, scripts
Models load when requested — and unload when idle

Features

100% local — Models run on your device; no data leaves your Mac
Small footprint — 12 MB native macOS app
Zero configuration — models are auto-configured with optimal settings for your Mac
Smart model catalog — shows what fits your Mac, with quantized fallbacks for what doesn't
Self-contained — all models and config stored in ~/.llamabarn (configurable)
Built on llama.cpp — from the GGML org, developed alongside llama.cpp

Works with

LlamaBarn works with any OpenAI-compatible client.

Chat UIs — Chatbox, Open WebUI, BoltAI (instructions)
Editors — VS Code, Zed, Xcode (instructions)
Editor extensions — Cline, Continue
CLI tools — OpenCode (instructions), Claude Code (instructions)
Custom scripts — curl, AI SDK, etc.

You can also use the built-in WebUI at http://localhost:2276 while LlamaBarn is running.
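
For clients built on the official OpenAI SDKs, one common way to point them at LlamaBarn is through environment variables. The sketch below assumes the client reads OPENAI_BASE_URL and OPENAI_API_KEY (the OpenAI Python and Node SDKs do; other clients may need the base URL entered in their settings instead). It also assumes the local server does not check the key, so the value is only a placeholder.

```shell
# Point OpenAI-SDK-based clients at the local LlamaBarn server.
# Assumption: the client honors these two environment variables.
export OPENAI_BASE_URL="http://localhost:2276/v1"

# Assumption: LlamaBarn ignores the key. The OpenAI SDKs raise an error
# when no key is set, so any non-empty placeholder will do.
export OPENAI_API_KEY="not-needed"

echo "OpenAI-compatible base URL: $OPENAI_BASE_URL"
```

Clients started from the same shell session inherit both variables.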

API examples

# list installed models
curl http://localhost:2276/v1/models

# chat with Gemma 3 4B (assuming it's installed)
curl http://localhost:2276/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3-4b", "messages": [{"role": "user", "content": "Hello"}]}'

Replace gemma-3-4b with any model ID from http://localhost:2276/v1/models.

See the complete API reference in the llama-server docs.
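
The two calls above can be combined so a script picks an installed model instead of hard-coding its ID. This is a minimal sketch, not part of LlamaBarn: it assumes jq is installed, that /v1/models returns an OpenAI-style list ({"data": [{"id": ...}]}), and that chat responses follow the OpenAI shape (choices[0].message.content). The function names and the LLAMABARN_BASE_URL override are hypothetical.

```shell
# Base URL of the local server; LLAMABARN_BASE_URL is a hypothetical override.
BASE_URL="${LLAMABARN_BASE_URL:-http://localhost:2276/v1}"

# Print one installed model ID per line.
# Assumption: the response is an OpenAI-style list with a "data" array.
list_model_ids() {
  curl -fsS "$BASE_URL/models" | jq -r '.data[].id'
}

# Build the JSON request body; jq takes care of quoting the prompt text.
chat_payload() {
  jq -cn --arg model "$1" --arg prompt "$2" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}

# Send one user message to the given model and print only the reply text.
chat() {
  chat_payload "$1" "$2" |
    curl -fsS "$BASE_URL/chat/completions" \
      -H "Content-Type: application/json" \
      -d @- |
    jq -r '.choices[0].message.content'
}

# Usage (with LlamaBarn running and at least one model installed):
#   model="$(list_model_ids | head -n 1)"
#   chat "$model" "Hello"
```

Building the body with jq rather than string concatenation keeps prompts containing quotes or newlines valid JSON.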

Experimental settings

Expose to network — By default, the server is only accessible from your Mac (localhost). This option allows connections from other devices on your local network. Only enable this if you understand the security risks.

# bind to all interfaces (0.0.0.0)
defaults write app.llamabarn.LlamaBarn exposeToNetwork -bool YES

# or bind to a specific IP (e.g., for Tailscale)
defaults write app.llamabarn.LlamaBarn exposeToNetwork -string "100.x.x.x"

# disable (default)
defaults delete app.llamabarn.LlamaBarn exposeToNetwork
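
To see what is currently stored for this setting, the standard macOS defaults read command can be used with the same domain and key as above. A minimal sketch; the function name is illustrative only.

```shell
# Print the stored exposeToNetwork value, or a note when the key is absent.
# `defaults read` exits non-zero if the key (or the `defaults` tool) is missing.
llamabarn_expose_status() {
  defaults read app.llamabarn.LlamaBarn exposeToNetwork 2>/dev/null ||
    echo "not set (server reachable from localhost only)"
}

llamabarn_expose_status
```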

Roadmap

Support for adding models outside the built-in catalog
Support for loading multiple models at the same time
Support for multiple configurations per model (e.g., multiple context lengths)