Refreshingly simple
local chat.
The omni-modal alternative to cloud AI. Automatically optimized for your GPU and NPU. Open source, community driven, and private.
Chat
What can I do with 128 GB of unified RAM?
Load up models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use.
What should I tune first?
You can increase the context size to 64k tokens or more.
Image Generation
A pitcher of lemonade in the style of a renaissance painting
Coding
Build a real-time dashboard that streams GPU metrics over WebSockets
async def stream_gpu_metrics(ws):
    while True:
        stats = await gpu.poll()
        await ws.send_json(stats)
        await asyncio.sleep(0.5)
...
Speech
Hello, I am your AI assistant. What can I do for you today?
Quickstart
Built by the community. Optimized by AMD.
Lemonade is a local AI runtime with every capability you need to build great experiences.
Automatically deploys the latest models and engines. Extra optimized for Ryzen AI, Radeon, and Strix Halo PCs.
Integrate once, deploy the <10 MB binary on any computer running Windows, Linux, or macOS.
Standard endpoints for chat, vision, image gen, image editing, speech gen, and transcription.
Open source. No strings attached. No telemetry. Customize and redistribute to your heart's content.
Works with great apps.
Lemonade is integrated into many apps and works out of the box with hundreds more thanks to the OpenAI API standard.
Specs that enable AI workflows.
Everything from install to runtime is optimized for fast setup, broad compatibility, and local-first execution.
One local service for every modality.
Point your app at Lemonade and get chat, vision, image gen, transcription, speech gen, and more with standard APIs.
POST /api/v1/chat/completions
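Because the endpoints follow the OpenAI API standard, any HTTP client can talk to the service. Here is a minimal sketch using only the Python standard library; the host, port, and model name are assumptions, so substitute the values for your own setup.

```python
# Sketch: calling a local Lemonade server's OpenAI-compatible chat endpoint.
# BASE_URL and the default model name are assumptions, not guaranteed defaults.
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # assumed local address


def build_request(messages, model="gpt-oss-120b"):
    """Assemble the JSON body for POST /api/v1/chat/completions."""
    return {"model": model, "messages": messages}


def chat(messages, model="gpt-oss-120b"):
    """Send a chat completion request and return the assistant's reply text."""
    body = json.dumps(build_request(messages, model)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Pointing an existing OpenAI-SDK app at Lemonade typically only requires overriding the client's base URL to the same address.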
Always improving.
Track the newest improvements and highlights from the Lemonade release stream.