Local AI for Text, Images, and Speech


Refreshingly simple local chat.

The omni-modal alternative to cloud AI. Automatically optimized for your GPU and NPU. Open source, community driven, and private.

Chat

What can I do with 128 GB of unified RAM?

Load up models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use.

What should I tune first?

You can increase context size to 64k or more.

Image Generation

A pitcher of lemonade in the style of a renaissance painting

Coding

Build a real-time dashboard that streams GPU metrics over WebSockets

import asyncio

async def stream_gpu_metrics(ws):
    # Poll the GPU twice per second and push the latest stats
    # to the connected client. `gpu` is the metrics source and
    # `ws` the WebSocket connection supplied by the surrounding app.
    while True:
        stats = await gpu.poll()
        await ws.send_json(stats)
        await asyncio.sleep(0.5)

Speech

Hello, I am your AI assistant. What can I do for you today?

Quickstart

Built by the community. Optimized by AMD.

Lemonade is a local AI runtime with every capability you need to build great experiences.

Automatically deploys the latest models and engines. Extra optimized for Ryzen AI, Radeon, and Strix Halo PCs.

Explore Models

Integrate once, deploy the <10 MB binary on any computer running Windows, Linux, or macOS.

Embed in Your App

Standard endpoints for chat, vision, image gen, image editing, speech gen, and transcription.

Read Endpoints Spec

Open source. No strings attached. No telemetry. Customize and redistribute to your heart's content.

Visit the GitHub

Works with great apps.

Lemonade is integrated in many apps and works out-of-box with hundreds more thanks to the OpenAI API standard.

Specs that enable AI workflows.

Everything from install to runtime is optimized for fast setup, broad compatibility, and local-first execution.

One local service for every modality.

Point your app at Lemonade and get chat, vision, image gen, transcription, speech gen, and more with standard APIs.

POST /api/v1/chat/completions
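Because Lemonade speaks the OpenAI API standard, any OpenAI-compatible client can call it. Here is a minimal stdlib-only sketch, assuming a Lemonade server listening at `http://localhost:8000/api/v1` — the host, port, and model name are placeholders to adjust for your setup:

```python
import json
import urllib.request

# Assumed local endpoint; change host/port to match your Lemonade server.
BASE_URL = "http://localhost:8000/api/v1"

def build_chat_request(model, user_message):
    """Build the JSON payload for a standard chat completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def chat(model, user_message):
    """POST the payload to /chat/completions and return the reply text."""
    payload = build_chat_request(model, user_message)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("gpt-oss-120b", "What can I do with 128 GB of unified RAM?")` sends the same request shown above; swapping `BASE_URL` back to a cloud provider is the only change needed to move off-device.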

Always improving.

Track the newest improvements and highlights from the Lemonade release stream.