Nabu is an on-device test bench for TTS and chat:

- ONNX Runtime (NNAPI/CPU) TTS with Kokoro-82M v1.0, Supertonic v1, Supertonic v2, and Soprano 1.1 (80M) via soprano-onnx
- On-device LLM chat with LiteRT `.task` models (MediaPipe runtime) and experimental `.gguf` support via `llama.cpp`
- E-reader and long-form playback workflows
## Demo Video

## Screenshots
## Playground Workflows

- TTS workbench: switch engines (`kokoro`, `supertonic`, `soprano`) and compare runtime behavior on-device.
- LLM workbench: run local chat models from managed LiteRT `.task` downloads or imported `.gguf` files.
- Book workflow: open documents, edit text, save projects/bookmarks, and pre-generate per-line WAV audio for offline playback.
- Chat + TTS loop: generate responses with local LLMs and speak them through the active TTS engine.
## TTS Engines Integrated

### Kokoro

- Runtime: ONNX Runtime (NNAPI when available, CPU fallback)
- Credits chain:
  - Original Kokoro model: https://huggingface.co/hexgrad/Kokoro-82M
  - ONNX conversion/runtime reference: https://github.com/thewh1teagle/kokoro-onnx
  - Original Android Kokoro app base: https://github.com/puff-dayo/Kokoro-82M-Android
### Supertonic (v1 and v2 ONNX)

- Runtime: ONNX Runtime (CPU)
- Integrated model IDs in app: `supertonic-onnx`, `supertonic-2-onnx`
- Credits chain:
  - Original Supertonic project: https://github.com/supertonic-tts/supertonic
  - Supertonic v1 ONNX packaging/distribution: https://huggingface.co/Supertone/supertonic
  - Supertonic v2 ONNX packaging/distribution: https://huggingface.co/Supertone/supertonic-2
### Soprano (80M ONNX)

- Runtime: ONNX Runtime (CPU)
- Integrated model ID in app: `soprano-80m-onnx`
- Credits chain:
  - Original Soprano repo and reference inference: https://github.com/ekwek1/soprano
  - ONNX web reference implementation used for behavior-parity debugging: https://github.com/KevinAHM/soprano-web-onnx
  - ONNX packaging/distribution used by the app downloader: https://huggingface.co/KevinAHM/soprano-onnx
## Model Artifacts and Sources

Source manifests used by the app:

- `app/src/main/java/com/mewmix/nabu/kokoro/Manifest.kt`
- `app/src/main/res/raw/model_allowlist.json`
### TTS Models

| Model | ID | Source |
|---|---|---|
| Kokoro v1.0 (FP16/INT8) | `kokoro_fp16`, `kokoro_int8` | ONNX fp16, INT8 release |
| Supertonic v1 | `supertonic-onnx` | Hugging Face |
| Supertonic v2 | `supertonic-2-onnx` | Hugging Face |
| Soprano 1.1 (ONNX pkg) | `soprano-80m-onnx` | Original model, ONNX packaging |
### LLM Models (.task)

| Model | ID | Source | Access |
|---|---|---|---|
| Gemma 3n E4B IT int4 | `gemma-3n-E4B-it-int4` | Hugging Face | gated |
| Gemma 3 1B IT q4 | `gemma3-1b-it-q4` | Hugging Face | public |
| Gemma 3 270M IT q8 | `gemma3-270m-it-q8` | Hugging Face | gated in allowlist |
| Qwen2.5 1.5B Instruct q8 | `qwen2.5-1.5b-instruct-q8` | Hugging Face | public |
## Experimental GGUF Support

- Status: experimental local-import path for LLMs.
- Import flow: the Models screen accepts LiteRT `.task` and `.gguf` files via the file picker.
- Storage path: imported GGUF files are copied to `files/models/<model-id>.gguf`.
- Backend routing: imported `.gguf` models are tagged as backend `llama` and loaded through `LlamaCppBackend`.
- Current limits:
  - No allowlist downloader for GGUF (manual import only).
  - No remote size metadata/checksum flow for GGUF.
  - TTS engines remain ONNX-based (`kokoro`, `supertonic`, `soprano`); GGUF is not used for TTS inference.
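The import-time routing described above can be sketched as follows. This is illustrative only: `LlamaCppBackend` is internal to Nabu, and the backend label for `.task` files is an assumption here (the README only specifies `llama` for `.gguf`).

```python
from pathlib import Path

# Illustrative sketch of import-time backend routing by file extension.
# The "mediapipe" label for .task files is an assumption; the README
# only states that .gguf models are tagged as backend "llama".
BACKEND_BY_EXTENSION = {
    ".task": "mediapipe",  # LiteRT .task models -> MediaPipe runtime
    ".gguf": "llama",      # imported GGUF models -> llama.cpp backend
}

def route_imported_model(filename: str) -> str:
    """Return the backend tag for an imported model file."""
    ext = Path(filename).suffix.lower()
    if ext not in BACKEND_BY_EXTENSION:
        raise ValueError(f"unsupported model file type: {ext}")
    return BACKEND_BY_EXTENSION[ext]
```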
## Audiobook Workflow File Types

| Type | Format(s) | Used for |
|---|---|---|
| Book input | `.epub` (`application/epub+zip`) | Full book/document ingestion |
| Book input | `.pdf` (`application/pdf`) | Page text extraction and playback |
| Book input | `.txt`, `text/*` | Plain-text ingestion and playback |
| Edited book output | `.epub` | Save an edited copy from the in-app editor |
| Pre-generated audio cache | `.wav` | Per-line cache in `files/pregenerated/...` |
| User audio export | `.wav` | Audio clips saved to Android `Music/` |

Unknown/other file types fall back to plain-text extraction.
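The MIME-type routing in the table above, including the plain-text fallback, can be sketched like this. The function name and the returned labels are hypothetical; Nabu's actual ingestion code is internal.

```python
def classify_book_input(mime_type: str) -> str:
    """Map an opened document's MIME type to an ingestion path,
    per the README's file-type table; unknown types fall back
    to plain-text extraction. Labels here are illustrative."""
    if mime_type == "application/epub+zip":
        return "epub"   # full book/document ingestion
    if mime_type == "application/pdf":
        return "pdf"    # page text extraction and playback
    if mime_type.startswith("text/"):
        return "text"   # plain-text ingestion and playback
    return "text"       # unknown/other -> plain-text fallback
```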
## Persistence and Conversation Database

- Local DB: `kokoro.db` (SQLite).
- Chat conversations:
  - Table: `conversations`
  - Stores: `title`, `model_id`, serialized `messages` JSON, `created_at`, `updated_at`
- Audiobook/project state:
  - Table: `projects` (URI, project name, style mix, speed, bookmark line, pregen path, pregen toggle)
  - Table: `audio_lines` (per-line cached WAV file path keyed by document URI + line index)
- Result: chat history, selected-model linkage, project settings, bookmarks, and pre-generated line audio survive app restarts.
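A minimal sketch of the conversation persistence described above, using the column names from this README; the real `kokoro.db` schema may differ in column types and constraints.

```python
import json
import sqlite3

# Sketch of the conversations table, assuming the columns listed above;
# messages are stored as a serialized JSON array, as the README states.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE conversations (
        id INTEGER PRIMARY KEY,
        title TEXT,
        model_id TEXT,
        messages TEXT,       -- serialized JSON message list
        created_at INTEGER,
        updated_at INTEGER
    )
""")

messages = [{"role": "user", "content": "hello"}]
db.execute(
    "INSERT INTO conversations (title, model_id, messages, created_at, updated_at) "
    "VALUES (?, ?, ?, 0, 0)",
    ("Demo chat", "gemma3-1b-it-q4", json.dumps(messages)),
)

# Round-trip: chat history survives because it is persisted as JSON text.
row = db.execute("SELECT messages FROM conversations").fetchone()
restored = json.loads(row[0])
```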
## Local API Server

Nabu includes an opt-in local REST API server for on-device inference, exposing both text-to-speech and an OpenAI-compatible `/v1/chat/completions` endpoint.

- Default bind: `127.0.0.1:8455`
- Optional LAN bind: `0.0.0.0:8455` (enable in Settings)
- Security note: there is no API auth layer yet; expose the server on a LAN only on trusted networks.

Enable it from Settings:

- `Enable API Server`
- `Expose API on LAN` (optional)
## Agentic Tool Calling (OpenCode & Open Interpreter)

Nabu supports the OpenAI tools specification for agentic function calling over its local API. You can point tooling environments such as OpenCode and Open Interpreter at Nabu as their LLM backend.

Nabu intercepts the system tool prompts, parses `<tool_call>` outputs, and maps them to standard `{"finish_reason": "tool_calls"}` stream chunks.
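The mapping step can be sketched as below. Heavy hedging applies: Nabu's parser is internal, and the exact in-band format the model emits inside `<tool_call>` tags is an assumption here (a JSON object with `name` and `arguments`).

```python
import json
import re

# Assumed in-band format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.S)

def extract_tool_calls(model_output: str):
    """Sketch of mapping raw <tool_call> spans in model output to
    OpenAI-style tool_calls with finish_reason "tool_calls".
    Returns None when the output contains no tool calls."""
    calls = []
    for i, match in enumerate(TOOL_CALL_RE.finditer(model_output)):
        payload = json.loads(match.group(1))
        calls.append({
            "id": f"call_{i}",
            "type": "function",
            "function": {
                "name": payload["name"],
                # OpenAI clients expect arguments as a JSON *string*.
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    if not calls:
        return None
    return {"finish_reason": "tool_calls", "tool_calls": calls}
```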
## Glaive File Manager & Local Tools

If you install the Glaive File Manager alongside Nabu, you can grant Nabu direct tool-calling capabilities over the Android device's file system. This lets in-app or external providers command Nabu or Glaive to list directories, read files, and manage external storage directly from the LLM context.
## Experimental Codex OAuth

Nabu includes experimental support for connecting to the Codex model family via OAuth.

- You can authenticate with Codex directly from Settings.
- Once authenticated, Codex models appear in the `Remote` tab of the model selector.
- These remote models support the OpenCode and Open Interpreter API tooling workflows just like local models.
## Health

`GET /health`

Returns a small JSON status payload.
## Model Listing

Endpoint paths for checking loaded/downloaded resources:

- `GET /models` (returns Nabu's internal format)
- `GET /v1/models` (returns a standard OpenAI model-list JSON shape)
- `GET /tts/models`
- `GET /v1/tts/models`

Query by type: `?type=llm|tts|all`
## LLM Generation

- `POST /generate` (Nabu flat object payload)
- `POST /v1/chat/completions` (OpenAI-compatible shape)

`POST /v1/chat/completions` expects `messages` and, optionally, `tools`:
```json
{
  "model": "gemma3-1b-it-q4",
  "messages": [
    {"role": "user", "content": "What is the weather?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  }],
  "stream": true
}
```

Streaming `/v1/chat/completions` follows OpenAI-style SSE chunk events yielding `delta.content` strings or `delta.tool_calls` JSON buffers, ending in `data: [DONE]`.
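A client consuming that SSE stream can be sketched as follows: accumulate `delta.content` fragments from each `data:` line and stop at the `[DONE]` sentinel. This is a generic OpenAI-style SSE consumer, not Nabu-specific code.

```python
import json

def collect_stream_text(sse_lines):
    """Accumulate delta.content from OpenAI-style SSE chunk lines,
    stopping at the "data: [DONE]" sentinel."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)
```

In a real client, `sse_lines` would be the decoded lines of a streaming HTTP response body (e.g. from `curl -N` against `/v1/chat/completions`).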
## TTS Generation

- `POST /tts/speech`
- `POST /v1/audio/speech`

Request fields:

- `input` or `text` (required)
- `engine` (optional): `kokoro`, `supertonic`, `soprano`
- `model` (optional): e.g. `soprano-80m-onnx`, `supertonic-onnx`, `supertonic-2-onnx`
- `voice`/`style` (optional)
- `speed` (optional, default `1.0`)
- `response_format` (optional): `wav` (default) or `json`

`response_format: "wav"` returns `audio/wav` bytes.
`response_format: "json"` returns base64-encoded WAV plus metadata.
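Decoding a `response_format: "json"` reply can be sketched as below. The `"audio"` field name is an assumption (the README only says "base64-encoded WAV plus metadata"), so check the actual response shape of your Nabu build.

```python
import base64

def decode_tts_json(payload: dict) -> bytes:
    """Decode the base64-encoded WAV from a response_format "json"
    TTS reply. The "audio" key is a hypothetical field name; verify
    it against the actual server response."""
    wav = base64.b64decode(payload["audio"])
    if wav[:4] != b"RIFF":  # every WAV file starts with the RIFF magic
        raise ValueError("payload did not decode to a WAV file")
    return wav
```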
## Curl Examples

### ADB Port Forwarding

To test the API from your host machine over USB/Wi-Fi:

```shell
adb forward tcp:8455 tcp:8455
```

### Health Check

```shell
curl http://127.0.0.1:8455/health
```

### List Available LLMs (OpenAI Format)

```shell
curl "http://127.0.0.1:8455/v1/models?type=llm"
```

### Generate TTS WAV to File

```shell
curl -s -X POST "http://127.0.0.1:8455/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"input":"Welcome to Nabu on device AI","engine":"kokoro","response_format":"wav"}' \
  --output test_speech.wav
```

### Simple OpenAI Chat Completion

```shell
curl -X POST "http://127.0.0.1:8455/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3-1b-it-q4",
    "messages": [{"role": "user", "content": "Name three fast animals."}],
    "stream": false
  }'
```

### Stream OpenAI Chat Completion

```shell
curl -N -X POST "http://127.0.0.1:8455/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3-1b-it-q4",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "stream": true
  }'
```

### Send OpenCode Tool Call Request

```shell
curl -X POST "http://127.0.0.1:8455/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3-1b-it-q4",
    "messages": [{"role": "user", "content": "What is 55 times 12?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "multiply",
        "description": "Multiply two numbers",
        "parameters": {
          "type": "object",
          "properties": {
            "a": { "type": "number" },
            "b": { "type": "number" }
          }
        }
      }
    }],
    "stream": false
  }'
```
## Build

- Open in Android Studio (Ladybug or newer recommended), or use the Gradle CLI.
- Build: `./gradlew :app:assembleDebug`
- Install: `./gradlew :app:installDebug`
## Test

Unit tests:

```shell
./gradlew :app:testDebugUnitTest
```
## Credits
- Original Android base app: https://github.com/puff-dayo/Kokoro-82M-Android
- Kokoro model: https://huggingface.co/hexgrad/Kokoro-82M
- Kokoro ONNX conversion/runtime references: https://github.com/thewh1teagle/kokoro-onnx
- Supertonic models: https://huggingface.co/Supertone/supertonic and https://huggingface.co/Supertone/supertonic-2
- Soprano original model/repo: https://github.com/ekwek1/soprano
- Soprano ONNX web reference: https://github.com/KevinAHM/soprano-web-onnx
- Soprano ONNX model packaging: https://huggingface.co/KevinAHM/soprano-onnx
- Google AI Edge Gallery / MediaPipe LLM references: https://github.com/google-ai-edge/gallery
- IPA transcribers: https://github.com/kotlinguistics/IPA-Transcribers
- jsoup (EPUB/HTML parsing): https://jsoup.org/



