Nabu is an on-device test bench for TTS and chat:

- ONNX Runtime (NNAPI/CPU) TTS with Kokoro-82M v1.0, Supertonic v1, Supertonic v2, and Soprano 1.1 (80M) via soprano-onnx
- On-device LLM chat with LiteRT `.task` models (MediaPipe runtime) and experimental `.gguf` support via `llama.cpp`
- E-reader and long-form playback workflows
## Demo Video

## Screenshots
## Playground Workflows

- TTS workbench: switch engines (`kokoro`, `supertonic`, `soprano`) and compare runtime behavior on-device.
- LLM workbench: run local chat models from managed LiteRT `.task` downloads or imported `.gguf` files.
- Book workflow: open documents, edit text, save projects/bookmarks, and pre-generate per-line WAV audio for offline playback.
- Chat + TTS loop: generate responses with local LLMs and speak them through the active TTS engine.
## TTS Engines Integrated

### Kokoro

- Runtime: ONNX Runtime (NNAPI when available, CPU fallback)
- Credits chain:
  - Original Kokoro model: https://huggingface.co/hexgrad/Kokoro-82M
  - ONNX conversion/runtime reference: https://github.com/thewh1teagle/kokoro-onnx
  - Original Android Kokoro app base: https://github.com/puff-dayo/Kokoro-82M-Android
### Supertonic (v1 and v2 ONNX)

- Runtime: ONNX Runtime (CPU)
- Integrated model IDs in app: `supertonic-onnx`, `supertonic-2-onnx`
- Credits chain:
  - Original Supertonic project: https://github.com/supertonic-tts/supertonic
  - Supertonic v1 ONNX packaging/distribution: https://huggingface.co/Supertone/supertonic
  - Supertonic v2 ONNX packaging/distribution: https://huggingface.co/Supertone/supertonic-2
### Soprano (80M ONNX)

- Runtime: ONNX Runtime (CPU)
- Integrated model ID in app: `soprano-80m-onnx`
- Credits chain:
  - Original Soprano repo and reference inference: https://github.com/ekwek1/soprano
  - ONNX web reference implementation used for behavior-parity debugging: https://github.com/KevinAHM/soprano-web-onnx
  - ONNX packaging/distribution used by the app downloader: https://huggingface.co/KevinAHM/soprano-onnx
## Model Artifacts and Sources

Source manifests used by the app:

- `app/src/main/java/com/mewmix/nabu/kokoro/Manifest.kt`
- `app/src/main/res/raw/model_allowlist.json`
### TTS Models

| Model | ID | Source |
|---|---|---|
| Kokoro v1.0 (FP16/INT8) | `kokoro_fp16`, `kokoro_int8` | ONNX fp16, INT8 release |
| Supertonic v1 | `supertonic-onnx` | Hugging Face |
| Supertonic v2 | `supertonic-2-onnx` | Hugging Face |
| Soprano 1.1 (ONNX pkg) | `soprano-80m-onnx` | Original model, ONNX packaging |
### LLM Models (.task)

| Model | ID | Source | Access |
|---|---|---|---|
| Gemma 3n E4B IT int4 | `gemma-3n-E4B-it-int4` | Hugging Face | gated |
| Gemma 3 1B IT q4 | `gemma3-1b-it-q4` | Hugging Face | public |
| Gemma 3 270M IT q8 | `gemma3-270m-it-q8` | Hugging Face | gated in allowlist |
| Qwen2.5 1.5B Instruct q8 | `qwen2.5-1.5b-instruct-q8` | Hugging Face | public |
## Experimental GGUF Support

- Status: experimental local-import path for LLMs.
- Import flow: the Models screen accepts LiteRT `.task` and `.gguf` files via the file picker.
- Storage path: imported GGUF files are copied to `files/models/<model-id>.gguf`.
- Backend routing: imported `.gguf` models are tagged as backend `llama` and loaded through `LlamaCppBackend`.
- Current limits:
  - No allowlist downloader for GGUF (manual import only).
  - No remote size metadata/checksum flow for GGUF.
  - TTS engines remain ONNX-based (`kokoro`, `supertonic`, `soprano`); GGUF is not used for TTS inference.
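The import-time routing described above can be sketched as follows. This is illustrative only: `LlamaCppBackend` is internal to Nabu, and the backend label for `.task` files is an assumption here (the README only specifies `llama` for `.gguf`).

```python
from pathlib import Path

# Illustrative sketch of import-time backend routing by file extension.
# The "mediapipe" label for .task files is an assumption; the README
# only states that .gguf models are tagged as backend "llama".
BACKEND_BY_EXTENSION = {
    ".task": "mediapipe",  # LiteRT .task models -> MediaPipe runtime
    ".gguf": "llama",      # imported GGUF models -> llama.cpp backend
}

def route_imported_model(filename: str) -> str:
    """Return the backend tag for an imported model file."""
    ext = Path(filename).suffix.lower()
    if ext not in BACKEND_BY_EXTENSION:
        raise ValueError(f"unsupported model file type: {ext}")
    return BACKEND_BY_EXTENSION[ext]
```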
## Audiobook Workflow File Types

| Type | Format(s) | Used for |
|---|---|---|
| Book input | `.epub` (`application/epub+zip`) | Full book/document ingestion |
| Book input | `.pdf` (`application/pdf`) | Page text extraction and playback |
| Book input | `.txt`, `text/*` | Plain-text ingestion and playback |
| Edited book output | `.epub` | Save an edited copy from the in-app editor |
| Pre-generated audio cache | `.wav` | Per-line cache in `files/pregenerated/...` |
| User audio export | `.wav` | Audio clips saved to Android `Music/` |

Unknown/other file types fall back to plain-text extraction.
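The MIME-type routing in the table above, including the plain-text fallback, can be sketched like this. The function name and the returned labels are hypothetical; Nabu's actual ingestion code is internal.

```python
def classify_book_input(mime_type: str) -> str:
    """Map an opened document's MIME type to an ingestion path,
    per the README's file-type table; unknown types fall back
    to plain-text extraction. Labels here are illustrative."""
    if mime_type == "application/epub+zip":
        return "epub"   # full book/document ingestion
    if mime_type == "application/pdf":
        return "pdf"    # page text extraction and playback
    if mime_type.startswith("text/"):
        return "text"   # plain-text ingestion and playback
    return "text"       # unknown/other -> plain-text fallback
```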
## Persistence and Conversation Database

- Local DB: `kokoro.db` (SQLite).
- Chat conversations:
  - Table: `conversations`
  - Stores: `title`, `model_id`, serialized `messages` JSON, `created_at`, `updated_at`
- Audiobook/project state:
  - Table: `projects` (URI, project name, style mix, speed, bookmark line, pregen path, pregen toggle)
  - Table: `audio_lines` (per-line cached WAV file path keyed by document URI + line index)
- Result: chat history, selected-model linkage, project settings, bookmarks, and pre-generated line audio survive app restarts.
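A minimal sketch of the conversation persistence described above, using the column names from this README; the real `kokoro.db` schema may differ in column types and constraints.

```python
import json
import sqlite3

# Sketch of the conversations table, assuming the columns listed above;
# messages are stored as a serialized JSON array, as the README states.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE conversations (
        id INTEGER PRIMARY KEY,
        title TEXT,
        model_id TEXT,
        messages TEXT,       -- serialized JSON message list
        created_at INTEGER,
        updated_at INTEGER
    )
""")

messages = [{"role": "user", "content": "hello"}]
db.execute(
    "INSERT INTO conversations (title, model_id, messages, created_at, updated_at) "
    "VALUES (?, ?, ?, 0, 0)",
    ("Demo chat", "gemma3-1b-it-q4", json.dumps(messages)),
)

# Round-trip: chat history survives because it is persisted as JSON text.
row = db.execute("SELECT messages FROM conversations").fetchone()
restored = json.loads(row[0])
```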
## Local API Server

Nabu includes an opt-in local REST API server for on-device inference, exposing both text-to-speech and an OpenAI-compatible `/v1/chat/completions` endpoint.

- Default bind: `127.0.0.1:8455`
- Optional LAN bind: `0.0.0.0:8455` (enable in Settings)
- Security note: there is no API auth layer yet; expose the server on a LAN only on trusted networks.

Enable it from Settings:

- `Enable API Server`
- `Expose API on LAN` (optional)
## Agentic Tool Calling (OpenCode & Open Interpreter)

Nabu supports the OpenAI tools specification for agentic function calling over its local API. You can point tooling environments such as OpenCode and Open Interpreter at Nabu as their LLM backend.

Nabu intercepts the system tool prompts, parses `<tool_call>` outputs, and maps them to standard `{"finish_reason": "tool_calls"}` stream chunks.
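The mapping step can be sketched as below. Heavy hedging applies: Nabu's parser is internal, and the exact in-band format the model emits inside `<tool_call>` tags is an assumption here (a JSON object with `name` and `arguments`).

```python
import json
import re

# Assumed in-band format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.S)

def extract_tool_calls(model_output: str):
    """Sketch of mapping raw <tool_call> spans in model output to
    OpenAI-style tool_calls with finish_reason "tool_calls".
    Returns None when the output contains no tool calls."""
    calls = []
    for i, match in enumerate(TOOL_CALL_RE.finditer(model_output)):
        payload = json.loads(match.group(1))
        calls.append({
            "id": f"call_{i}",
            "type": "function",
            "function": {
                "name": payload["name"],
                # OpenAI clients expect arguments as a JSON *string*.
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    if not calls:
        return None
    return {"finish_reason": "tool_calls", "tool_calls": calls}
```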
## Glaive File Manager & Local Tools

If you install the Glaive File Manager alongside Nabu, you can grant Nabu direct tool-calling capabilities over the Android device's file system. This lets in-app or external providers command Nabu or Glaive to list directories, read files, and manage external storage directly from the LLM context.
## Experimental Codex OAuth

Nabu includes experimental support for connecting to the Codex model family via OAuth.

- You can authenticate with Codex directly from Settings.
- Once authenticated, Codex models appear in the `Remote` tab of the model selector.
- These remote models support the OpenCode and Open Interpreter API tooling workflows just like local models.
## Health

`GET /health`

Returns a small JSON status payload.
## Model Listing

Endpoint paths for checking loaded/downloaded resources:

- `GET /models` (returns Nabu's internal format)
- `GET /v1/models` (returns a standard OpenAI model-list JSON shape)
- `GET /tts/models`
- `GET /v1/tts/models`

Query by type: `?type=llm|tts|all`
## LLM Generation

- `POST /generate` (Nabu flat object payload)
- `POST /v1/chat/completions` (OpenAI-compatible shape)

`POST /v1/chat/completions` expects `messages` and, optionally, `tools`:
```json
{
  "model": "gemma3-1b-it-q4",
  "messages": [
    {"role": "user", "content": "What is the weather?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  }],
  "stream": true
}
```

Streaming `/v1/chat/completions` follows OpenAI-style SSE chunk events yielding `delta.content` strings or `delta.tool_calls` JSON buffers, ending in `data: [DONE]`.
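A client consuming that SSE stream can be sketched as follows: accumulate `delta.content` fragments from each `data:` line and stop at the `[DONE]` sentinel. This is a generic OpenAI-style SSE consumer, not Nabu-specific code.

```python
import json

def collect_stream_text(sse_lines):
    """Accumulate delta.content from OpenAI-style SSE chunk lines,
    stopping at the "data: [DONE]" sentinel."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)
```

In a real client, `sse_lines` would be the decoded lines of a streaming HTTP response body (e.g. from `curl -N` against `/v1/chat/completions`).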
## TTS Generation

- `POST /tts/speech`
- `POST /v1/audio/speech`

Request fields:

- `input` or `text` (required)
- `engine` (optional): `kokoro`, `supertonic`, `soprano`
- `model` (optional): e.g. `soprano-80m-onnx`, `supertonic-onnx`, `supertonic-2-onnx`
- `voice`/`style` (optional)
- `speed` (optional, default `1.0`)
- `response_format` (optional): `wav` (default) or `json`

`response_format: "wav"` returns `audio/wav` bytes.
`response_format: "json"` returns base64-encoded WAV plus metadata.
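Decoding a `response_format: "json"` reply can be sketched as below. The `"audio"` field name is an assumption (the README only says "base64-encoded WAV plus metadata"), so check the actual response shape of your Nabu build.

```python
import base64

def decode_tts_json(payload: dict) -> bytes:
    """Decode the base64-encoded WAV from a response_format "json"
    TTS reply. The "audio" key is a hypothetical field name; verify
    it against the actual server response."""
    wav = base64.b64decode(payload["audio"])
    if wav[:4] != b"RIFF":  # every WAV file starts with the RIFF magic
        raise ValueError("payload did not decode to a WAV file")
    return wav
```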
## Curl Examples

### ADB Port Forwarding

To test the API from your host machine over USB/Wi-Fi:

```shell
adb forward tcp:8455 tcp:8455
```

### Health Check

```shell
curl http://127.0.0.1:8455/health
```

### List Available LLMs (OpenAI Format)

```shell
curl "http://127.0.0.1:8455/v1/models?type=llm"
```

### Generate TTS WAV to File

```shell
curl -s -X POST "http://127.0.0.1:8455/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"input":"Welcome to Nabu on device AI","engine":"kokoro","response_format":"wav"}' \
  --output test_speech.wav
```

### Simple OpenAI Chat Completion

```shell
curl -X POST "http://127.0.0.1:8455/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3-1b-it-q4",
    "messages": [{"role": "user", "content": "Name three fast animals."}],
    "stream": false
  }'
```

### Stream OpenAI Chat Completion

```shell
curl -N -X POST "http://127.0.0.1:8455/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3-1b-it-q4",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "stream": true
  }'
```

### Send OpenCode Tool Call Request

```shell
curl -X POST "http://127.0.0.1:8455/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3-1b-it-q4",
    "messages": [{"role": "user", "content": "What is 55 times 12?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "multiply",
        "description": "Multiply two numbers",
        "parameters": {
          "type": "object",
          "properties": {
            "a": { "type": "number" },
            "b": { "type": "number" }
          }
        }
      }
    }],
    "stream": false
  }'
```
## Build

- Open in Android Studio (Ladybug or newer recommended), or use the Gradle CLI.
- Build: `./gradlew :app:assembleDebug`
- Install: `./gradlew :app:installDebug`
## Test

Unit tests:

```shell
./gradlew :app:testDebugUnitTest
```
## Credits
- Original Android base app: https://github.com/puff-dayo/Kokoro-82M-Android
- Kokoro model: https://huggingface.co/hexgrad/Kokoro-82M
- Kokoro ONNX conversion/runtime references: https://github.com/thewh1teagle/kokoro-onnx
- Supertonic models: https://huggingface.co/Supertone/supertonic and https://huggingface.co/Supertone/supertonic-2
- Soprano original model/repo: https://github.com/ekwek1/soprano
- Soprano ONNX web reference: https://github.com/KevinAHM/soprano-web-onnx
- Soprano ONNX model packaging: https://huggingface.co/KevinAHM/soprano-onnx
- Google AI Edge Gallery / MediaPipe LLM references: https://github.com/google-ai-edge/gallery
- IPA transcribers: https://github.com/kotlinguistics/IPA-Transcribers
- jsoup (EPUB/HTML parsing): https://jsoup.org/



