GitHub - bilalbayram/mdbook-tts: turn an mdBook into a listenable book


mdbook-tts turns an mdBook into a listenable book. It injects a floating chapter player into every HTML page and can either stream pre-generated provider audio or fall back to the browser's built-in speech synthesis.

mdbook-tts demo with floating chapter player

What it does

Build-time chapter audio - Generate MP3 chunks during the build, publish manifests beside the rendered book, and let the player stream those chunks chapter-by-chapter.

Provider-agnostic pipeline - Kokoro, OpenAI, Grok, and ElevenLabs all plug into the same cache and publishing flow. Switching providers is a config change, not a code change.

Browser API fallback - If a chapter has no generated manifest, the player falls back to window.speechSynthesis and uses the voices available in the current browser and OS.

Stable chunk caching - Audio is cached per chunk × provider × model × voice, so a typo fix in one paragraph only invalidates the affected chunk.

Drop-in integration - CSS and JS are injected by the preprocessor. There is no extra theme asset installation step.
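The stable-caching claim works because the cache key is derived from every input that changes the audio bytes. The README does not say which hash mdbook-tts uses, so the sketch below assumes 64-bit FNV-1a (std-only and stable across builds). The function names `fnv1a64` and `chunk_cache_key` are hypothetical, not the crate's API.

```rust
/// FNV-1a (64-bit): tiny, dependency-free, and stable across runs and Rust
/// versions, unlike std's DefaultHasher. That matters for an on-disk cache.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Hypothetical key derivation: hash every input that affects the audio.
/// A 0x1F separator after each field keeps ("ab", "c") and ("a", "bc") apart.
fn chunk_cache_key(provider: &str, model: &str, voice: &str, chunk_text: &str) -> String {
    let mut material = Vec::new();
    for field in [provider, model, voice, chunk_text] {
        material.extend_from_slice(field.as_bytes());
        material.push(0x1f);
    }
    format!("{:016x}", fnv1a64(&material))
}
```

If a chapter is re-chunked after an edit, every untouched paragraph produces the same key as before and hits the cache. Only the edited paragraph gets a new key and needs fresh synthesis.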

Installation

Install the binary:

If you are working from this repo directly, building locally is enough:

For a repo-local binary, point the command key in book.toml at the built executable, for example:

[preprocessor.tts]
command = "/absolute/path/to/mdbook-tts/target/debug/mdbook-tts"
renderers = ["html"]
max-voices = 4
default-rate = 1.0

If the binary is on your PATH, use:

[preprocessor.tts]
command = "mdbook-tts"
renderers = ["html"]
max-voices = 4
default-rate = 1.0

Quick Start

1. Add the shared audio config

Put this in your book's book.toml:

[preprocessor.tts]
command = "mdbook-tts"
renderers = ["html"]
max-voices = 4
default-rate = 1.0

[preprocessor.tts.audio]
enabled = true
provider = "openai"
cache-dir = ".mdbook-tts/cache"
public-dir = "tts"
format = "mp3"
parallelism = 4
partial-publish = true

2. Set provider prerequisites

Remote providers need an API key, set either in your shell or in a .env file. Local Kokoro builds need espeak-ng, plus ffmpeg when audio.format = "mp3". The binary looks for .env in the current working directory and then walks up through parent directories until it finds one. Variables already set in the shell take precedence over values from .env.
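The .env lookup rule above (search the working directory, then each parent, and never override a variable the shell already set) can be sketched with std alone. This is an illustration, not mdbook-tts's loader: the function names are hypothetical, and the parser only handles plain `KEY=VALUE` lines.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the first `.env` found in `start` or any of its ancestors.
fn find_dotenv(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .map(|dir| dir.join(".env"))
        .find(|candidate| candidate.is_file())
}

/// Parse simple KEY=VALUE lines, skipping blanks and `#` comments.
/// Quotes, `export` prefixes, and multi-line values are not handled here.
fn parse_dotenv(contents: &str) -> Vec<(String, String)> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Merge `.env` into `env` without overwriting keys that are already present,
/// so shell variables win. Returns the `.env` file that was used, if any.
fn load_dotenv_from(start: &Path, env: &mut HashMap<String, String>) -> io::Result<Option<PathBuf>> {
    match find_dotenv(start) {
        None => Ok(None),
        Some(path) => {
            let contents = fs::read_to_string(&path)?;
            for (key, value) in parse_dotenv(&contents) {
                env.entry(key).or_insert(value); // existing (shell) value is kept
            }
            Ok(Some(path))
        }
    }
}
```

Seeding `env` with `std::env::vars().collect()` before the call gives the "shell wins" behavior described above.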

3. Add one provider section

Only one provider is active per build. Pick one of the sections below and add it to book.toml.
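Because exactly one provider is active per build, the provider = "..." value can be modelled as a closed set that also determines which API key must be present and which [preprocessor.tts.providers.<name>] table applies. The provider names and environment variable names come from this README; the `Provider` type and `require_api_key` helper are a hypothetical sketch, not the crate's internals.

```rust
/// The four providers this README documents; exactly one is active per build.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Kokoro,
    OpenAi,
    Grok,
    ElevenLabs,
}

impl Provider {
    /// Map the `provider = "..."` value from [preprocessor.tts.audio].
    fn from_config(name: &str) -> Result<Self, String> {
        match name {
            "kokoro" => Ok(Provider::Kokoro),
            "openai" => Ok(Provider::OpenAi),
            "grok" => Ok(Provider::Grok),
            "elevenlabs" => Ok(Provider::ElevenLabs),
            other => Err(format!("unknown provider {other:?}")),
        }
    }

    /// API-key variable each remote provider needs; Kokoro runs locally.
    fn api_key_var(self) -> Option<&'static str> {
        match self {
            Provider::Kokoro => None,
            Provider::OpenAi => Some("OPENAI_API_KEY"),
            Provider::Grok => Some("XAI_API_KEY"),
            Provider::ElevenLabs => Some("ELEVENLABS_API_KEY"),
        }
    }

    /// Suffix of the provider-specific table [preprocessor.tts.providers.<name>].
    fn section_name(self) -> &'static str {
        match self {
            Provider::Kokoro => "kokoro",
            Provider::OpenAi => "openai",
            Provider::Grok => "grok",
            Provider::ElevenLabs => "elevenlabs",
        }
    }
}

/// Fail early when a remote provider's key is missing or empty.
/// `lookup` would typically be `|k| std::env::var(k).ok()` after .env loading.
fn require_api_key(provider: Provider, lookup: impl Fn(&str) -> Option<String>) -> Result<(), String> {
    match provider.api_key_var() {
        None => Ok(()),
        Some(var) if lookup(var).map_or(false, |v| !v.is_empty()) => Ok(()),
        Some(var) => Err(format!("{var} is not set")),
    }
}
```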

4. Build with provider audio

Run the wrapper build:

mdbook-tts build .

If you are using the local repo binary:

This is the important part: mdbook-tts build . generates provider audio first and then runs mdbook build.

Plain mdbook build still works, but it does not generate provider MP3 files. It only gives you the injected player plus browser speech fallback.
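The difference between the two commands is one of ordering: the wrapper runs an audio-generation phase and only then hands off to a normal `mdbook build`. The sketch below shows that ordering with `std::process::Command`. `generate_provider_audio` is a hypothetical placeholder for the synthesis, caching, and publishing work; the real wrapper's internals are not described here.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical placeholder for phase 1: synthesize chunks, fill the cache,
/// and publish manifests. Returns how many chapters received audio.
fn generate_provider_audio(_book_dir: &Path) -> Result<usize, String> {
    Ok(0)
}

/// Phase 2 is an ordinary `mdbook build <dir>`.
fn mdbook_build_command(book_dir: &Path) -> Command {
    let mut cmd = Command::new("mdbook");
    cmd.arg("build").arg(book_dir);
    cmd
}

/// Audio first, then the normal build; a plain `mdbook build` skips phase 1.
fn wrapper_build(book_dir: &Path) -> Result<(), String> {
    generate_provider_audio(book_dir)?;
    let status = mdbook_build_command(book_dir)
        .status()
        .map_err(|e| format!("failed to start mdbook: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("mdbook build exited with {status}"))
    }
}
```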

Provider Usage

Kokoro

Prerequisites:

# macOS
brew install espeak-ng ffmpeg

# Linux
# install espeak-ng and ffmpeg with your package manager

mdbook-tts manages the Kokoro Python runtime for you. It prefers uv, falls back to python3, and then to python. The first run downloads the configured Hugging Face repo into hf-cache-dir; later runs reuse the cached model files.
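The launcher preference above (uv, then python3, then python) is a simple first-available search. This sketch takes the availability check as a parameter so the policy can be exercised without touching the real PATH; `on_path` is one possible check. Both function names are hypothetical, and Windows `.exe` resolution is deliberately left out.

```rust
use std::path::Path;

/// True if `bin` is an existing file path, or a bare name found in a PATH directory.
/// (Windows PATHEXT handling such as `.exe` is not covered by this sketch.)
fn on_path(bin: &str) -> bool {
    let candidate = Path::new(bin);
    if candidate.components().count() > 1 {
        return candidate.is_file();
    }
    std::env::var_os("PATH")
        .map(|paths| std::env::split_paths(&paths).any(|dir| dir.join(bin).is_file()))
        .unwrap_or(false)
}

/// Preference order: uv-bin, then python-bin (python3 by default), then python.
fn pick_python_launcher(uv_bin: &str, python_bin: &str, is_available: impl Fn(&str) -> bool) -> Option<String> {
    [uv_bin, python_bin, "python"]
        .iter()
        .copied()
        .find(|bin| is_available(*bin))
        .map(str::to_string)
}
```

In a real build, `pick_python_launcher("uv", "python3", on_path)` would return the first launcher found on the machine.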

book.toml:

[preprocessor.tts.audio]
enabled = true
provider = "kokoro"
format = "mp3"

[preprocessor.tts.providers.kokoro]
repo-id = "hexgrad/Kokoro-82M"
voices = ["af_heart", "af_sky"]
lang-code = "a"
hf-cache-dir = ".mdbook-tts/hf-cache"
runtime-dir = ".mdbook-tts/kokoro"
compression = "ffmpeg-mp3"
ffmpeg-bin = "ffmpeg"
uv-bin = "uv"
python-bin = "python3"
espeak-ng-bin = "espeak-ng"

Notes:

  • audio.format = "mp3" plus compression = "ffmpeg-mp3" is the default local path.
  • To opt out of MP3 compression, set audio.format = "wav" and compression = "none".
  • revision is optional and pins the Hugging Face repo revision when set.
  • If python-bin is missing, mdbook-tts falls back to python.
  • Missing espeak-ng or ffmpeg fails the build with install instructions instead of trying to install system packages for you.
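The prerequisite rule is: espeak-ng is always required, ffmpeg only when audio.format = "mp3", and a missing tool stops the build with instructions instead of triggering an install. A minimal sketch of that check follows. The function names and the exact error text are illustrative assumptions, not mdbook-tts's actual output.

```rust
/// System tools a Kokoro build needs: espeak-ng always, ffmpeg only for MP3 output.
fn required_tools(format: &str) -> Vec<&'static str> {
    let mut tools = vec!["espeak-ng"];
    if format == "mp3" {
        tools.push("ffmpeg");
    }
    tools
}

/// Report missing tools with install hints; never install anything itself.
/// `is_available` could be a PATH lookup on the configured *-bin values.
fn check_prerequisites(tools: &[&str], is_available: impl Fn(&str) -> bool) -> Result<(), String> {
    let missing: Vec<&str> = tools.iter().copied().filter(|tool| !is_available(*tool)).collect();
    if missing.is_empty() {
        return Ok(());
    }
    Err(format!(
        "missing required tool(s): {}\n  macOS: brew install {}\n  Linux: install them with your package manager",
        missing.join(", "),
        missing.join(" ")
    ))
}
```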

Build:

mdbook-tts build .

Kokoro sample:

mdbook-tts speak --provider kokoro --text "Testing Kokoro from mdbook tts."

OpenAI

Environment variable:

export OPENAI_API_KEY=your_key_here

book.toml:

[preprocessor.tts.audio]
enabled = true
provider = "openai"

[preprocessor.tts.providers.openai]
model = "gpt-4o-mini-tts-2025-12-15"
voices = ["alloy", "nova", "echo", "shimmer"]

Use this when you want the simplest hosted setup. OpenAI is also the provider the speak smoke-test command uses when no --provider flag is given:

mdbook-tts speak --text "Testing OpenAI text to speech from mdbook-tts."

That command writes target/mdbook-tts/openai-sample.mp3. It is a one-shot OpenAI smoke test and is separate from the full book build.

Grok

Environment variable:

export XAI_API_KEY=your_key_here

book.toml:

[preprocessor.tts.audio]
enabled = true
provider = "grok"

[preprocessor.tts.providers.grok]
model = "<grok-tts-model>"
voices = ["<voice-id-1>", "<voice-id-2>"]

Notes:

  • voices are sent as Grok voice_id values.
  • model should be the xAI TTS model you want to use for the build.

Build:

mdbook-tts build .

ElevenLabs

Environment variable:

export ELEVENLABS_API_KEY=your_key_here

book.toml:

[preprocessor.tts.audio]
enabled = true
provider = "elevenlabs"

[preprocessor.tts.providers.elevenlabs]
model = "<elevenlabs-model-id>"
voices = ["<voice-id-1>", "<voice-id-2>"]

Notes:

  • model is sent as ElevenLabs model_id.
  • voices should be ElevenLabs voice IDs, not display names. The adapter uses each voice directly in the /v1/text-to-speech/{voice} request path.
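To see why display names fail, it helps to look at the request shape. ElevenLabs' public text-to-speech endpoint is POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}, with the key in an `xi-api-key` header and `model_id` in the JSON body. The sketch below only builds that request description (std has no HTTP client). It is not the mdbook-tts adapter, and `TtsRequest` and `elevenlabs_request` are hypothetical names.

```rust
/// Description of an outgoing request; actually sending it needs an HTTP client.
struct TtsRequest {
    url: String,
    api_key_header: (&'static str, String),
    body: String,
}

/// Minimal JSON string escaping for the fields used below.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

fn elevenlabs_request(voice_id: &str, model_id: &str, text: &str, api_key: &str) -> TtsRequest {
    TtsRequest {
        // The voice goes straight into the URL path, so it must be an ID,
        // not a display name such as "Rachel (narration)".
        url: format!("https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"),
        api_key_header: ("xi-api-key", api_key.to_string()),
        body: format!(
            r#"{{"text":"{}","model_id":"{}"}}"#,
            json_escape(text),
            json_escape(model_id)
        ),
    }
}
```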

Build:

mdbook-tts build .

Browser API Fallback

If you want a zero-provider setup and are okay with browser and OS voices, disable build-time audio:

[preprocessor.tts.audio]
enabled = false

Then run:

In this mode:

  • no provider MP3 files are generated
  • no chapter manifests are published
  • the same injected player still appears
  • playback uses window.speechSynthesis
  • available voices depend on the user's browser and operating system

The player also falls back to browser speech automatically when a chapter does not have a published audio manifest.
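The injected player runs in the browser as JavaScript, but its per-chapter decision is simple enough to sketch in the same language as the other examples: if a chapter has a published manifest, play its chunk URLs back-to-back as a playlist (not one merged file); otherwise use browser speech. The manifest shape here (chapter id mapped to ordered chunk URLs) is an assumption, as are the type and function names.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Playback {
    /// Provider audio: chunk URLs played one after another.
    Playlist(Vec<String>),
    /// No manifest for this chapter: hand its text to window.speechSynthesis.
    BrowserSpeech,
}

/// Hypothetical manifest index: chapter id -> ordered chunk URLs.
fn choose_playback(manifests: &HashMap<String, Vec<String>>, chapter_id: &str) -> Playback {
    match manifests.get(chapter_id) {
        Some(chunks) if !chunks.is_empty() => Playback::Playlist(chunks.clone()),
        _ => Playback::BrowserSpeech,
    }
}

/// Playlist streaming: when chunk `finished` ends, start the next one, if any.
fn next_chunk(playlist: &[String], finished: usize) -> Option<&str> {
    playlist.get(finished + 1).map(String::as_str)
}
```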

Output Layout

Build-time audio uses three path groups:

  • Cache: .mdbook-tts/cache/<provider>/<voice>/<hash>.<extension>
  • Published audio: book/tts/audio/<provider>/<voice>/<hash>.<extension>
  • Published manifests: book/tts/manifests/<chapter-id>.json

This keeps the cache reusable across builds while only copying referenced assets into the final book output.
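Since the cache and the published audio share the same `<provider>/<voice>/<hash>.<extension>` suffix, publishing can be a copy of just the referenced chunks from one root to the other. The sketch below assumes that mirroring; `ChunkRef` and `publish_referenced` are hypothetical names, and the real tool may also prune or verify files.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One synthesized chunk, identified by the components of the path layout above.
struct ChunkRef<'a> {
    provider: &'a str,
    voice: &'a str,
    hash: &'a str,
    ext: &'a str,
}

/// `<provider>/<voice>/<hash>.<ext>`, shared by the cache and the published audio.
fn relative_audio_path(c: &ChunkRef) -> PathBuf {
    PathBuf::from(c.provider).join(c.voice).join(format!("{}.{}", c.hash, c.ext))
}

/// Copy only referenced chunks from e.g. `.mdbook-tts/cache` into `book/tts/audio`.
/// Unreferenced cache entries stay in place for future builds.
fn publish_referenced(cache_dir: &Path, public_audio_dir: &Path, referenced: &[ChunkRef]) -> io::Result<usize> {
    let mut copied = 0;
    for chunk in referenced {
        let rel = relative_audio_path(chunk);
        let dest = public_audio_dir.join(&rel);
        if let Some(parent) = dest.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::copy(cache_dir.join(&rel), &dest)?;
        copied += 1;
    }
    Ok(copied)
}
```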

Demo

This repo includes a demo book in demo/.

Build it with provider audio:

Then inspect the rendered chapter and the generated audio assets:

  • demo/book/chapter_1.html
  • demo/book/tts/manifests/
  • demo/book/tts/audio/

Notes

  • max-voices limits how many configured voices are exposed to the UI and built into manifests.
  • partial-publish = true allows the build to continue even if some voices or chapters fail.
  • partial-publish = false makes incomplete provider generation fail the build.
  • There is no word-level highlighting yet.
  • Chunk playback is streamed as a playlist, not merged into one chapter-length file.
  • Inline player assets are injected per page by the preprocessor.
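The max-voices and partial-publish rules above can be expressed as two small functions. The semantics (truncate the voice list; keep successes and report failures, or fail the whole build) come from these notes. The data shapes and names are a hypothetical sketch.

```rust
/// Only the first `max_voices` configured voices reach the UI and the manifests.
fn exposed_voices(configured: &[String], max_voices: usize) -> &[String] {
    &configured[..configured.len().min(max_voices)]
}

/// Outcome of generating audio for one (chapter, voice) pair.
struct Generated {
    chapter: String,
    voice: String,
    result: Result<(), String>,
}

type Published = Vec<(String, String)>;

/// partial-publish = true: publish what succeeded and report the failures.
/// partial-publish = false: any failure fails the build.
fn apply_publish_policy(outcomes: Vec<Generated>, partial_publish: bool) -> Result<(Published, Vec<String>), String> {
    let mut published = Vec::new();
    let mut failures = Vec::new();
    for g in outcomes {
        match g.result {
            Ok(()) => published.push((g.chapter, g.voice)),
            Err(e) => failures.push(format!("{}/{}: {}", g.chapter, g.voice, e)),
        }
    }
    if !partial_publish && !failures.is_empty() {
        return Err(format!("provider generation incomplete: {}", failures.join("; ")));
    }
    Ok((published, failures))
}
```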

Development

Run tests:

Try the OpenAI smoke test:

cargo run -- speak --text "Testing OpenAI text to speech from mdbook-tts."

Try the built-in Kokoro smoke test:

cargo run -- speak --provider kokoro --text "Testing Kokoro from mdbook tts."

That command writes target/mdbook-tts/kokoro-sample.mp3 by default.

For a lower-level runtime check, the repo also keeps a standalone script:

python scripts/kokoro_smoke_test.py