autiobook
convert epub files to audiobooks using qwen3-tts.
requirements
- python 3.12+
- ffmpeg
- sox
- uv (python package manager)
- gpu recommended (cuda or rocm)
installation
# cuda gpu (default) make build-cuda # amd rocm gpu (gfx1151) make build-rocm # cpu only make build-cpu
usage
enter the venv
source .venv/bin/activate
autiobook --helplist chapters
autiobook chapters book.epub
full conversion (idempotent)
autiobook convert book.epub -o workdir/
runs all phases, skipping already-completed steps.
extract
extract chapter text from epub to workdir.
autiobook extract book.epub -o workdir/
creates:
workdir/extract/metadata.json- book metadataworkdir/extract/NN_Title.txt- chapter text filesworkdir/extract/state.json- resumability state
synthesize
convert text files to wav audio.
autiobook synthesize workdir/ -s Ryan
creates:
workdir/synthesize/NN_Title.wav- audio filesworkdir/synthesize/state.json- resumability state
export
convert wav files to mp3 with metadata.
autiobook export workdir/
creates:
workdir/export/NN_Title.mp3- mp3 files with id3 tagsworkdir/export/state.json- resumability state
dramatized conversion (llm)
generate a full cast performance using openai-compatible llm (including llama.cpp) and voice cloning.
# 1. extract text autiobook extract book.epub -o workdir/ # 2. generate cast list (using llm) autiobook cast workdir/ --api-key sk-... # 3. generate voice auditions (review/edit characters.json first if needed) autiobook audition workdir/ # 4. create dramatized script (using llm) autiobook script workdir/ --api-key sk-... # 5. validate script against source (optional) autiobook validate workdir/ # 6. fix any issues found (optional) autiobook fix workdir/ --api-key sk-... # 7. perform the script (voice cloning) autiobook perform workdir/ # 8. export to mp3 autiobook export workdir/
or run the full dramatization pipeline in one go:
autiobook dramatize workdir/ --api-key sk-...
script validation and repair
after generating scripts, you can validate that all source text is covered and detect any hallucinated content:
# check for both missing text and hallucinated segments autiobook validate workdir/ # check only for missing text autiobook validate workdir/ --missing # check only for hallucinated segments autiobook validate workdir/ --hallucinated
to fix issues found during validation:
# fill missing text and remove hallucinated segments autiobook fix workdir/ --api-key sk-... # only fill missing text (uses LLM with surrounding context) autiobook fix workdir/ --missing --api-key sk-... # only remove hallucinated segments (no LLM needed) autiobook fix workdir/ --hallucinated # control context amount for LLM (characters or paragraphs) autiobook fix workdir/ --missing --context-chars 1000 --api-key sk-... autiobook fix workdir/ --missing --context-paragraphs 3 --api-key sk-...
options
-o, --output DIR- output directory-s, --speaker NAME- tts voice (default: Ryan)-c, --chapters RANGE- chapter selection (e.g., 1-5, 3,7,10)-v, --verbose- verbose output
available voices
Vivian, Ryan, Sunny, Aria, Bella, Nova, Echo, Finn, Atlas
output
creates one mp3 file per chapter in workdir/export/:
workdir/export/
├── 01_Introduction.mp3
├── 02_Chapter_One.mp3
└── ...
compatible with the Voice audiobook player for android.
workdir structure
Intermediate files are organized into subdirectories by command:
workdir/
├── extract/ # extracted text and metadata
│ ├── metadata.json
│ ├── cover.jpg
│ ├── NN_Title.txt
│ └── state.json
├── cast/ # character list and analysis state
│ ├── characters.json
│ └── state.json
├── audition/ # character voice samples
│ ├── Character.wav
│ └── state.json
├── script/ # dramatized scripts (speaker segments)
│ ├── NN_Title.json
│ └── state.json
├── perform/ # dramatized audio performance
│ ├── NN_Title.wav
│ ├── segments/ # segment cache
│ └── state.json
├── synthesize/ # standard mono-voice audio
│ ├── NN_Title.wav
│ ├── segments/ # segment cache
│ └── state.json
└── export/ # final mp3 output
├── NN_Title.mp3
└── state.json
Each command is fully resumable based on content hashes stored in state.json.