I built a screen-free, storytelling toy with an ESP32

github.com

2 points by akadeb 10 hours ago · 1 comment

akadebOP 10 hours ago

I built an open-source, screen-free storytelling toy for my nephew, who uses a Yoto toy. My sister told me he sometimes talks to the stories, and I thought it would be cool if he could actually talk to the characters in those stories. The AI models (STT, LLM, TTS) run locally on her MacBook, so the conversation transcript is never sent to cloud models.

This is my voice AI stack:

- ESP32 on Arduino to interface with the Voice AI pipeline

- mlx-audio for STT (whisper) and TTS with streaming (`qwen3-tts` / `chatterbox-turbo`)

- mlx-vlm to use vision language models like Qwen3.5-9B and Mistral

- mlx-lm to use LLMs like Qwen3, Llama3.2, Gemma3

- Secure websockets to interface with a MacBook
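
To make the flow through that stack concrete, here is a rough sketch of a single conversational turn (speech in, STT, LLM, streamed TTS out). This is not the code from the repo: `run_turn` and the `fake_*` stages are hypothetical stand-ins so the example runs without any models installed. In a real setup the three callables would presumably wrap mlx-audio and mlx-lm, and the chunks would be sent back to the ESP32 over the websocket.

```python
from typing import Callable, Iterable, Iterator


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    reply: Callable[[str], str],
    synthesize: Callable[[str], Iterable[bytes]],
) -> Iterator[bytes]:
    """One turn: recorded speech in, streamed speech chunks out."""
    text = transcribe(audio)        # STT stage
    answer = reply(text)            # LLM stage
    yield from synthesize(answer)   # TTS stage, streamed chunk by chunk


# Hypothetical stand-in stages (assumptions, not real model calls).
def fake_stt(audio: bytes) -> str:
    # Pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    # Pretend the story character answers the child.
    return f"Once upon a time, you said: {prompt}"


def fake_tts(text: str, chunk_size: int = 16) -> Iterator[bytes]:
    # Pretend each slice of bytes is a chunk of synthesized audio.
    data = text.encode("utf-8")
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


chunks = list(run_turn(b"hello dragon", fake_stt, fake_llm, fake_tts))
```

Keeping each stage behind a plain callable means a model can be swapped (for example a different TTS voice) without touching the rest of the loop, and yielding chunks lets playback start before the full reply is synthesized.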

The repo currently supports inference on Apple Silicon chips (M1/2/3/4/5), and I am planning to add Windows support soon. I would love to hear your thoughts on the project.
