Moshi for Mere Mortals

By TongKe Xue*, Rohit Swamy, Charles Niu | June 16, 2026

* Primary contributor

In 2024, Kyutai released Moshi, a full-duplex voice-to-voice model with 200 ms of latency in practice. This note documents the diagrams we drew while studying Moshi and Mimi, with pseudocode and equations for readers who want the extra detail.
