Grafting a Speech Head onto Gemma 4 E4B

frisson-labs.com

3 points by ymaws 2 months ago · 1 comment

ymaws (OP) 2 months ago

I wanted to get my hands dirty with the Gemma model and try out some model surgery. This is a small smoke test, not a production voice model, but the wiring was fun enough to write up.

Gemma can take in audio, images, and text, but only talks back in text. Mimi can turn codec tokens back into speech. So I froze both sides and trained a small graft in the middle: Gemma hidden states -> Mimi audio tokens.
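To make the wiring concrete, here is a minimal NumPy sketch of the graft's shape, not the actual training code. Everything in it is an assumption for illustration: the `SpeechGraft` class, the layer sizes, the choice of 8 codebooks with 2048 entries each, and the random arrays standing in for frozen Gemma hidden states and Mimi-encoded target tokens. It only shows the forward pass and the cross-entropy loss that the graft's weights would be trained against.

```python
import numpy as np

# All sizes below are illustrative assumptions, not values from the write-up.
HIDDEN_DIM = 64       # stand-in for the width of Gemma's hidden states
GRAFT_DIM = 32        # hypothetical bottleneck width of the graft
NUM_CODEBOOKS = 8     # assumed number of Mimi codebooks predicted per frame
CODEBOOK_SIZE = 2048  # assumed number of entries per codebook


class SpeechGraft:
    """Hypothetical adapter: hidden states -> per-codebook token logits."""

    def __init__(self, rng):
        # These are the only trainable weights; both models stay frozen.
        self.w_in = rng.normal(0.0, 0.02, (HIDDEN_DIM, GRAFT_DIM))
        self.w_out = rng.normal(
            0.0, 0.02, (NUM_CODEBOOKS, GRAFT_DIM, CODEBOOK_SIZE)
        )

    def logits(self, hidden):
        # hidden: (frames, HIDDEN_DIM)
        # result: (frames, NUM_CODEBOOKS, CODEBOOK_SIZE)
        z = np.tanh(hidden @ self.w_in)
        return np.einsum("fg,kgc->fkc", z, self.w_out)

    def predict_tokens(self, hidden):
        # Greedy pick of one token id per codebook per frame.
        return self.logits(hidden).argmax(axis=-1)


def cross_entropy(logits, targets):
    """Mean cross-entropy over frames and codebooks."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)
    return float(-picked.mean())


rng = np.random.default_rng(0)
# Placeholder for hidden states read out of the frozen language model.
hidden = rng.normal(size=(5, HIDDEN_DIM))
# Placeholder for codec tokens produced by encoding the target audio.
targets = rng.integers(0, CODEBOOK_SIZE, size=(5, NUM_CODEBOOKS))

graft = SpeechGraft(rng)
tokens = graft.predict_tokens(hidden)
loss = cross_entropy(graft.logits(hidden), targets)
print(tokens.shape, round(loss, 2))
```

In a real setup the predicted token ids would be handed to the frozen codec decoder to get a waveform, and only the graft's weights would receive gradient updates.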

I've enjoyed playing with this because the bad audio outputs have sounded hilarious.
