Thanks to a web demo of a new AI tool called Koe Recast, you can transform up to 20 seconds of your voice into different styles, including an anime character, a deep male narrator, an ASMR whisper, and more. It’s an eye-opening preview of a potential commercial product currently undergoing private alpha testing.
Koe Recast emerged recently from Asara Near, a Texas-based developer who is independently building a desktop app that would let people change their voices in real time within other apps such as Zoom and Discord. “My goal is to help people express themselves in any way that makes them happier,” Near said in a brief interview with Ars.
Several demos on the Koe website show clips of Mark Zuckerberg talking about augmented reality, altered by Recast into a female voice, a deep male narrator voice, and a high-pitched anime voice.
This kind of realistic AI-powered voice transformation technology isn’t new. Google made waves with similar tech in 2018, and audio deepfakes of celebrities have caused controversy for several years now. But seeing this capability come from an independent startup funded by one person (“I’ve funded this project entirely by myself thus far,” Near said) shows how far AI vocal synthesis has come. It perhaps also hints at how close voice transformation might be to widespread adoption through a low-cost or open source release.
When asked what specific kind of AI powers Recast’s voice transformation under the hood, Near declined to share specifics but described in general terms how it works: “We’re able to dive in and alter the characteristics of voices within the embedding space that we’ve created. Our goal, then, is to modify the parts of audio that correspond to a speaker’s personal style or timbre while preserving the parts of the audio that correspond to the spoken content such as prosody and words. This allows us to change the style of someone’s voice to any other style, including their perceived gender, age, ethnicity, and so on.”