Speech-to-video synthesis: Real-time rendering of speech
Hi guys,
After some research and no luck finding anyone who seems to be working on this, I thought I'd try a Hail Mary and post here.
I'm looking to speak to anyone who is working on speech-to-video (real-time speech rendering). Software already exists that can take audio (speech) input and render video of a person or avatar speaking, but rendering takes a long time.
How long until video of the person/avatar speaking can be rendered in near real time, with latency comparable to existing speech-to-text models?
What would a prototype that reduces the latency look like? Is anyone working on anything like this?
For context, I run a language learning app where you can practice speaking with an AI conversation partner. It would be far more engaging if the user had an avatar/person to speak to, rather than staring at the chat history while talking to the AI.
Thanks, Chris
For context, here's the original post: https://news.ycombinator.com/item?id=36973400

Reply: Something like this? https://www.heygen.com/article/unleashing-the-power-of-realt... https://docs.trypromptly.com/guides/realtime-avatar-with-rag

Reply: Wow, yes, thank you so much! I knew of HeyGen, but had no idea they'd done this with real-time avatars.