200ms Voice LLM

github.com

93 points by shubhamgoel 2 years ago · 23 comments

unraveller 2 years ago

I hope this catches on and can be transplanted into any LLM soon. Instant voice integration is poised to unlock a lot of UX possibilities, like an auto-pilot in cars that makes you a really powerful co-pilot: you could suggest fine-tuning settings on the fly for the current situation, or anything else that changes lots of settings at once with visual feedback.

  • ricardobeat 2 years ago

    Nio’s built-in assistant uses ChatGPT with Whisper (I think) and is almost real time already. It can change settings like that and rarely misses a beat.

    I expect once GPT 4o becomes available it will be awesome (even more if they “unlock” it to be asked generic questions and hold a conversation).

demarq 2 years ago

That was amazing and so productive. Looking at the transcript, I was able to cover so much more ground than if I were stuck typing all my questions on a keyboard. I can imagine a future in offices where people have AI rooms that you go to not for a meeting with other people but to have a convo with an AI.

Gys 2 years ago

Cannot wait for an instant translator. Something that can translate synchronously (!) one language it hears into a language that I understand. Getting closer!

  • ofrzeta 2 years ago

    I don't understand how that could possibly work. You will always need some time window, won't you? Also, sometimes the translation of the first word can only be inferred once you process the last word in a sentence (simplified example).

    • mosselman 2 years ago

      There are people who do this, and that is synchronous enough. So indeed maybe no Star Trek-level mouth-moves-and-translation-comes-out, but with 200ms we are getting close to being able to speak perfectly fine with someone whose language we don't speak.

      • ofrzeta 2 years ago

        I've experienced human live translators, and more often than not they paraphrase a sentence as a whole after it is uttered. I couldn't imagine otherwise, because you can't start translating until the sentence is finished by the original speaker. With human translators there's the additional challenge of translating/speaking while listening.

        • mosselman 2 years ago

          Yes exactly and the EU has meetings this way, so it seems good enough already. I don't think in practice, let's say on holiday, you'd need anything faster than 200ms for the first token. Of course, more speed would probably mean better experience, but it is already well within the realm of fast enough. Accuracy matters a lot more.

    • Gys 2 years ago

      You mean the job of a translator is by definition impossible?

      Edit: Yes, for sure there will be a delay of a few words or even one or two short sentences. Just like human translators. Not a problem I think.

      Edit 2: Very curious why my first comment was downvoted?

      • ein0p 2 years ago

        You can’t translate word by word. You need entire phrases, sometimes pretty long ones, to understand the meaning.

        • woleium 2 years ago

          sometimes, but in a constrained situation, e.g. at a hotel reception desk, maybe not.

          • nmstoker 2 years ago

            So what would you do when translating from a language that puts the main verb at the end of a sentence into a language that doesn't...

            Feels like there will be plenty of cases you can't just get around.

            • ein0p 2 years ago

              Not to mention that sometimes you need more than one sentence to get the full meaning

      • ofrzeta 2 years ago

        Obviously translation is possible but not "synchronously" as you wish. From what I understand 200ms is "time-to-first-token", so I still wonder how that works because, as I wrote in my comment above, typically there is no one-to-one correspondence of words/tokens from one language to another. (I didn't downvote your comment)
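
(Editorial aside, for illustration only: the thread's point is that a streaming translator can buffer source words until a clause boundary and then translate chunk by chunk, so 200ms time-to-first-token means output starts after the first clause, not word by word. The `chunk_stream` function and its punctuation heuristic below are invented for this sketch, not taken from the linked project.)

```python
# Sketch: why low time-to-first-token is compatible with whole-clause
# translation. Source tokens are buffered until a (crude) clause
# boundary, and each flushed chunk would then be handed to a translator.

def chunk_stream(tokens):
    """Buffer incoming source tokens; flush at each clause boundary."""
    buf = []
    for tok in tokens:
        buf.append(tok)
        if tok.endswith((",", ".", "?", "!")):  # naive boundary heuristic
            yield " ".join(buf)
            buf = []
    if buf:  # flush whatever remains at end of input
        yield " ".join(buf)

# Translation would begin as soon as the first chunk is flushed,
# rather than after the full sentence has been spoken.
chunks = list(chunk_stream("ich fahre morgen , wenn es regnet , nicht .".split()))
```

A verb-final source language (as raised above) just means the last chunk carries the verb; the chunking still bounds how early output can start.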

akreal 2 years ago

Cool!

HF Transformers is great for prototyping and research, but should not an interactive tool like this be based on something more speed-focused, like llama.cpp?

Any plans for languages beyond English?

  • juberti 2 years ago

    We're running it on vLLM and are working with others in the community to bring it to other optimized inference frameworks.

morjom 2 years ago

In the same vein, are there any good, low-latency, speech-to-text-to-speech (STTTS?) programs that make use of LLMs or AI?
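
(For reference, such a pipeline is typically three stages chained back to back, and end-to-end latency is the sum of the three stages, which is why streaming each stage matters. The sketch below uses made-up stub functions, not any real API; a real system would plug in a streaming STT model, an LLM, and a TTS engine.)

```python
# Hypothetical skeleton of a speech-to-text-to-speech round trip.
# All three stage functions are stand-in stubs for illustration.

def stt(audio_chunk: bytes) -> str:
    # stub: pretend the audio decodes straight to text
    return audio_chunk.decode("utf-8")

def llm(prompt: str) -> str:
    # stub: echo a canned reply instead of calling a model
    return f"You said: {prompt}"

def tts(text: str) -> bytes:
    # stub: pretend we synthesized audio from the text
    return text.encode("utf-8")

def voice_turn(audio_in: bytes) -> bytes:
    # One full turn; total latency = STT + LLM + TTS, so streaming
    # partial outputs between stages is what keeps it interactive.
    return tts(llm(stt(audio_in)))
```

Usage: `voice_turn(b"turn on the heater")` runs one round trip through the three stubs.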

mungoman2 2 years ago

Yeah this is awesome. Keep reducing that latency, that's the path to the killer assistant.
