Self-hosted Whisper-based voice recognition server for open Android phones

github.com

15 points by nichohel 3 years ago · 5 comments

smoldesu 3 years ago

I suspect something similar is possible with ChatGPT. Using the GPT-neo-125m model I've been able to get some really convincing (if lackluster) answers on 4-core ARM hardware with less than 2 GB of memory. With enough sampling, you can get legible paragraph-length responses in under 10 seconds; that's pretty good for an offline program in my book.

I'm using rust-bert to serve it over a Discord bot, similar to one of their examples[0]. It's running on Oracle VCPUs right now, but with dedi hardware and ML acceleration I bet it would scream!

[0] https://github.com/guillaume-be/rust-bert/blob/master/exampl...

  • nichohelOP 3 years ago

    Yes, this could serve as the conduit from the Android phone voice input to a server-based ChatGPT (using the free Konele Android app as frontend).

    • smoldesu 3 years ago

      I just clicked through and noticed the client-server part. I'd be curious to see if a smaller Whisper model could run on an Android phone too... All the same, nicely done!

      • nichohelOP 3 years ago

        As mentioned at the bottom of the git README, there is at least one Whisper port that runs natively on Android. It does not run as fast on an Android phone as on an iPhone (because of whisper.cpp's optimizations for Apple silicon), but it still runs pretty well. In my tests, it is slower than sending the raw audio across the network to a fast server for transcription there, which is what this post is about. But give it a try.

nichohelOP 3 years ago

With this little bit of code you can use excellent voice recognition (ggerganov whisper.cpp port of Whisper) hosted on your own server, for your de-Googled Android phone, for text messaging, emails, search, and so on.
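
The client-server flow described in this thread (phone sends raw audio, server returns text) can be sketched roughly as below. This is a minimal illustration, not the code from the linked repository: the `/transcribe` path, the JSON response shape, and the `transcribe` stub are all assumptions made for the example, and a real server would call into whisper.cpp where the stub is.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for the speech-to-text step (e.g. whisper.cpp)."""
    return f"<{len(audio)} bytes of audio received>"


class AudioHandler(BaseHTTPRequestHandler):
    """Accepts raw audio in a POST body and replies with JSON text."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        # Assumed response shape: {"text": "..."}; the real project may differ.
        body = json.dumps({"text": transcribe(audio)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def send_audio(url: str, audio: bytes) -> str:
    """Client side: POST the raw audio bytes and return the transcript."""
    request = urllib.request.Request(
        url, data=audio, headers={"Content-Type": "audio/wav"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["text"]


def demo() -> str:
    """Run server and client in one process on an ephemeral local port."""
    server = HTTPServer(("127.0.0.1", 0), AudioHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # Hypothetical endpoint path; the handler above ignores the path.
        return send_audio(f"http://127.0.0.1:{port}/transcribe", b"\x00" * 16000)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

In this sketch the phone-side frontend would play the role of `send_audio`, and the trade-off discussed above is simply network latency plus fast server inference versus slower on-device inference.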
