Self-hosted Whisper-based voice recognition server for open Android phones
I suspect something similar is possible with ChatGPT. Using the GPT-Neo-125M model I've been able to get some really convincing (if lackluster) answers on 4-core ARM hardware with less than 2 GB of memory. With enough sampling you can get legible, paragraph-length responses out in less than 10 seconds; that's pretty good for an offline program in my book.
I'm using rust-bert to serve it over a Discord bot, similar to one of their examples[0] (see the sketch below). It's running on Oracle vCPUs right now, but with dedicated hardware and ML acceleration I bet it would scream!
[0] https://github.com/guillaume-be/rust-bert/blob/master/exampl...
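For reference, here is a minimal sketch of what that rust-bert setup looks like, modeled on the library's GPT-Neo text generation example. Field names and the `generate` signature have shifted between rust-bert releases, and the sampling parameters here are illustrative, so treat this as a sketch rather than copy-paste code:

```rust
use rust_bert::gpt_neo::{
    GptNeoConfigResources, GptNeoMergesResources, GptNeoModelResources, GptNeoVocabResources,
};
use rust_bert::pipelines::common::ModelType;
use rust_bert::pipelines::text_generation::{TextGenerationConfig, TextGenerationModel};
use rust_bert::resources::RemoteResource;

fn main() -> anyhow::Result<()> {
    // NOTE: TextGenerationConfig field names/types vary between rust-bert
    // releases; adjust to match the version pinned in Cargo.toml.
    // Weights/config/vocab are fetched from the Hugging Face hub and cached
    // locally after the first run.
    let generate_config = TextGenerationConfig {
        model_type: ModelType::GPTNeo,
        model_resource: Box::new(RemoteResource::from_pretrained(
            GptNeoModelResources::GPT_NEO_125M,
        )),
        config_resource: Box::new(RemoteResource::from_pretrained(
            GptNeoConfigResources::GPT_NEO_125M,
        )),
        vocab_resource: Box::new(RemoteResource::from_pretrained(
            GptNeoVocabResources::GPT_NEO_125M,
        )),
        merges_resource: Box::new(RemoteResource::from_pretrained(
            GptNeoMergesResources::GPT_NEO_125M,
        )),
        max_length: 64,
        do_sample: true, // sampling, as mentioned above
        ..Default::default()
    };

    let model = TextGenerationModel::new(generate_config)?;

    // A Discord bot would pass the incoming message content here instead of
    // this hard-coded prompt.
    let output = model.generate(&["What is the capital of France?"], None);
    for text in output {
        println!("{text}");
    }
    Ok(())
}
```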
Yes, this could serve as the conduit from the Android phone's voice input to a server-based ChatGPT (using the free Konele Android app as the frontend).
I just clicked through and noticed the client-server part. I'd be curious to see if a smaller Whisper model could run on an Android phone too... All the same, nicely done!
As mentioned in the repo's README (at the bottom), there is at least one Whisper port that runs natively on Android. It does not run as fast on an Android phone as on an iPhone (because whisper.cpp is optimized for Apple silicon), but it still runs pretty well. In my tests, it does not run as fast as sending the raw audio across the network to a fast server for transcription there, which is what this post is about. But give it a try.
With this little bit of code you can use excellent voice recognition (ggerganov's whisper.cpp port of Whisper) hosted on your own server from your de-Googled Android phone, for text messaging, emails, search, and so on.
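To give a feel for the general shape of the client side, here is a rough sketch of a program that ships a recorded WAV file to a self-hosted Whisper transcription server over HTTP and prints the returned text. The endpoint path, form field name, and plain-text response here are assumptions for illustration; the real protocol is whatever the linked server project defines, so adjust accordingly. It uses the reqwest crate with the `blocking` and `multipart` features enabled:

```rust
use std::error::Error;

// Hypothetical endpoint and field names -- the actual URL, route, and
// response format are defined by the server you deploy; adjust to match.
fn transcribe(server: &str, wav_path: &str) -> Result<String, Box<dyn Error>> {
    // Attach the recorded audio as a multipart file upload.
    let form = reqwest::blocking::multipart::Form::new().file("file", wav_path)?;

    // POST it to the self-hosted Whisper server and read the transcript back.
    let transcript = reqwest::blocking::Client::new()
        .post(format!("{server}/inference"))
        .multipart(form)
        .send()?
        .text()?;
    Ok(transcript)
}

fn main() -> Result<(), Box<dyn Error>> {
    // On the phone side, an app such as Konele would capture the audio;
    // here we just send a pre-recorded file to the server and print the text.
    let text = transcribe("http://my-whisper-server:8080", "utterance.wav")?;
    println!("{text}");
    Ok(())
}
```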