Fast inference of OpenAI's Whisper on Rockchip processors

github.com

2 points by keveman 2 years ago · 5 comments

smpanaro 2 years ago

What's an example use case for something like this? "At the edge" makes me think offline, but are you generating audio at anything faster than real time in that case?

Would be curious to see an even lower-cost, lower-power option. Seems this one is $120-170.

  • kevemanOP 2 years ago

    This is for speech-to-text, so it generates text, not audio. And on a $120-$170 device, this transcribes at 30x real time. The code also runs on lower-end Rockchip processors costing ~$30, although only at 10x real-time speed.

    • smpanaro 2 years ago

      Sorry for the confusing phrasing about STT vs TTS. I'm not familiar with cases where you would use something like this 'at the edge' instead of, say, a laptop. I was thinking maybe some sort of offline setup with a microphone -- but in that case the audio only arrives in real time anyway. Do you have some use cases in mind?

      1/4 of the price for 1/3 of the speed is a good deal! Presumably still faster than faster-whisper on the same hardware?

      • kevemanOP 2 years ago

        This enables a true natural-language voice interface for any device or appliance that currently has a touchpad or a bunch of buttons. Yes, it's faster than faster-whisper, but that's really an apples-to-oranges comparison: useful-transformers uses the NPU on Rockchip processors, so it works only on those, whereas faster-whisper runs fast on most platforms.
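
        For comparison, a minimal CPU-only baseline with faster-whisper might look roughly like the sketch below. It assumes the standard faster_whisper Python package; the model name, audio path, and decoding options are just placeholders, not anything from useful-transformers.

          from faster_whisper import WhisperModel

          # Load the smallest English-only Whisper model on CPU with int8 weights.
          model = WhisperModel("tiny.en", device="cpu", compute_type="int8")

          # Transcribe a WAV file; segments is a generator of timestamped text chunks.
          segments, info = model.transcribe("sample.wav", beam_size=1)
          for segment in segments:
              print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")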

kevemanOP 2 years ago

The tiny.en Whisper model transcribes speech at 30x real-time speeds on an Orange Pi 5.
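
The 30x figure is a real-time factor: seconds of audio divided by wall-clock seconds spent transcribing. A rough way to measure it is sketched below; "transcribe" stands in for whatever speech-to-text call is being benchmarked (it is not a specific useful-transformers API), and "sample.wav" is a placeholder.

    import time
    import wave

    def realtime_factor(transcribe, wav_path):
        """Return (audio duration) / (wall-clock transcription time)."""
        with wave.open(wav_path) as w:
            audio_seconds = w.getnframes() / w.getframerate()
        start = time.perf_counter()
        transcribe(wav_path)  # any function that transcribes the given WAV file
        elapsed = time.perf_counter() - start
        return audio_seconds / elapsed

    # e.g. realtime_factor(my_transcribe, "sample.wav") returning roughly 30
    # would match the Orange Pi 5 number quoted above.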
