Apple's New Speech APIs Outpace Whisper for Fast Transcription
macstories.net
Not open source/weights, FU Apple
unfortunately no actual measure of quality given in the article
>> By harnessing SpeechAnalyzer and SpeechTranscriber on-device, the command line tool tore through the 7GB video file a full 55% faster than MacWhisper’s Large V3 Turbo model, with no noticeable difference in transcription quality.
>> All three transcription workflows had similar trouble with last names and words like “AppStories,” which LLMs tend to separate into two words instead of camel casing.

App                            Transcription Time
-------------------------------------------------
Yap (uses Apple APIs)          0:45
MacWhisper (Large V3 Turbo)    1:41
VidCap                         1:55
MacWhisper (Large V2)          3:55

Did you forget to paste the 3rd "Transcription Quality Metric" column?
No, because I pasted the "with no noticeable difference in transcription quality." part that you missed.
X=Y. What is the value of X?
It sounds like the quality is better than YouTube's.
> a game changer for anyone who uses voice transcription to create text from lectures, podcasts, YouTube videos, and more... generating transcripts that I upload to YouTube because the site’s built-in transcription isn’t very good.
and on par with Whisper.
> SpeechAnalyzer and SpeechTranscriber – available across the iPhone, iPad, Mac, and Vision Pro – mark a significant leap forward in transcription speed without compromising on quality
How much better?
All I know is that the most obvious room for improvement in YouTube's transcription is that it regularly gets words wrong even when they are mentioned 50 times or spelled out in the title or description of the video. If there are thousands of clues that the video is about, say, LLMs, it shouldn't keep failing to spell "ChatGPT".
WER scores?
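For anyone who wants to put a number on it: the article reports no WER, but word error rate is simple to compute yourself if you have a hand-corrected reference transcript. Below is a minimal sketch, not what any of these tools do internally. The function name `wer` and the normalization (lowercasing and splitting on whitespace) are my own assumptions; real evaluations usually also strip punctuation and normalize numbers.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Assumption: words are compared case-insensitively after a plain
    whitespace split; punctuation is NOT stripped in this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # The "AppStories" split from the article counts as one substitution
    # plus one insertion against a three-word reference.
    print(wer("welcome to appstories", "welcome to app stories"))
```

Running each tool's output through something like this against the same reference would give the missing quality column, with the caveat that a single video is a very small sample.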