Deep Dive into FFmpeg 8.0
rendi.dev
Author here, available for questions
Whisper will hallucinate on audio segments that don't have any speech. VAD mitigates that. Expect worse results without it, especially on non-English audio.
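For anyone who wants to try the VAD point above, here is a minimal sketch of how the command line could be assembled. It assumes an FFmpeg 8.0 build with the whisper filter enabled and uses the filter options as I remember them from the filter documentation (`model`, `language`, `queue`, `destination`, `format`, `vad_model`). The file names and model paths are hypothetical placeholders, and the helper names are mine, not part of any library.

```python
import subprocess

def whisper_filter(model, destination, vad_model=None,
                   language="en", queue=3, fmt="srt"):
    """Build the -af argument for the whisper filter.

    Paths containing ':' or other filtergraph special characters
    would need escaping; this sketch assumes simple relative paths.
    """
    opts = [
        f"model={model}",
        f"language={language}",
        f"queue={queue}",
        f"destination={destination}",
        f"format={fmt}",
    ]
    if vad_model:
        # Assumption: a Silero VAD model in ggml format, as distributed
        # with whisper.cpp, so silent stretches are skipped instead of
        # being passed to Whisper.
        opts.append(f"vad_model={vad_model}")
    return "whisper=" + ":".join(opts)

def transcribe_cmd(input_path, model, destination, vad_model=None):
    """Return the ffmpeg argument list; audio only, no output file."""
    return [
        "ffmpeg", "-i", input_path, "-vn",
        "-af", whisper_filter(model, destination, vad_model),
        "-f", "null", "-",
    ]

if __name__ == "__main__":
    cmd = transcribe_cmd(
        "input.mp4",                 # hypothetical input file
        "models/ggml-base.en.bin",   # hypothetical Whisper model path
        "output.srt",
        vad_model="models/ggml-silero-vad.bin",  # hypothetical VAD model path
    )
    subprocess.run(cmd, check=True)
```

Running the same command with and without `vad_model` on a clip that has long silences is a quick way to see whether the hallucinated lines go away.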
This is great, I’ll have to give it a shot
"Lenovo laptop with Nvidia RTX 4040" 4060?
Correct. I fixed the typo
Is the point that you only need one tool -- ffmpeg -- to both generate transcripts as well as embed those into a video as opposed to having multiple tools?
This is a 3-part series; the first post discusses the new native whisper integration. And correct, for the first post the point is to show that you can use only ffmpeg to transcribe and embed subtitles in a video
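As a rough illustration of the "embed" half, here is a sketch of muxing an existing SRT file into an MP4 as a soft subtitle track. It assumes the SRT was already produced by a transcription step, that the output container is MP4 (hence the `mov_text` subtitle codec), and that the file names are placeholders. The helper name is hypothetical, and this is not necessarily the exact command used in the post.

```python
import subprocess

def embed_subtitles_cmd(video_path, srt_path, output_path):
    """Return an ffmpeg argument list that adds srt_path as a subtitle track.

    Video and audio are stream-copied (no re-encode); only the
    subtitles are converted, to mov_text, which MP4 supports.
    """
    return [
        "ffmpeg",
        "-i", video_path,      # input 0: the original video
        "-i", srt_path,        # input 1: the subtitles
        "-map", "0:v",         # keep video from input 0
        "-map", "0:a?",        # keep audio if present ('?' makes it optional)
        "-map", "1:0",         # subtitle stream from input 1
        "-c", "copy",          # copy video and audio as-is
        "-c:s", "mov_text",    # convert subtitles for the MP4 container
        output_path,
    ]

if __name__ == "__main__":
    # Hypothetical file names.
    subprocess.run(
        embed_subtitles_cmd("input.mp4", "output.srt", "with_subs.mp4"),
        check=True,
    )
```

This gives a subtitle track the player can toggle. Burning the text into the picture instead would need a video filter and a re-encode.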
While there's appeal in having one tool do several things, I'm more a fan of the traditional UNIX philosophy that a tool should do one thing, do it extremely well, and allow for chaining of several tools together to achieve a multi-step process.
I tend to agree. The thing I like most about version 8 is actually pad_cuda - nice performance boost for resizing video with an Nvidia GPU
Do you know if it’s supported on Mac too, with whatever platform-specific optimizations there are, like running it on the GPU with MPS?
You mean Vulkan? The blog post has a reference to all Vulkan-supported platforms
If you mean an ffmpeg build with whisper: from memory, I didn't see any ffmpeg-builds for Mac, so you will probably need to compile it yourself