Speaker diarization (labels) for OpenAI Whisper generated transcripts

44 points by ufarooqi 3 years ago · 5 comments


algon33 3 years ago

I tried using this on a technical talk[1], and it got the number of speakers wrong. That is somewhat surprising to me, as I would have thought diarization tech would just work by now.

[1]https://www.youtube.com/watch?v=5lFxURxbyEc&list=PLiayR7yJx8...

ufarooqiOP 3 years ago

I'm gonna give it a try with your video. If I may ask, how many speakers are there in this video? (Otherwise I have to go through all of it.) From what I can see, there is a teacher who speaks most of the time, plus a few laughs from students in the background.
- algon33 3 years ago
  
  There are a couple of people interjecting with answers to questions, or asking questions. I'm afraid I don't have a better estimate than that. But in this case, I think lumping the students together as one speaker and the teacher as another would be fine.

sandkoan 3 years ago

Woah! I've been facing the same problems with pyannote+whisper for diarization+transcription and, coincidentally, was just experimenting with combining NeMo and Whisper. Do you happen to have a repo for this? It would be invaluable.

Edit: Never mind, found the link: https://colab.research.google.com/drive/1X5XTiob6irFq8NJM831...
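Whichever diarizer does the speaker clustering (pyannote or NeMo), its output still has to be merged with Whisper's timestamped segments. One common approach is to label each transcript segment with the speaker whose turn overlaps it the most. Below is a minimal pure-Python sketch of that idea, assuming both outputs have already been converted into plain (start, end) intervals in seconds. `Segment`, `Turn`, and `assign_speakers` are hypothetical names, not part of the Whisper, pyannote, or NeMo APIs, and this is not necessarily what the linked Colab does.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A transcript segment (hypothetical shape, e.g. built from Whisper's segment timestamps)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """A speaker turn (hypothetical shape, e.g. built from a diarizer's output)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn], default: str = "UNKNOWN"
) -> List[Tuple[str, str]]:
    """Give each segment the speaker whose turn overlaps it the most.

    Segments that overlap no turn at all get the `default` label.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = default, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg.text))
    return labeled


if __name__ == "__main__":
    # Made-up example data: a lecturer (SPEAKER_00) and one student question (SPEAKER_01).
    segments = [
        Segment(0.0, 4.0, "Welcome to the lecture."),
        Segment(4.5, 6.0, "Is this on the exam?"),
        Segment(6.2, 10.0, "Good question."),
    ]
    turns = [
        Turn(0.0, 4.2, "SPEAKER_00"),
        Turn(4.3, 6.1, "SPEAKER_01"),
        Turn(6.1, 10.0, "SPEAKER_00"),
    ]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Segment-level matching gives a whole segment one label even when a speaker change falls inside it, so short interjections like the student questions mentioned above can get absorbed into the teacher's segment. Matching at word level (if word timestamps are available) is the usual refinement.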

ufarooqiOP 3 years ago

I have attached a link to Google Colab with the article.
https://colab.research.google.com/drive/1X5XTiob6irFq8NJM831...
