Ask HN: What do you use for speaker diarization?
Hi,
I am looking for a fire-and-forget solution akin to Whisper, where I can give it a WAV file with around 12 speakers and it gives me a diarization in the format (speaker_1, speaker_2, etc.).
whisper.cpp gives labels like speaker_turn, which is not what I am looking for. I need to know who said what.
NVIDIA NeMo only works with 4 speakers and unfortunately is not good enough for me.
Do you have an open source solution that you can suggest? Or a potential pipeline?
Much appreciated!

WhisperX with pyannote, but it is not perfect: sometimes you will get multiple labels for the same speaker. There is no open source fire-and-forget solution as far as I know.
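
For the "who said what" part, here is a rough sketch of the merge step in such a pipeline: label each transcript segment with the speaker whose diarization turns overlap it the most. This is plain Python, not the WhisperX or pyannote API. The function name assign_speakers, the tuple layouts, and the sample timestamps are all made up for illustration; in practice the turns would come from the diarizer and the segments from the transcriber.

```python
from collections import defaultdict


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker who overlaps it most.

    segments: list of (start_sec, end_sec, text) from a transcriber (assumed layout)
    turns:    list of (start_sec, end_sec, speaker_label) from a diarizer (assumed layout)
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        overlap = defaultdict(float)
        for turn_start, turn_end, speaker in turns:
            # Length of the time interval shared by the segment and the turn.
            shared = min(seg_end, turn_end) - max(seg_start, turn_start)
            if shared > 0:
                overlap[speaker] += shared
        # Segments that overlap no turn at all get a placeholder label.
        speaker = max(overlap, key=overlap.get) if overlap else "unknown"
        labeled.append((speaker, text))
    return labeled


# Hypothetical sample data, for illustration only.
turns = [
    (0.0, 4.0, "speaker_1"),
    (4.0, 9.0, "speaker_2"),
    (9.0, 12.0, "speaker_1"),
]
segments = [
    (0.2, 3.8, "Hello everyone."),
    (4.1, 8.7, "Thanks for joining."),
    (8.9, 11.5, "Let's begin."),
]

result = assign_speakers(segments, turns)
for speaker, text in result:
    print(f"{speaker}: {text}")
```

Note that this only aligns labels with text. It does nothing about the duplicate-label problem: if the diarizer splits one person into two labels, the transcript will carry both.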