Speech_diarization

The .ipynb notebook is in the research_files folder. The model requires a GPU to generate output, so install CUDA and a PyTorch version compatible with your system.

What is speaker diarization?

Speaker diarization is the process of partitioning an audio stream containing human speech into homogeneous segments according to the identity of each speaker, answering the question "who spoke when?".
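To make "segments according to the identity of each speaker" concrete, a diarization result can be held as a list of (start, end, speaker) segments, and consecutive segments from the same speaker can be merged into a single speaking turn. This is a minimal sketch; the Segment class and merge_turns helper are hypothetical names, not part of this repository.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One labelled stretch of audio: start/end in seconds plus a speaker label."""
    start: float
    end: float
    speaker: str


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Merge consecutive segments that share a speaker into single speaker turns."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1].end = max(turns[-1].end, seg.end)
        else:
            # New speaker: start a new turn (a copy, so the input is not mutated).
            turns.append(Segment(seg.start, seg.end, seg.speaker))
    return turns


if __name__ == "__main__":
    segments = [
        Segment(0.0, 2.5, "SPEAKER_1"),
        Segment(2.5, 4.0, "SPEAKER_1"),
        Segment(4.0, 7.2, "SPEAKER_2"),
        Segment(7.2, 9.0, "SPEAKER_1"),
    ]
    for turn in merge_turns(segments):
        print(f"{turn.start:5.1f}-{turn.end:5.1f}s  {turn.speaker}")
```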

How is speaker diarization performed here?

Transcribe the audio to text using Whisper
Assign each transcribed segment to a speaker by clustering the segment embeddings with AgglomerativeClustering
Run named-entity recognition (NER) on the transcript to identify the participants' names
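The clustering step can be sketched with scikit-learn's AgglomerativeClustering, with n_clusters set to the number of speakers supplied as an input. Assumptions: one embedding vector per transcribed segment has already been computed (the embedding model is not named here, so random stand-in vectors are used), and assign_speakers and the SPEAKER_n labels are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def assign_speakers(embeddings: np.ndarray, num_speakers: int) -> list[str]:
    """Cluster one embedding per segment into num_speakers groups and label them."""
    if num_speakers == 1 or len(embeddings) < 2:
        # Nothing to separate: every segment belongs to the same speaker.
        return ["SPEAKER_1"] * len(embeddings)
    # Bottom-up clustering (default Ward linkage on Euclidean distance),
    # stopping once exactly num_speakers clusters remain.
    labels = AgglomerativeClustering(n_clusters=num_speakers).fit_predict(embeddings)
    return [f"SPEAKER_{label + 1}" for label in labels]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in embeddings: two tight groups of vectors, one group per speaker.
    centre_a, centre_b = rng.normal(size=192), rng.normal(size=192)
    embeddings = np.vstack([
        centre_a + 0.01 * rng.normal(size=(3, 192)),
        centre_b + 0.01 * rng.normal(size=(2, 192)),
    ])
    print(assign_speakers(embeddings, num_speakers=2))
```

Cluster ids are arbitrary, so which group becomes SPEAKER_1 is not meaningful on its own; the NER step is what attaches real names.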

Model Inputs

The model expects two inputs:

The audio file to diarize
The number of speakers in the audio
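Because speaker embeddings are computed per transcribed segment, each segment's start and end times (in seconds) have to be mapped onto sample indices of the input audio. A minimal sketch, assuming the audio is already loaded as a mono NumPy array at 16 kHz (the rate Whisper works at); crop is a hypothetical helper.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumption: mono audio at 16 kHz


def crop(waveform: np.ndarray, start: float, end: float,
         sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Return the samples between start and end (seconds), clamped to the clip."""
    first = max(0, int(start * sample_rate))
    last = min(len(waveform), int(end * sample_rate))
    return waveform[first:last]


if __name__ == "__main__":
    waveform = np.zeros(3 * SAMPLE_RATE, dtype=np.float32)  # 3 s of silence
    clip = crop(waveform, 1.0, 2.5)
    print(clip.shape)  # (24000,)
```

Clamping matters because Whisper's last segment end time can run slightly past the true end of the file.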

Model Output

A complete transcript of the audio
A dictionary of the diarization results, labelled with the participants' names
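One plausible way to assemble both outputs from the earlier steps is sketched here. The exact layout of the repository's dictionary is not documented, so this assumes it maps each participant's name to the list of things they said. build_outputs and the names mapping are hypothetical; segments are assumed to carry "start", "end" and "text" keys, as the segments returned by Whisper's transcribe() do.

```python
from collections import defaultdict


def build_outputs(segments: list[dict], speakers: list[str],
                  names: dict[str, str]) -> tuple[str, dict[str, list[str]]]:
    """Return (transcript, diarization) from speaker-labelled segments.

    segments: dicts with "start", "end" (seconds) and "text".
    speakers: one speaker label per segment, e.g. from clustering.
    names:    hypothetical label -> participant-name mapping (e.g. found via NER);
              labels without a known name are kept as-is.
    """
    transcript_lines = []
    diarization: dict[str, list[str]] = defaultdict(list)
    for seg, label in zip(segments, speakers):
        who = names.get(label, label)
        text = seg["text"].strip()
        transcript_lines.append(f"[{seg['start']:.1f}-{seg['end']:.1f}] {who}: {text}")
        diarization[who].append(text)
    return "\n".join(transcript_lines), dict(diarization)


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 3.2, "text": " Hi, I'm Alice."},
        {"start": 3.2, "end": 6.0, "text": " Bob here, yesterday I fixed the login bug."},
        {"start": 6.0, "end": 8.5, "text": " Great, thanks Bob."},
    ]
    speakers = ["SPEAKER_1", "SPEAKER_2", "SPEAKER_1"]
    transcript, diarization = build_outputs(
        segments, speakers, {"SPEAKER_1": "Alice", "SPEAKER_2": "Bob"})
    print(transcript)
    print(diarization)
```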

About

Speech diarization for Scrum automation

MIT License

Languages

Jupyter Notebook 97.4%, Python 2.6%