This toolkit is based on the LIUM Speaker Diarization Toolkit. It performs unsupervised speaker clustering on sound files based on the diarization output, so that speakers with similar voice characteristics are assigned to the same cluster. This is useful for speaker-adaptive training of acoustic models for speech recognition when the identities of the speakers are unknown.
The script additionally trains a Gaussian Mixture Model (GMM) for speaker recognition and classifies the speakers in the test files with respect to the speaker clusters.
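The classification step described above can be illustrated with a small sketch. This is not the toolkit's code (the actual GMM training is done by the LIUM tools with MAP adaptation); it is a minimal NumPy example, assuming per-cluster feature frames are available, that scores test frames against one diagonal-covariance Gaussian per cluster and picks the best-scoring cluster.

```python
# Illustrative sketch (not the toolkit's code): classify a test utterance by
# scoring its feature frames against one diagonal-covariance Gaussian per
# speaker cluster and picking the highest total log-likelihood.
import numpy as np

def fit_diag_gaussian(frames):
    """Return (mean, var) of a diagonal-covariance Gaussian over frames."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def log_likelihood(frames, mean, var):
    """Total log-likelihood of frames under the diagonal Gaussian."""
    per_dim = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)
    return per_dim.sum()

def classify(test_frames, cluster_models):
    """Return the cluster ID whose model scores the test frames highest."""
    return max(cluster_models,
               key=lambda c: log_likelihood(test_frames, *cluster_models[c]))

rng = np.random.default_rng(0)
# Synthetic 13-dimensional "MFCC" frames for two speaker clusters.
train = {"S0": rng.normal(0.0, 1.0, (200, 13)),
         "S1": rng.normal(3.0, 1.0, (200, 13))}
models = {c: fit_diag_gaussian(f) for c, f in train.items()}
test = rng.normal(3.0, 1.0, (50, 13))   # drawn near cluster S1
print(classify(test, models))           # S1
```

A real system would use multi-component GMMs over MFCC features extracted from the audio; the decision rule (maximum total log-likelihood over clusters) is the same.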
- Requirements for running the scripts
- Java JDK (1.6+)
- bash
- sox
- Python 2.7
- LIUM SpeakerDiarization Toolkit
- Using the toolkit
WAV files for training and testing should be placed in the wav/ and test/ folders, respectively. Executing run.sh performs the whole process described above.
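A typical session might look like the following. Only wav/, test/ and run.sh come from this toolkit; the corpus paths are placeholders, which is why those lines are commented out.

```shell
# Hypothetical session: stage the audio, then run the full pipeline.
mkdir -p wav test               # create the input folders if they do not exist
# cp corpus/train/*.wav wav/    # training recordings (adjust the path)
# cp corpus/eval/*.wav  test/   # recordings to classify (adjust the path)
# ./run.sh                      # clustering, GMM training and classification
```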
- List of files
src/ - LIUM Toolkit and pretrained models
wav/ - put training files here
test/ - put test files here
assign_clusters.py (1.0 kB) - assign global clusters to individual files
cluster_individual.sh (5.8 kB) - cluster individual files (based on the LIUM Wiki)
cluster_init.sh (2.0 kB) - initialize clusters (based on the LIUM Wiki)
concat_seg.py (2.0 kB) - concatenate segmentation files with the necessary offsets
get_clust.py (726 B) - get individual cluster IDs based on cluster files
run.sh (4.1 kB) - run the full clustering, training and testing
segment_egs.py (1.7 kB) - get samples for each cluster
train_speaker.sh (1.0 kB) - train a speaker GMM with MAP adaptation
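The offsetting done by concat_seg.py can be sketched as follows. This is a hedged illustration, not the script itself: it assumes the usual LIUM .seg line layout ("show channel start length gender band env speaker", with start and length counted in 10 ms frames) and that speaker IDs are made unique by prefixing the source-file index.

```python
# Sketch of the concatenation step: shift each file's start times by the
# running total length and rename speakers so IDs from different files
# cannot clash. Field layout assumes the common LIUM .seg format.
def concat_segs(seg_texts):
    """Concatenate .seg file contents (one string per file) with offsets."""
    out, offset = [], 0
    for i, text in enumerate(seg_texts):
        end = 0
        for line in text.strip().splitlines():
            if line.startswith(";;"):       # comment line, skip
                continue
            f = line.split()
            start, length = int(f[2]) + offset, int(f[3])
            f[2] = str(start)
            f[7] = "f%d_%s" % (i, f[7])     # make speaker IDs unique per file
            end = max(end, start + length)
            out.append(" ".join(f))
        offset = end                        # next file starts after this one
    return "\n".join(out)

a = "show1 1 0 300 U U U S0"
b = "show2 1 0 200 U U U S0"
print(concat_segs([a, b]))
# the second file's start is shifted by 300 and both S0 labels stay distinct
```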
- List of output folders and files
data/ - clustering data files produced by LIUM
sample/ - samples of the original sound files corresponding to clusters
$ID/ - data folders corresponding to samples; their content is similar to that of the data/ folder
samples.seg - concatenated and modified sample cluster IDs (modified to avoid overlap)
cross.seg - result of global re-clustering
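The mapping step that takes the global re-clustering result back to the individual files can be sketched like this. The data layout is hypothetical (the actual parsing of cross.seg is done by assign_clusters.py): it assumes each member of a global cluster carries a file-index prefix such as "f0_" on its local cluster ID.

```python
# Hedged sketch: given which per-file speakers ended up in each global
# cluster after re-clustering, build a local-to-global cluster map per file.
def map_clusters(global_clusters):
    """global_clusters: {global_id: ["f0_S0", "f1_S2", ...]}, where the
    prefix encodes the source file. Returns {file_index: {local: global}}."""
    mapping = {}
    for gid, members in global_clusters.items():
        for member in members:
            prefix, local = member.split("_", 1)
            mapping.setdefault(int(prefix[1:]), {})[local] = gid
    return mapping

print(map_clusters({"G0": ["f0_S0", "f1_S1"], "G1": ["f0_S1"]}))
# file 0's S0 and file 1's S1 map to G0; file 0's S1 maps to G1
```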