This GitHub project provides a PyTorch implementation for reproducing the experiments and DNN models of the paper "Improved RawNet with Filter-wise Rescaling for Text-independent Speaker Verification using Raw Waveforms", which has been submitted to Interspeech 2020 as a conference paper. A pre-trained model is available at 'Pre-trained_model/rawnet2_best_weights.pt', and extracted speaker embeddings are available at spk_embd/.
To reproduce the original RawNet paper, please refer to the 'RawNet1' folder.
We ran our experiments on NVIDIA GPU Cloud (NGC) using the 'nvcr.io/nvidia/pytorch:19.10-py3' image; refer to launch_ngc.sh. Training used two Titan V GPUs.
- Download the VoxCeleb1&2 datasets and move them to DB/ (or pass your dataset directories as arguments using --DB DIR_TO_VOX1 and --DB_vox2 DIR_TO_VOX2). A file tree will be added for reference later.
- (Optional) Enter the virtual environment using NGC.
- Run train_RawNet2.py -name NAME
- Go into the Pre-trained_model folder.
- Download the extracted RawNet2 speaker embeddings for the VoxCeleb1 dev set here (too large to upload to GitHub).
- Move the downloaded speaker embeddings to spk_embd/
- Run evaluate_pretrained_RawNet2.py
We encourage using the extracted speaker embeddings for further studies on speaker embedding enhancement or back-end modeling, since the RawNet2 paper adopts simple cosine similarity for back-end classification.
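As a starting point for such back-end studies, the cosine similarity baseline can be sketched as follows. This is a minimal illustration, not the repository's evaluation code: the function name `cosine_score` and the toy 4-dimensional vectors are assumptions made for the example, and real embeddings would have a much higher dimensionality.

```python
import math
from typing import Sequence


def cosine_score(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings (higher = more similar)."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(float(a) * float(b) for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(float(a) ** 2 for a in emb_a))
    norm_b = math.sqrt(sum(float(b) ** 2 for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score a zero-norm embedding")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
enrol = [1.0, 0.0, 2.0, 0.0]
test_same = [2.0, 0.0, 4.0, 0.0]   # same direction as enrol
test_other = [0.0, 3.0, 0.0, 1.0]  # orthogonal to enrol

score_same = cosine_score(enrol, test_same)
score_other = cosine_score(enrol, test_other)
```

A verification trial is then accepted when the score exceeds a threshold chosen on held-out data; a stronger back-end would replace `cosine_score` while keeping the same embeddings.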
Speaker embeddings are located under spk_embd/ and are saved using pickle as a dictionary with the following layout:
Key : Utterance ID (Spk/videoID/segID)
Value : Speaker embedding
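The dictionary layout above can be read with a few lines of Python. The sketch below is an assumption-laden illustration: the helper names `load_embeddings` and `split_utterance_id`, the file name `toy_embeddings.pk`, and the utterance IDs are all hypothetical, and a toy file is written first so the example runs without the real download. If the real embeddings are stored as NumPy arrays, NumPy must be installed for unpickling to succeed. Only unpickle files from sources you trust.

```python
import os
import pickle
import tempfile


def load_embeddings(path):
    """Load a pickled {utterance_id: embedding} dictionary."""
    with open(path, "rb") as f:
        embeddings = pickle.load(f)
    if not isinstance(embeddings, dict):
        raise TypeError("expected a dictionary of speaker embeddings")
    return embeddings


def split_utterance_id(utt_id):
    """Split 'Spk/videoID/segID' into its three parts.

    Raises ValueError if the ID does not have exactly three parts.
    """
    spk, video_id, seg_id = utt_id.split("/")
    return spk, video_id, seg_id


# Toy data with made-up IDs and 2-dimensional embeddings.
toy = {
    "id10001/vidA/00001": [0.1, 0.2],
    "id10001/vidB/00001": [0.3, 0.4],
    "id10002/vidC/00002": [0.5, 0.6],
}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_embeddings.pk")  # hypothetical file name
    with open(toy_path, "wb") as f:
        pickle.dump(toy, f)
    loaded = load_embeddings(toy_path)

# Group utterance IDs by speaker, e.g. to build enrolment sets.
by_speaker = {}
for utt_id in loaded:
    spk, _, _ = split_utterance_id(utt_id)
    by_speaker.setdefault(spk, []).append(utt_id)
```

To use the real embeddings, point `load_embeddings` at the file you placed under spk_embd/.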
Email jeewon.leo.jung@gmail.com for other details :-).
This repository provides the code for reproducing the papers below.
@article{jung2020improved,
title={Improved RawNet with Filter-wise Rescaling for Text-independent Speaker Verification using Raw Waveforms},
author={Jung, Jee-weon and Kim, Seung-bin and Shim, Hye-jin and Kim, Ju-ho and Yu, Ha-Jin},
journal={arXiv preprint arXiv:2004.00526},
year={2020}
}
@article{jung2019RawNet,
title={RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification},
author={Jung, Jee-weon and Heo, Hee-soo and Kim, Ju-ho and Shim, Hye-jin and Yu, Ha-Jin},
journal={Proc. Interspeech 2019},
pages={1268--1272},
year={2019}
}
- Add comments to the code.
- Add a file tree of the datasets.
- 2020.04.01. : Initial commit
- 2020.04.02. : Validated evaluation of the pre-trained model
- 2020.04.02. : Evaluated training