Diff-SV

PyTorch code for the following paper:

Title: Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models (accepted for ICASSP 2024; preprint: arXiv:2309.08320)
Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, and Ha-Jin Yu

Abstract

Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV framework that leverages DPM. Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerable speaker representation through a hierarchical structure. The proposed model was evaluated under both in-domain and out-of-domain noisy conditions using the VoxCeleb1 test set, an external noise source, and the VOiCES corpus. The obtained experimental results demonstrate that Diff-SV achieves state-of-the-art performance, outperforming recently proposed noise-robust SV systems.

Prerequisites

Environment Setting

We ran our experiments in the 'nvcr.io/nvidia/pytorch:21.04-py3' image from NVIDIA GPU Cloud (NGC).
Run the 'build.sh' script to build the Docker image:

./docker/build.sh

Run the 'interactive.sh' script to start the Docker container.
Note that you must modify the mapping path in 'interactive.sh' before running it:

./docker/interactive.sh

Datasets

We used the VoxCeleb1 dataset for training and testing.
For the noisy test conditions, we used the MUSAN, Nonspeech100, and VOiCES datasets.
Map each downloaded dataset to the 'data' folder in the Docker environment.
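
Before launching training, it can help to confirm that the mapping actually exposes the datasets inside the container. The following is a minimal sketch, not part of this repository: the helper `find_missing_datasets` and the subfolder names in `EXPECTED_DATASETS` are hypothetical, so adjust them to match the layout you chose when mapping the data.

```python
from pathlib import Path

# Hypothetical folder names (assumption): the repository does not prescribe them,
# so change these to the names you used under the mapped 'data' folder.
EXPECTED_DATASETS = ("VoxCeleb1", "musan", "Nonspeech100", "VOiCES")


def find_missing_datasets(data_root, expected=EXPECTED_DATASETS):
    """Return the expected dataset folders that are absent or empty under data_root."""
    root = Path(data_root)
    missing = []
    for name in expected:
        folder = root / name
        # A folder that exists but is empty often indicates a wrong mapping path.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_datasets("data")
    if missing:
        print("Missing or empty dataset folders:", ", ".join(missing))
    else:
        print("All expected dataset folders were found.")
```

If the check reports missing folders, revisit the mapping path in 'interactive.sh' before starting a run.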

Train and test

python3 code/diff_sv/main.py

Citation

Please cite this paper if you make use of the code.

@article{kim2023diff,
  title={Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models},
  author={Kim, Ju-ho and Heo, Jungwoo and Shin, Hyun-seo and Lim, Chan-yeong and Yu, Ha-Jin},
  journal={arXiv preprint arXiv:2309.08320},
  year={2023}
}

MIT License
