alibaba-damo-academy / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool



license

3D-Speaker is an open-source toolkit for single- and multi-modal speaker verification, speaker recognition, and speaker diarization. All pretrained models are accessible on ModelScope. Furthermore, we present a large-scale speech corpus also called 3D-Speaker to facilitate the research of speech representation disentanglement.

Quickstart

Install 3D-Speaker

git clone https://github.com/alibaba-damo-academy/3D-Speaker.git && cd 3D-Speaker
conda create -n 3D-Speaker python=3.8
conda activate 3D-Speaker
pip install -r requirements.txt

Running experiments

# Speaker verification: ERes2Net on 3D-Speaker dataset
cd egs/3dspeaker/sv-eres2net/
bash run.sh
# Speaker verification: CAM++ on 3D-Speaker dataset
cd egs/3dspeaker/sv-cam++/
bash run.sh
# Speaker verification: ECAPA-TDNN on 3D-Speaker dataset
cd egs/3dspeaker/sv-ecapa/
bash run.sh
# Speaker verification: ResNet on 3D-Speaker dataset
cd egs/3dspeaker/sv-resnet/
bash run.sh
# Speaker verification: Res2Net on 3D-Speaker dataset
cd egs/3dspeaker/sv-res2net/
bash run.sh
# Self-supervised speaker verification: RDINO on 3D-Speaker dataset
cd egs/3dspeaker/sv-rdino/
bash run.sh
# Audio and multimodal Speaker diarization:
cd egs/3dspeaker/speaker-diarization/
bash run_audio.sh
bash run_video.sh
# Language identification
cd egs/3dspeaker/language-idenitfication
bash run.sh

Inference using pretrained models from Modelscope

All pretrained models are released on Modelscope.

# Install modelscope
pip install modelscope
# CAM++ trained on VoxCeleb
model_id=iic/speech_campplus_sv_en_voxceleb_16k
# CAM++ trained on 200k labeled speakers
model_id=iic/speech_campplus_sv_zh-cn_16k-common
# ERes2Net trained on VoxCeleb
model_id=iic/speech_eres2net_sv_en_voxceleb_16k
# ERes2Net trained on 200k labeled speakers
model_id=iic/speech_eres2net_sv_zh-cn_16k-common
# Run CAM++ or ERes2Net inference
python speakerlab/bin/infer_sv.py --model_id $model_id --wavs $wav_path

# RDINO trained on VoxCeleb
model_id=iic/speech_rdino_ecapa_tdnn_sv_en_voxceleb_16k
# Run rdino inference
python speakerlab/bin/infer_sv_rdino.py --model_id $model_id --wavs $wav_path

Overview of Content

What‘s new 🔥

To be expected 🔥

  • [2024.5] Releasing asnormalization in speaker verification.
  • [2024.5] Releasing multi-modal (semantic module) training recipes.

Contact

If you have any comment or question about 3D-Speaker, please contact us by

  • email: {chenyafeng.cyf, zsq174630, tongmu.wh, shuli.cly}@alibaba-inc.com

License

3D-Speaker is released under the Apache License 2.0.

Acknowledge

3D-Speaker contains third-party components and code modified from some open-source repos, including:
Speechbrain, Wespeaker, D-TDNN, DINO, Vicreg, TalkNet-ASD , Ultra-Light-Fast-Generic-Face-Detector-1MB

Citations

If you find this repository useful, please consider giving a star ⭐ and citation 🦖:

@article{chen20243d,
  title={3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and others},
  journal={arXiv preprint arXiv:2403.19971},
  year={2024}
}
@inproceedings{zheng20233d,
  title={3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement},
  author={Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang and Qian Chen},
  url={https://arxiv.org/pdf/2306.15354.pdf},
  year={2023}
}
@inproceedings{wang2023cam++,
  title={CAM++: A Fast and Efficient Network For Speaker Verification Using Context-Aware Masking},
  author={Wang, Hui and Zheng, Siqi and Chen, Yafeng and Cheng, Luyao and Chen, Qian},
  year={2023},
  booktitle={INTERSPEECH}
}
@inproceedings{chen2023enhanced,
  title={An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and Chen, Qian and Qi, Jiajun},
  year={2023},
  booktitle={INTERSPEECH}
}
@inproceedings{chen2023pushing,
  title={Pushing the limits of self-supervised speaker verification using regularized distillation framework},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and Chen, Qian},
  booktitle={ICASSP 2023},
  year={2023}
}

About

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

License:Apache License 2.0


Languages

Language:Python 52.2%Language:Shell 28.2%Language:Perl 15.7%Language:C++ 3.7%Language:CMake 0.3%