.
|___data
    |___train
    |   |___speaker1
    |   |   |___audio1.wav
    |   |   |___ ....
    |   |   |___audion.wav
    |   |___ ....
    |   |___speakern
    |       |___audio1.wav
    |       |___ ....
    |       |___audion.wav
    |___val
    |___test
NOTE: The data-splitting step in the original repo does not work correctly, so arrange the data yourself: put all of your data in the train folder, and put a small part of it in val and test.
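The manual split described in the note could be scripted roughly as follows. This is only a sketch, not part of the repo: the function name `copy_holdout`, the 5% default fraction, and the assumption that val and test use the same per-speaker subfolders as train are all illustrative choices, so check them against what main.py actually expects.

```python
import random
import shutil
from pathlib import Path


def copy_holdout(data_root, fraction=0.05, seed=0):
    """Copy a small random share of each speaker's files from train into val and test.

    Hypothetical helper: files are copied, not moved, so train keeps every file.
    Returns the number of files copied into each split.
    """
    root = Path(data_root)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    copied = {"val": 0, "test": 0}
    speaker_dirs = sorted(p for p in (root / "train").iterdir() if p.is_dir())
    for speaker_dir in speaker_dirs:
        wavs = sorted(speaker_dir.glob("*.wav"))
        rng.shuffle(wavs)
        # At least one file per split when the speaker has any audio at all.
        k = max(1, int(len(wavs) * fraction)) if wavs else 0
        picks = {"val": wavs[:k], "test": wavs[k:2 * k]}  # disjoint slices
        for split, files in picks.items():
            # Assumption: val/test mirror the speaker folders used in train.
            dest = root / split / speaker_dir.name
            for wav in files:
                dest.mkdir(parents=True, exist_ok=True)
                shutil.copy2(wav, dest / wav.name)
                copied[split] += 1
    return copied
```

Calling `copy_holdout("data")` from the repo root would fill data/val and data/test while leaving data/train untouched.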
- Download the pretrained model at param.model
- To finetune, run:
python3 main.py --scenario train --load_pretrained
- To train, run:
python3 main.py --scenario train
- To test with your dataset, run:
python3 main.py --scenario test_folder
- To test the cosine similarity of two files, run the command below. You need to choose your own threshold for how similar two files must be to count as spoken by the same person; I usually recommend a value in the range 0.75 - 0.9:
python3 main.py --scenario test_two_files \
--filetest_1 path/to/file_1 \
--filetest_2 path/to/file_2
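To make the threshold decision concrete, here is a minimal sketch of cosine similarity followed by a same-speaker check. It does not reproduce how main.py computes its score; the function names, the plain-list embeddings, and the default threshold of 0.8 (one value inside the recommended 0.75 - 0.9 range) are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(emb_1, emb_2, threshold=0.8):
    """Hypothetical decision rule: accept when the score reaches the threshold."""
    return cosine_similarity(emb_1, emb_2) >= threshold
```

A higher threshold rejects more impostor pairs but also rejects more genuine ones, so it is worth tuning on a few pairs from your own data where the speaker is known.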
Original ECAPA-TDNN paper
@inproceedings{desplanques2020ecapa,
title={{ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification}},
author={Desplanques, Brecht and Thienpondt, Jenthe and Demuynck, Kris},
booktitle={Interspeech 2020},
pages={3830--3834},
year={2020}
}
We studied many useful projects while writing this code.
Thanks to their authors for open-sourcing their code!