DungNasSa10 / Deep-Learning-Project-2022-1-Vietnamese-Speaker-Verification

Vietnamese Speaker Verification

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Deep-Learning-Project-2022-1-Vietnamese-Speaker-Verification

Installation

  • Python version == 3.8
conda create -n dlds python=3.8 -y
conda activate dlds
pip install -r requirements.txt
  • You also need to install FFmpeg. It can be downloaded through this website or you can install it on Linux with apt with the following command:
$ sudo apt update
$ sudo apt install ffmpeg

Data and Output Folder

data
|
|───metadata
|   |
│   │───crawl
|   |       Voice list - An.csv
|   |       Voice list - Dung.csv
|   |       ...
|   |    
│   │───test
|   |   |
|   |   |───labels
|   |   |       private_t1_test_pairs_with_label.txt
|   |   |       private_t2_test_pairs_with_label.txt
|   |   |       public_test_pairs_with_label.txt
|   |   |
|   |   |───test_pairs
|   |           private_t1_test_pairs.txt 
|   |           private_t2_test_pairs.txt
│   │           public_test_pairs.txt
|   |   
│   |───train
|          training_metadata.txt
|
|───musan_augment
|
|───rir_noises
|
|───test
|   |
|   |───sv_vlsp_2021
|       |
|       |───private_test
|       |   |
|       |   |───competition_private_test
|       |
|       |───public_test
|           |
|           |───competition_public_test
|   
|───train
|   |
|   |───speaker_0
|   |   |
|   |   |───video_0
|   |   |       0.wav
|   |   |       1.wav
|   |   |       ...
|   |   |
|   |   |───video_1
|   |   |       0.wav
|   |   |       1.wav
|   |           ...
|   |
|   |───speaker_1
|   |   |
|   |   ...
|
|
output
|
|───checkpoints
|       ECAPA_CNN_TDNN.model
|       ECAPA_TDNN.model
|       RawNet3.model
|       SEResNet34.model
|       VGG_M_40.model

Datasets

  • You can find our datasets in Kaggle through this link vietnamese-sv-datasets. Note that in the this link, the structure of the dataset is a little different from the above folder structure but you don't need to worry about it. Only thing you need to do is change the file path properly or simply, just run our notebooks that we have provided for you.

Crawling process

  • We have prepared some csv files used for crawling in folder data/metadata/crawl. You can create new files by make a copy of the template in this link crawling-template, download
    in csv form and put it in the folder data/metadata/crawl.
  • The following script will download all the videos in the csv file data/metadata/crawl/Voice list - An.csv, convert them to .wav file, resample them into 16000 sample rate, use Silero VAD model to collect speech chunks, remove noisy audios and save the results in the folder data/train. The argument -f refer to the path of the csv file, you can also put the link to the files saved in Google Drive while the argument -d specify the folder in which crawled data will be stored in.
python src/crawl.py -f https://drive.google.com/file/d/1BL4tkLkPPDeuYwlCaMvIrKYCst8E4zEN/view?usp=share_link -d ./data/train -sr 16000

Training

We have prepared some config files for you. You can change the arguments in configuration file or pass individual arguments that are defined in trainSpeakerNet.py by --{ARG_NAME} {VALUE}. Note that the configuration file overrides the arguments passed via command line.

  • Train RawNet3 with AAM-Softmax loss (the model checkpoints and validation results will be stored in folder output/RawNet3_AAM/)
python src/learn.py --config src/learning/configs/VGG_M_40_AAM.yaml --train
  • Train SE-ResNet34 with AAM-Softmax loss
python src/learn.py --config src/learning/configs/SEResNet34_AAM.yaml --train
  • Train ECAPA-TDNN with Angular Prototypical loss
python src/learn.py --config src/learning/configs/ECAPA_TDNN_AP.yaml --train
  • If you want to train on Kaggle, make a copy and run the Training part in this notebook Vietnamese_SV

Evaluation

  • This command will evaluate the trained model SEResNet34 with Angular Prototypical loss on Public test and output the EER(%).
python src/learn.py --eval --config src/learning/configs/SEResNet34_AP.yaml --initial_model path/to/model/checkpoints 
  • This command will evaluate the trained model ECAPA CNN-TDNN with AAM-Softmax loss on T01 Private test.
python src/learn.py --eval --config src/learning/configs/ECAPA_CNN_TDNN_AAN.yaml --initial_model path/to/model/checkpoints
  • This command will evalate the trained model RawNet3 with AAM-Softmax loss on T02 Private test.
python src/learn.py --eval --config src/learning/configs/RawNet3_AAN.yaml --initial_model path/to/model/checkpoints
  • Again, you can change the path of eval file in the config file. These eval files are saved in folder data/metadata/test/labels
  • If you want to train on Kaggle, make a copy and run the Evaluation part in this notebook Vietnamese_SV

Testing

  • This command will test the trained model SEResNet34 with Angular Prototypical loss on Public test. The output is a csv file of the form audio_1 audio_2 similarity_score and be stored in output/testing_results/public_test
python src/learn.py --test --config src/learning/configs/VGG_M_40_AP.yaml --initial_model path/to/model/checkpoints
  • This command will eval the trained model ECAPA CNN-TDNN with AAM-Softmax loss on T01 Private test.
python src/learn.py --test --config src/learning/configs/ECAPA_CNN_TDNN_AAN.yaml --initial_model path/to/model/checkpoints
  • This command will eval the trained model RawNet3 with AAM-Softmax loss on T02 Private test.
python src/learn.py --test --config src/learning/configs/RawNet3_AAN.yaml --initial_model path/to/model/checkpoints
  • Again, you can change the path of test file in the config file. These test files are saved in folder data/metadata/test/test_pairs
  • If you want to train on Kaggle, make a copy and run the Testing part in this notebook Vietnamese_SV

Deployment

  • You can download model checkpoints in checkpoints and put it in the folder output
  • Run the following command to test our deployment
streamlit run src/deploy.py --server.address 127.0.0.1 --server.port 8008

Contributions

You can find the contribution of each member in this link Contributions

About

Vietnamese Speaker Verification


Languages

Language:Python 100.0%