santi-pdp / pase

Problem Agnostic Speech Encoder

Training PASE architecture for only Speaker ID using Librispeech data

hdubey opened this issue

Hi Mirco, Santi,
Thanks again for this great contribution. I had a look at the code and the paper. The architecture is interesting. I want to train this architecture on Librispeech for speaker ID in the same way SincNet is trained. What would be the best way to do that? Assume I have all training and test data prepared as per the protocols of the SincNet paper. I want to extract the supervised bottleneck features after training to see how the overall FER compares with the original SincNet.

Hi @hdubey ,

Do you mean the mutual-information training with SincNet (https://arxiv.org/pdf/1812.00271.pdf) or the purely supervised training? I have just uploaded a config file cfg/SincNet_worker.cfg that incorporates the training mechanism of SincNet as MI-only, in case you are referring to the unsupervised mutual-information training. The way to train it would be to point the --net_cfg flag of the train.py script to this new config file.
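
Roughly, the invocation would be along these lines; only --net_cfg is confirmed above, and the data/output arguments are placeholders (the exact flag names may differ, check python train.py -h):

python train.py --net_cfg cfg/SincNet_worker.cfg \
                --data_root data/LibriSpeech/wavs \
                --save_path ckpt_pase_mi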
If you mean the supervised training part, then have a look at spk_id/nnet.py: there you specify the PASE config --fe_cfg ../cfg/PASE.cfg without a pretrained checkpoint (nothing in --fe_ckpt), and it will attach the selected classifier --model mlp on top of the front-end. In this case the way to specify the training/validation/test partitions is pretty standard: you supply --train_guia and --test_guia with file-path pointers, and the validation set is a randomly sampled subset of the --train_guia files (controlled with the ratio parameter --va_split, which defaults to 20%).
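
For example, something along these lines (the .guia file names below are just placeholders for your own file lists):

python spk_id/nnet.py --fe_cfg cfg/PASE.cfg \
                      --model mlp \
                      --train_guia data/libri_tr.guia \
                      --test_guia data/libri_te.guia \
                      --train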

Hope this helps,
Santi

Hi Santi,
Thanks for suggesting this. I just got the unsupervised MI training started. However, I am more interested in supervised speaker ID on Librispeech. When I run python spk_id/nnet.py I get the following error: "ImportError: No module named 'waveminionet'".

It is not clear how many arguments are needed to run the supervised one. I also want to try an RNN classifier after the front-end; what would the command be in that case? Thanks!

Hi @santi-pdp, I fixed the waveminionet issue. However, there seems to be a mandatory argument --spk2idx (SPK2IDX); how do I generate it for Librispeech? In the command below, how can I get the best parameter set for Librispeech to reproduce the supervised PASE speaker-ID results that outperformed SincNet? Thanks!

nnet_copy.py [-h] [--fe_cfg FE_CFG] [--save_path SAVE_PATH]
[--data_root DATA_ROOT] [--batch_size BATCH_SIZE]
[--train_guia TRAIN_GUIA] [--test_guia TEST_GUIA]
[--spk2idx SPK2IDX] [--log_freq LOG_FREQ] [--epoch EPOCH]
[--patience PATIENCE] [--seed SEED] [--no-cuda] [--no-rnn]
[--ft_fe] [--z_bnorm] [--va_split VA_SPLIT] [--lr LR]
[--momentum MOMENTUM] [--max_len MAX_LEN]
[--hidden_size HIDDEN_SIZE] [--emb_dim EMB_DIM]
[--stats STATS] [--opt OPT] [--sched_mode SCHED_MODE]
[--sched_step_size SCHED_STEP_SIZE] [--lrdec LRDEC]
[--test_ckpt TEST_CKPT] [--fe_ckpt FE_CKPT]
[--plateau_mode PLATEAU_MODE] [--model MODEL] [--train]
[--test] [--test_log_file TEST_LOG_FILE] [--inorm_code]
[--uni]
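
In case it helps, assuming --spk2idx expects a simple speaker-to-integer mapping saved with numpy (this format is an assumption; check the loader in spk_id/ before relying on it), a minimal sketch to build one from a LibriSpeech split:

import os
import numpy as np

# Minimal sketch under assumptions: each LibriSpeech speaker has its own
# top-level directory under data_root, and --spk2idx expects a {speaker: int}
# dict stored as a .npy file. Paths are illustrative.
data_root = 'data/LibriSpeech/train-clean-100'
speakers = sorted(os.listdir(data_root))          # e.g. ['103', '1034', ...]
spk2idx = {spk: idx for idx, spk in enumerate(speakers)}
np.save('data/libri_spk2idx.npy', spk2idx)
print(len(spk2idx), 'speakers indexed')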

Hello! I have been replicating this experiment recently, but while preparing the dataset config file I could not figure out where to obtain these files (--train_scp data/LibriSpeech/libri_tr.scp --test_scp data/LibriSpeech/libri_te.scp --libri_dict data/LibriSpeech/libri_dict.npy). I look forward to your reply very much. Thank you.
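
Those lists come from a SincNet-style data preparation and do not ship with the repository, as far as I can tell. Assuming libri_tr.scp / libri_te.scp are plain lists of wav paths (one per line) and libri_dict.npy maps each wav file to an integer speaker label (the format and directory layout below are assumptions), a rough sketch to generate them:

import os
import numpy as np

# Rough sketch: build a training file list and a file -> speaker-label dict.
# Assumes LibriSpeech audio has already been converted from .flac to .wav and
# that utterance names carry the speaker id as their first dash-separated field.
root = 'data/LibriSpeech/train-clean-100'
wavs = []
for dirpath, _, files in os.walk(root):
    wavs += [os.path.join(dirpath, f) for f in files if f.endswith('.wav')]

def spk_of(path):
    return os.path.basename(path).split('-')[0]

spk2lab = {spk: i for i, spk in enumerate(sorted({spk_of(w) for w in wavs}))}
libri_dict = {w: spk2lab[spk_of(w)] for w in wavs}

with open('data/libri_tr.scp', 'w') as f:
    f.write('\n'.join(wavs) + '\n')
np.save('data/libri_dict.npy', libri_dict)

The test list would be built the same way from a held-out split.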