Speech-Lab-IITM / nltm_kaldi

Kaldi scripts to train models for NLTM deliverables

Recipe to build monophone, triphone and TDNN chain models

First, ensure that KALDI_ROOT in the path.sh file points to your Kaldi installation.
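
In a standard Kaldi setup, the first line of path.sh looks something like the following (the path is illustrative):

export KALDI_ROOT=/path/to/kaldi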

Data Preparation and Dictionary Generation

The data preparation done to build any of the models with ESPnet is enough to get started. All that is needed are the text, utt2spk, spk2utt and wav.scp files. If other files such as segments, utt2dur, etc. exist, please retain them as well.
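
For reference, these are the standard Kaldi data files; each holds one entry per line, keyed by utterance ID. A minimal sketch with hypothetical utterance and speaker IDs:

# text: <utterance-id> <transcription>
spk001_utt001 a sample transcription
# wav.scp: <utterance-id> <path to the audio file, or a pipe producing it>
spk001_utt001 /path/to/audio/spk001_utt001.wav
# utt2spk: <utterance-id> <speaker-id>
spk001_utt001 spk001

If spk2utt is missing, it can be generated from utt2spk with Kaldi's standard utility:

utils/utt2spk_to_spk2utt.pl esp_data/train/utt2spk > esp_data/train/spk2utt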

Rename your training, validation and testing data directories to train, dev and test respectively, since this convention is followed across run_gmm.sh, run_tdnn.sh and data_prep_NLTM.sh. If you wish to change the names of these folders, ensure that the names are kept consistent across the aforementioned files.

Place the train, dev and test directories within a folder named esp_data. If you wish to give it a different name, change the espnet_dir variable in run_gmm.sh accordingly.
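
The expected layout would then be as follows (file lists illustrative):

esp_data/
    train/  text  utt2spk  spk2utt  wav.scp  (and optionally segments, utt2dur, ...)
    dev/    text  utt2spk  spk2utt  wav.scp
    test/   text  utt2spk  spk2utt  wav.scp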

Once you have these folders in order, the data preparation stage can be run by executing the run_gmm.sh file after setting the data_prep variable to 1. Alternatively, execute the command,

bash run_gmm.sh --data_prep 1

After the execution completes, the data folder will have been created and populated with the prepared data and the corresponding dictionary generated by the unified parser.
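
If anything looks off at this point, a prepared directory can be sanity-checked with Kaldi's standard validation utility (--no-feats, since features have not been extracted yet):

utils/validate_data_dir.sh --no-feats data/train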

LM Preparation and Feature Extraction

Subsequently, the language model preparation stage can be run by executing the run_gmm.sh file after setting the prepare_lang variable to 1. Alternatively, execute the command,

bash run_gmm.sh --prepare_lang 1

The MFCC feature extraction stage can be run by executing the run_gmm.sh file after setting the mfcc variable to 1. Alternatively, execute the command,

bash run_gmm.sh --mfcc 1
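
Since each stage is gated by its own variable, both stages can presumably be run in a single invocation, assuming run_gmm.sh executes the enabled stages in order:

bash run_gmm.sh --prepare_lang 1 --mfcc 1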

Monophone and Triphone Model Training

Once the data preparation, dictionary generation, LM training and feature extraction are done, the monophone and triphone models can be trained. There are up to three triphone model training stages available in run_gmm.sh, named tri1, tri2 and tri3, each of which requires the alignments from the previously built model. For example, a tri3 model sources the alignments it needs from a tri2 model, while a tri1 model gets its alignments from the mono (monophone) model. Given this dependency, ensure that you build the models in the following order: mono, tri1, tri2 and tri3.

Based on the availability of compute resources, set the train_nj and decode_nj variables to appropriate values before executing the run_gmm.sh script. These variables set the number of parallel jobs run during the training and decoding stages respectively. To train these models, set the appropriate variables among mono, tri1, tri2 and tri3 to 1 and execute the run_gmm.sh script. Once each model is trained, decoding runs on the subset(s) specified through the recog_sets variable in run_gmm.sh, and the WER and SER values are printed in the script's output.
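
For example, to train and decode just the monophone model with 32 training jobs and 16 decoding jobs (values illustrative):

bash run_gmm.sh --mono 1 --train_nj 32 --decode_nj 16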

In case you wish to train all the models at once, execute the following command,

bash run_gmm.sh --mono 1 --tri1 1 --tri2 1 --tri3 1 --train_nj 64 --decode_nj 32

TDNN Chain Model Training

Alignments from the previously built HMM-GMM models are needed while training the TDNN chain models. In run_tdnn.sh, ensure that you pass the absolute path of the data directory to the datadir variable, and modify the gmm variable based on the triphone model whose alignments you would like to use. Based on the availability of compute resources, set the nj and nj_extractor variables to appropriate values before executing the run_tdnn.sh script. Also, if you wish to run the script multiple times, trying out different alignments for the TDNN training, set the nnet3_affix and affix variables to distinct values to distinguish between the experiments.
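
As a sketch, the relevant variables near the top of run_tdnn.sh might be set along these lines (all values are illustrative; check the script for the actual defaults):

datadir=/absolute/path/to/nltm_kaldi/data   # absolute path to the prepared data directory
gmm=tri3                                    # HMM-GMM model supplying the alignments
nj=32                                       # parallel jobs for training
nj_extractor=16                             # parallel jobs for the extractor stage
nnet3_affix=_run1                           # distinguishes experiment directories
affix=_1a                                   # distinguishes TDNN model directories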

Training and decoding with the TDNN models can be done by executing the run_tdnn.sh script.
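
That is, once the variables above are set:

bash run_tdnn.sh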
