End-to-End Speech Recognition using RNN-Transducer

File description

eval.py: decoding with the RNNT joint model
model.py: RNNT model, containing the acoustic and phoneme models
model2012.py: RNNT model following Graves 2012
seq2seq/*: seq2seq model with attention
rnnt_np.py: RNNT loss function for MXNet, supporting both the Symbol and Gluon APIs; based on a PyTorch implementation
DataLoader.py: data processing
train.py: RNNT training script; the model can be initialized from CTC and PM models
train_ctc.py: CTC training script
train_att.py: attention model training script
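
For orientation, the quantity an RNNT loss computes is the negative log-likelihood of the label sequence, summed over all alignments with the forward recursion of Graves 2012. The following is a minimal pure-Python sketch of that recursion, not the code in rnnt_np.py. The function names, the nested-list layout of log_probs (indexed as [t][u][symbol]), and the choice of blank index 0 are assumptions made for illustration.

```python
import math


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def rnnt_neg_log_likelihood(log_probs, labels, blank=0):
    """Forward algorithm over the T x (U + 1) RNNT lattice.

    log_probs[t][u][k]: log-probability of symbol k given frame t and
                        the first u labels (hypothetical layout).
    labels:             the U target label ids, without blanks.
    """
    T, U = len(log_probs), len(labels)
    # alpha[t][u]: log-probability of reaching node (t, u)
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:  # reached by emitting blank at (t - 1, u)
                step = alpha[t - 1][u] + log_probs[t - 1][u][blank]
                alpha[t][u] = log_add(alpha[t][u], step)
            if u > 0:  # reached by emitting labels[u - 1] at (t, u - 1)
                step = alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
                alpha[t][u] = log_add(alpha[t][u], step)
    # every alignment ends with a blank emitted from the last node
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])
```

With T = 2 frames, one target label, and a uniform distribution over 3 symbols, there are two alignments of three emissions each, so the likelihood is 2/27.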

Compile RNNT loss:

Follow the instructions here to compile MXNet with the RNNT loss.

Extract features:

Link the Kaldi TIMIT example directories (local, steps, utils).
Execute run.sh to extract 40-dimensional fbank features.
Run feature_transform.sh to get the 123-dimensional features described in Graves 2013.

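
The 123-dimensional input in Graves 2013 consists of 40 filter-bank coefficients plus energy (41 static values) together with their first and second temporal derivatives, giving 41 x 3 = 123. The sketch below shows that stacking in plain Python. It is an illustration only and may differ from what feature_transform.sh does; the function name add_deltas, the regression window of 2, and the edge handling (repeating the first and last frame) are assumptions.

```python
def add_deltas(frames, window=2):
    """Append first and second temporal derivatives to each frame.

    frames: list of T static feature vectors (lists of floats).
    Uses the regression formula
        d[t] = sum_n n * (c[t + n] - c[t - n]) / (2 * sum_n n^2),
    clamping t + n and t - n to the valid frame range.
    """
    def delta(feats):
        num_frames = len(feats)
        denom = 2 * sum(n * n for n in range(1, window + 1))
        out = []
        for t in range(num_frames):
            row = []
            for d in range(len(feats[0])):
                num = sum(
                    n * (feats[min(t + n, num_frames - 1)][d]
                         - feats[max(t - n, 0)][d])
                    for n in range(1, window + 1)
                )
                row.append(num / denom)
            out.append(row)
        return out

    d1 = delta(frames)   # first derivatives
    d2 = delta(d1)       # second derivatives
    return [s + a + b for s, a, b in zip(frames, d1, d2)]
```

Given frames of 41 static values each, every output frame has 123 values.
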
Train RNNT model:

python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule

Evaluate the model. By default, eval.py supports only RNNT.

Greedy decoding:

python eval.py <path to best model parameters> --bi

Beam search:

python eval.py <path to best model parameters> --bi --beam <beam size>
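
As a rough picture of what greedy decoding means for a transducer: at each acoustic frame the decoder repeatedly takes the most likely output symbol, appends it if it is not blank, and moves to the next frame once blank is chosen. The sketch below illustrates this under stated assumptions and is not the code in eval.py. The callables predict_step and joint, the blank index 0, and the max_symbols cap are hypothetical names introduced here.

```python
def greedy_decode(enc_frames, predict_step, joint, blank=0, max_symbols=10):
    """Greedy transducer decoding.

    enc_frames:   sequence of per-frame encoder outputs.
    predict_step: hypothetical callable (prev_label, state) -> (pred_out, state);
                  called with (None, None) to start the sequence.
    joint:        hypothetical callable (enc_t, pred_out) -> list of scores,
                  one per output symbol.
    max_symbols:  cap on labels emitted per frame, so decoding terminates.
    """
    hyp = []
    pred_out, state = predict_step(None, None)
    for enc_t in enc_frames:
        for _ in range(max_symbols):
            scores = joint(enc_t, pred_out)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == blank:
                break  # blank: advance to the next frame
            hyp.append(best)
            # feed the emitted label back into the prediction network
            pred_out, state = predict_step(best, state)
    return hyp
```

Beam search, selected with --beam, differs in that several partial hypotheses are kept and extended at each frame instead of only the single best one.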

Decode	PER
greedy	20.36
beam 100	20.03

Decode	PER
greedy	20.74
beam 40	19.84
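
PER is the phoneme error rate. A minimal sketch of how such a figure is commonly computed follows, assuming the values above are percentages obtained as total edit distance divided by total reference length. The function names are hypothetical, and the repository's scoring may differ (for example in how phonemes are mapped before scoring).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Percentage of phoneme errors over all reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total
```

For example, a reference of sil k ae t sil against a hypothesis of sil k ah t has one substitution and one deletion, which is 2 errors over 5 reference phonemes.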

MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
