tongjinle123/speech-transformer-pytorch_lightning

End2End chinese-english code-switch speech recognition in pytorch


## This is a mixed project borrowing from many awesome projects opened recently.
With pytorch-lightning, experiments can be carried out easily.
And i will try to make evey calculation in a batched and cleaned way.
(such as add bos & eos into batched target and spec augment) Any ideas can be put into the issues,
and welcome for discussion. (This project is still being building and reorganizing)

project features:

    joint attention & ctc beam search decode with rnn lm
    multi dataset
    using pytorch lightning for 16bit training
    Chinese-char level & English-word level tokenizer
    sentence piece tokenizer for english tokenizing
    rnn_lm training
    label smoothing
    customized transformer encoder and decoder see: src/model/modules/transformer_encoder...
    *rezero transformer for some converge problem with half precision and speed consideration

feature:

    log fbank with sub sample
    speed augment
    a spec augment using gpu as a layer in model
    customized feature filtering , see src/loader/utils/build_fbank remove_empty_line_2d

optimizer:

    Ranger

model:

    rezero transformer
    restricted encoder field
    better mask  (may be a little slower than other project but effective)

loss:

    lambda * ce loss + (1-lambda * ctc loss) + code switch loss

requirement:

    see docker/


references:

    https://github.com/ZhengkunTian/OpenTransformer
    https://github.com/espnet/espnet
    https://github.com/jadore801120/attention-is-all-you-need-pytorch
    https://github.com/alphadl/lookahead.pytorch
    https://github.com/LiyuanLucasLiu/RAdam
    https://github.com/vahidk/tfrecord
    https://github.com/kaituoxu/Speech-Transformer
    https://github.com/majumderb/rezero
    https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer



data:
    aishell1 170h
    aishell2 1000h
    magic data 750h
    prime 100h not used
    stcmd 100h not used
    datatang 200h
    datatang 500h
    datatang mix 200h
    librispeech 960h

train step

    english -> eng(sub) + mix + chinese -> chinese + mix -> mix
tongjinle123 / speech-transformer-pytorch_lightning

About

Languages