DDxk / neural_sp

End-to-end ASR implementation with PyTorch.

NeuralSP: Neural network based Speech Processing

How to install

Data preparation

Features

Connectionist Temporal Classification (CTC)

  • beam search
  • Shallow fusion [link]
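As a point of comparison for beam search, the simplest CTC decoding rule is greedy (best-path) decoding: take the most likely label at each frame, merge consecutive repeats, then drop blanks. The sketch below is illustrative only and is not taken from the neural_sp code base; the function name `ctc_greedy_decode` and the choice of index 0 for the blank label are assumptions.

```python
from itertools import groupby


def ctc_greedy_decode(frame_log_probs, blank=0):
    """Greedy (best-path) CTC decoding over per-frame label scores.

    frame_log_probs: list of frames, each a list of log-probabilities
    indexed by label id. `blank` is the (assumed) blank label id.
    """
    # Pick the highest-scoring label index at every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_log_probs]
    # Merge consecutive repeats first, then remove blanks.
    merged = [label for label, _ in groupby(best_path)]
    return [label for label in merged if label != blank]


# Toy input whose best path is 1 1 0 1 2 2 0 (0 is the blank).
path = [1, 1, 0, 1, 2, 2, 0]
frames = [[0.0 if i == p else -10.0 for i in range(3)] for p in path]
decoded = ctc_greedy_decode(frames)
print(decoded)
```

The blank between the two runs of label 1 is what keeps them as two separate output labels; without it they would be merged into one.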

Attention-based sequence-to-sequence

Encoder

  • CNN encoder
  • (bidirectional/unidirectional) LSTM encoder
  • CNN+(bidirectional/unidirectional) LSTM encoder
  • self-attention (Transformer) encoder [link]
  • Time-Depth Separable (TDS) convolutional encoder [link] (NEW!)

Decoder

  • RNN decoder
    • Beam search
    • Shallow fusion [link]
    • Cold fusion [link]
    • Deep fusion [link]
    • Forward-backward attention decoding [link]
  • Transformer decoder
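Shallow fusion, listed above, is usually described as a log-linear interpolation of the ASR model score and an external LM score at each decoding step. A minimal sketch of that scoring rule follows, assuming per-token log-probabilities stored in plain dictionaries; the name `shallow_fusion_scores` and the `lm_weight` parameter are hypothetical and do not refer to neural_sp options.

```python
import math


def shallow_fusion_scores(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Return fused scores: log p_asr(token) + lm_weight * log p_lm(token)."""
    return {
        token: asr_log_probs[token] + lm_weight * lm_log_probs[token]
        for token in asr_log_probs
    }


# Toy next-token distributions (made-up numbers for illustration).
asr = {"a": math.log(0.5), "b": math.log(0.4), "c": math.log(0.1)}
lm = {"a": math.log(0.1), "b": math.log(0.8), "c": math.log(0.1)}

fused = shallow_fusion_scores(asr, lm, lm_weight=1.0)
best_with_lm = max(fused, key=fused.get)
best_without_lm = max(asr, key=asr.get)
print(best_without_lm, best_with_lm)
```

In a beam search, the fused score would be used to rank candidate extensions of each hypothesis; the LM weight is typically tuned on a development set.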

Attention

  • RNN decoder
    • location [link]
    • additive [link]
    • dot-product
    • Luong's dot/general/concat [link]
    • Multi-headed dot-product [link]
  • Transformer decoder
    • Multi-headed dot-product [link]
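To make the dot-product entry concrete, here is a single-head sketch for one query vector in plain Python. It uses the scaled variant (division by the square root of the dimension) common in Transformer models, which is an assumption on my part; the function name `dot_product_attention` is hypothetical and this is not the neural_sp implementation.

```python
import math


def dot_product_attention(query, keys, values):
    """Scaled dot-product attention for one query.

    query: list of floats (dimension d)
    keys: list of key vectors, each of dimension d
    values: list of value vectors, one per key
    Returns (context_vector, attention_weights).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context is the attention-weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 4.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
context, weights = dot_product_attention(query, keys, values)
print(weights, context)
```

A multi-headed version would run several such attentions on different learned projections of the inputs and concatenate the resulting context vectors.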

Language model (LM)

  • RNNLM (recurrent neural network language model)
  • Gated convolutional LM [link]

Output units

  • phoneme (TIMIT, Switchboard)
  • grapheme
  • wordpiece (BPE, wordpiece)
  • word
  • word-char mix

Multi-task learning (MTL)

Multi-task learning (MTL) with different output units is supported to alleviate data sparseness.

  • Hybrid CTC/attention [link]
  • Hierarchical Attention (e.g., word attention + character CTC) [link]
  • Hierarchical CTC (e.g., word CTC + character CTC) [link]
  • Hierarchical CTC+Attention (e.g., word attention + character CTC) [link]
  • Forward-backward attention [link]
  • RNNLM objective [link]
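For the hybrid CTC/attention objective, the usual formulation is a weighted sum of the two losses computed on the same encoder output. The sketch below shows only that interpolation; the function name `hybrid_ctc_attention_loss` and the parameter `ctc_weight` are hypothetical and are not claimed to match neural_sp's configuration options.

```python
def hybrid_ctc_attention_loss(ctc_loss, attention_loss, ctc_weight=0.2):
    """Interpolate two scalar losses: w * L_ctc + (1 - w) * L_att."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Made-up loss values for illustration.
total = hybrid_ctc_attention_loss(ctc_loss=2.0, attention_loss=4.0, ctc_weight=0.25)
print(total)
```

Setting the weight to 0 recovers pure attention training and setting it to 1 recovers pure CTC training.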

Performance (word error rate)

WSJ

model       test_dev93  test_eval92
Char attn   N/A         N/A
BPE1k attn  N/A         N/A

CSJ

model        eval1  eval2  eval3
Char attn    N/A    N/A    N/A
+ RNNLM      N/A    N/A    N/A
BPE30k attn  8.8    6.3    6.9
+ RNNLM      8.2    6.0    6.7

Switchboard

model         SWB  CH
Char attn     N/A  N/A
BPE10k attn   N/A  N/A
Word10k attn  N/A  N/A

Librispeech

model         dev-clean  dev-other  test-clean  test-other
Char attn     N/A        N/A        N/A         N/A
BPE30k attn   N/A        N/A        N/A         N/A
Word30k attn  N/A        N/A        N/A         N/A
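Word error rate, the metric named in the section title above, is conventionally the word-level edit distance (substitutions, deletions, and insertions) between hypothesis and reference, divided by the number of reference words. The sketch below is a generic illustration with whitespace tokenization; it is not the scoring script used to produce the numbers in these tables, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two whitespace-tokenized strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(round(wer, 2))
```

Because insertions count as errors, this value can exceed 100 when the hypothesis is much longer than the reference.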

Reference
