zsc / End-to-end-ASR-Transformer

An end-to-end ASR Transformer model training repo

END TO END ASR TRANSFORMER

This project is an end-to-end speech recognition system built on a basic Transformer architecture with 6 encoders and 6 decoders.

Model

Instructions

1. Data preparation:
- Download the data yourself and arrange it in the following file structure:

├── data
│   ├── train
│   ├── dev
│   ├── test

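2. Data preprocessing:
- Run prepare_data.py to preprocess the data. This produces the full vocabulary, a mel-scale spectrogram for each audio sample, and the token ids for each transcript.
3. Model training:
- Run train_transformer.py --ngpus 8 to train the Transformer network. The network takes mel-scale spectrograms as input and outputs token ids.
4. Model inference:
- Run evlauate.py to measure accuracy on the dev/test sets.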
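
The text side of step 2 and the scoring in step 4 can be illustrated with a minimal sketch. It is not the code in prepare_data.py or evlauate.py: the character-level vocabulary, the special tokens, the function names, and the definition of accuracy as one minus the edit-distance error rate are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical special tokens; the repository's actual vocabulary may differ.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(transcripts):
    """Map every non-space character seen in the transcripts to an integer id."""
    counts = Counter(ch for text in transcripts for ch in text if not ch.isspace())
    # Sort by descending frequency, then by character, so ids are reproducible.
    chars = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {tok: i for i, tok in enumerate(SPECIALS + chars)}


def encode(text, vocab):
    """Turn one transcript into token ids wrapped in <sos> ... <eos>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(ch, unk) for ch in text if not ch.isspace()]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token-id sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # token missing from hyp
                            curr[j - 1] + 1,          # extra token in hyp
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def token_accuracy(refs, hyps):
    """Assumed metric: 1 - (total edit distance / total reference length)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 1.0 - errors / total


if __name__ == "__main__":
    vocab = build_vocab(["今天天气好", "天气不好"])
    ref = encode("天气好", vocab)
    hyp = encode("天气不好", vocab)
    print(token_accuracy([ref], [hyp]))
```

In the usage at the bottom, the hypothesis contains one extra token relative to a five-token reference (including `<sos>` and `<eos>`), so the assumed accuracy is 0.8. The audio side of step 2, computing mel-scale spectrograms, is left out here because the source does not state which feature extraction library or parameters the repository uses.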

Acknowledgements

Reference

Ashish Vaswani et al. “Attention Is All You Need” (2017).
Abdel-rahman Mohamed et al. “Transformers with convolutional context for ASR” arXiv: Computation and Language (2019).
Albert Zeyer et al. “Improved Training of End-to-end Attention Models for Speech Recognition” Conference of the International Speech Communication Association (2018).


Apache License 2.0
