WeNet
We share Net together.
Highlights
- Production first and production ready: this is WeNet's core design principle. WeNet provides full-stack production solutions for speech recognition.
- Accurate: WeNet achieves state-of-the-art (SOTA) results on many public speech datasets.
- Lightweight: WeNet is easy to install, easy to use, well designed, and well documented.
Install
Install the Python package
pip install git+https://github.com/wenet-e2e/wenet.git
Command-line usage (use -h for parameters):
wenet --language chinese audio.wav
Python programming usage:
import wenet

# Load a pretrained model by language name
model = wenet.load_model('chinese')
# Transcribe a local audio file; the result is a dict
result = model.transcribe('audio.wav')
print(result['text'])
Please refer to the Python usage documentation for more command-line and Python programming usage.
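The single-file call above extends naturally to a folder of recordings. The sketch below relies only on what the usage example shows: an object with a transcribe(path) method that returns a dict with a 'text' key. The helper name transcribe_dir and the default "*.wav" pattern are hypothetical choices for illustration, not part of the WeNet API.

```python
from pathlib import Path
from typing import Dict, Protocol


class Transcriber(Protocol):
    """Anything with a transcribe(path) -> {'text': ...} method, such as the
    object returned by wenet.load_model() in the usage example above."""

    def transcribe(self, audio_file: str) -> dict: ...


def transcribe_dir(model: Transcriber, folder: str,
                   pattern: str = "*.wav") -> Dict[str, str]:
    """Hypothetical helper: transcribe every file in `folder` matching
    `pattern` and return a {file name: text} mapping in sorted order."""
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).glob(pattern)):
        # Each call returns a dict; only its 'text' field is kept here.
        results[path.name] = model.transcribe(str(path))["text"]
    return results
```

For example, transcribe_dir(wenet.load_model('chinese'), 'audio_dir') would return one transcript per .wav file in audio_dir. Typing the model parameter as a Protocol keeps the helper independent of the concrete model class.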
Install for training & deployment
- Clone the repository and enter it
git clone https://github.com/wenet-e2e/wenet.git
cd wenet
- Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
- Create a Conda environment and install the dependencies:
conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
Build for deployment
Optionally, if you want to use the x86 runtime or a language model (LM), you have to build the runtime as follows. Otherwise, you can skip this step.
# runtime build requires cmake 3.14 or above
cd runtime/libtorch
mkdir build && cd build && cmake -DGRAPH_TOOLS=ON .. && cmake --build .
Please see the documentation for building the runtime on other platforms and operating systems.
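Because the runtime build requires CMake 3.14 or above, it can help to check the installed version before building. The following is a hypothetical pre-build check, not a script shipped with WeNet; it assumes that cmake --version prints a line of the form "cmake version X.Y.Z", as CMake releases do.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_CMAKE = (3, 14)  # minimum version required by the runtime build


def parse_cmake_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def cmake_is_new_enough(output: str,
                        minimum: Tuple[int, int] = MIN_CMAKE) -> bool:
    """True if the parsed (major, minor) is at least `minimum`."""
    version = parse_cmake_version(output)
    # Tuple comparison: (3, 22) >= (3, 14) is True, (3, 10) >= (3, 14) is False.
    return version is not None and version[:2] >= minimum


def check_installed_cmake() -> bool:
    """Run the cmake found on PATH and report whether it meets the minimum."""
    exe = shutil.which("cmake")
    if exe is None:
        print("cmake not found on PATH")
        return False
    out = subprocess.run([exe, "--version"],
                         capture_output=True, text=True).stdout
    ok = cmake_is_new_enough(out)
    first_line = out.splitlines()[0] if out else "no output"
    print(first_line, "->", "OK" if ok else "too old")
    return ok
```

Calling check_installed_cmake() prints the detected version and returns False when cmake is missing or older than 3.14. Keeping the parsing in a separate function lets it be checked without running cmake at all.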
Discussion & Communication
You can discuss directly on GitHub Issues.
For Chinese users, you can also scan the QR code on the left to follow WeNet's official account. We created a WeChat group for better discussion and quicker response. Please scan the personal QR code on the right; its owner is responsible for inviting you to the chat group.
Acknowledgements
- We borrowed a lot of code from ESPnet for transformer-based modeling.
- We borrowed a lot of code from Kaldi for WFST-based decoding for LM integration.
- We referred to EESEN for building the TLG-based graph for LM integration.
- We referred to OpenTransformer for Python batch inference of E2E models.
Citations
@inproceedings{yao2021wenet,
title={WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit},
author={Yao, Zhuoyuan and Wu, Di and Wang, Xiong and Zhang, Binbin and Yu, Fan and Yang, Chao and Peng, Zhendong and Chen, Xiaoyu and Xie, Lei and Lei, Xin},
booktitle={Proc. Interspeech},
year={2021},
address={Brno, Czech Republic},
organization={IEEE}
}
@article{zhang2022wenet,
title={WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit},
author={Zhang, Binbin and Wu, Di and Peng, Zhendong and Song, Xingchen and Yao, Zhuoyuan and Lv, Hang and Xie, Lei and Yang, Chao and Pan, Fuping and Niu, Jianwei},
journal={arXiv preprint arXiv:2203.15455},
year={2022}
}