prasanna-ML-expert / wav2letterInference

Run the Facebook (FB) wav2letter speech recognition project in a container using Python code.

wav2letter Inference API:

wav2letter.ipynb is a Google Colab notebook file.

Run this notebook in Google Colab; it fetches all dependencies, compiles wav2letter, and runs the inference. The inference output is served over a Python web API.

While installing ArrayFire, you need to accept the license by entering Y in the output cell.

Upload a WAV file to the wav2letterInference folder and rename it to numbersAudioMale.wav (or change the filename on line 16 of convertAudio.py) for inference.
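Before uploading, it can help to confirm the audio format. The sketch below uses Python's standard `wave` module to read a file's channel count, sample width, and sample rate. The target values (mono, 16-bit, 16 kHz) are an assumption based on what the wav2letter streaming inference examples are generally described as expecting, and the helper names `describe_wav` and `is_expected_format` are hypothetical, not part of this repository.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def is_expected_format(path, channels=1, sample_width=2, sample_rate=16000):
    # Assumed target: mono, 16-bit (2 bytes per sample), 16 kHz.
    # Adjust these defaults if your model expects something else.
    return describe_wav(path) == (channels, sample_width, sample_rate)
```

For example, `is_expected_format("numbersAudioMale.wav")` returns `False` for a stereo or 44.1 kHz recording, which would then need to be converted before inference.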


If Colab doesn't work, run the container on an Ubuntu 18 machine as described below.

Run the container

1) Start the container with port 8888 forwarded:

sudo docker run -p 8888:8888 --rm -itd --ipc=host --name w2l wav2letter/wav2letter:inference-latest

2) Open a shell in the running container:

sudo docker exec -it w2l bash

The wav2letter library/project is located at /root/wav2letter/ inside the container.

Running inference inside the container using Python code:

1) Download the model files from AWS into a folder named model (run the command below from inside that folder):

for f in acoustic_model.bin tds_streaming.arch decoder_options.json feature_extractor.bin language_model.bin lexicon.txt tokens.txt ; do wget http://dl.fbaipublicfiles.com/wav2letter/inference/examples/model/${f} ; done
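If `wget` is not available, the same download can be sketched in Python with the standard library. The base URL and file names are taken from the command above; the function names `model_urls` and `download_model` and the default `target_dir` are illustrative assumptions.

```python
import os
import urllib.request

BASE_URL = "http://dl.fbaipublicfiles.com/wav2letter/inference/examples/model"
MODEL_FILES = [
    "acoustic_model.bin",
    "tds_streaming.arch",
    "decoder_options.json",
    "feature_extractor.bin",
    "language_model.bin",
    "lexicon.txt",
    "tokens.txt",
]


def model_urls(base_url=BASE_URL, files=MODEL_FILES):
    """Build the full download URL for each model file."""
    return [f"{base_url}/{name}" for name in files]


def download_model(target_dir="model"):
    """Download every model file into target_dir, skipping files already present."""
    os.makedirs(target_dir, exist_ok=True)
    for name, url in zip(MODEL_FILES, model_urls()):
        dest = os.path.join(target_dir, name)
        if not os.path.exists(dest):
            urllib.request.urlretrieve(url, dest)
```

Calling `download_model("model")` creates the folder if needed and fetches the seven files into it.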

2) Download this repository to get the Python code:

git clone https://github.com/jkreddy123/wav2letterInference.git

3) Check inference from the shell by running:

/root/wav2letter/build> python ~/wav2letterInference/runmodel.py

NOTE: Change the paths to the binary, the model folder, and the WAV file in runmodel.py to match your setup.
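To make clear what those three paths are for, here is a minimal sketch of how a script could feed a WAV file to a wav2letter inference binary over stdin and print its output line by line. This is not the contents of runmodel.py. The binary path, the `--input_files_base_path` flag, the model folder, and the WAV location are all assumptions about a typical wav2letter inference container and should be checked against your own setup.

```python
import subprocess

# All three values below are assumed placeholders; adjust them to your container.
BINARY = "/root/wav2letter/build/inference/inference/examples/simple_streaming_asr_example"
MODEL_DIR = "/root/model"
WAV_FILE = "/root/wav2letterInference/numbersAudioMale.wav"


def build_command(binary=BINARY, model_dir=MODEL_DIR):
    """Return the argument list used to start the inference binary."""
    return [binary, "--input_files_base_path", model_dir]


def run_inference(wav_path=WAV_FILE, binary=BINARY, model_dir=MODEL_DIR):
    """Stream the WAV file to the binary on stdin and print each raw output line."""
    with open(wav_path, "rb") as audio:
        proc = subprocess.Popen(
            build_command(binary, model_dir),
            stdin=audio,
            stdout=subprocess.PIPE,
        )
        for line in proc.stdout:
            print(line)  # printed as bytes, e.g. b'2000,3000,one \n'
        return proc.wait()
```

Printing the raw bytes is what produces the `b'...'` prefix on each line of output.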

4) To check inference results in a browser, run the command below and then open http://0.0.0.0:8888/index:

python ~/wav2letterInference/convertAudio.py 8888
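For readers curious how a transcript can be exposed on a URL like `/index`, the following is a minimal sketch using only Python's standard `http.server`. It is not the implementation in convertAudio.py, which may use a different framework. `make_handler`, `serve`, and the `get_transcript` callback are hypothetical names introduced for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(get_transcript):
    """Create a request handler that returns get_transcript() as plain text on /index."""

    class IndexHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/index":
                self.send_error(404)
                return
            body = get_transcript().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the console quiet

    return IndexHandler


def serve(port, get_transcript):
    """Listen on all interfaces so the forwarded container port is reachable."""
    HTTPServer(("0.0.0.0", port), make_handler(get_transcript)).serve_forever()
```

A call such as `serve(8888, lambda: "one two three")` would answer requests on port 8888, the port forwarded by the `docker run` command.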

Contact: jkreddy@colorssoftware.com

The output looks like this:

b'Completed features model file loading elapsed time=2407 microseconds\n'
b'\n'
b'Started acoustic model file loading ... \n'
b'Completed acoustic model file loading elapsed time=4732 milliseconds\n'
b'\n'
b'Started tokens file loading ... \n'
b'Completed tokens file loading elapsed time=1341 microseconds\n'
b'\n'
b'Tokens loaded - 9998 tokens\n'
b'Started decoder options file loading ... \n'
b'Completed decoder options file loading elapsed time=91 microseconds\n'
b'\n'
b'Started create decoder ... \n'
b'Completed create decoder elapsed time=1653 milliseconds\n'
b'\n'
b'Started converting audio input from stdin to text... ... \n'
b'#start (msec), end(msec), transcription\n'
b'0,1000,\n'
b'1000,2000,\n'
b'2000,3000,one \n'
b'3000,4000,two three \n'
b'4000,5000,four \n'
b'5000,6000,five six \n'
b'6000,7000,seven eight \n'
b'7000,8000,nine \n'
b'8000,9000,ten eleven \n'
b'9000,10000,twelve thirty \n'
b'10000,10334,forty fifty \n'
b'Completed converting audio input from stdin to text... elapsed time=2626 milliseconds\n'
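The transcription rows in this output follow the `start,end,transcription` layout announced by the header line, so they can be turned into structured data. The sketch below is one possible way to do that; `parse_transcript` and `full_text` are hypothetical helper names, and the parser assumes each row is a byte string shaped like the ones shown above.

```python
def parse_transcript(lines):
    """Return a list of (start_ms, end_ms, text) tuples from raw output lines.

    Lines that are not 'start,end,transcription' rows (status messages,
    the header, blank lines) are ignored.
    """
    segments = []
    for raw in lines:
        parts = raw.decode("utf-8").strip().split(",", 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            segments.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return segments


def full_text(segments):
    """Join the non-empty transcriptions into a single string."""
    return " ".join(text for _, _, text in segments if text)
```

Applied to the rows above, `full_text` joins the per-second chunks into one sentence, skipping the two empty leading segments.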
