SeanNaren / deepspeech.torch

Speech Recognition using DeepSpeech2 network and the CTC loss function.

How to just evaluate a pre-trained network on an audio file?

devinbostIL opened this issue

Hi,

I was able to get my environment set up, and now I want to try evaluating an existing model (such as the LibriSpeech network) to run speech-to-text on an audio file. I only need to perform the transcription.
How do I go about this with your library? The documentation doesn't make clear which steps are necessary, or how much extra development work (if any) I would need to do to transcribe audio with it.

Hey, my bad! I should update the docs sometime :) To do this, use the predict script like so:

th Predict.lua -modelPath /path/to/model.t7 -audioPath /path/to/audio.wav

There are further parameters if you need them; use the -help argument to see them!
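The usage line above takes one file per -audioPath. To transcribe a whole folder, one option is to call Predict.lua once per file. A minimal Python sketch of that loop, assuming it is run from the repo root (where Predict.lua lives) and that the transcription is whatever Predict.lua prints to stdout; `predict_commands` and `transcribe_directory` are hypothetical helper names, not part of the repo:

```python
import subprocess
from pathlib import Path


def predict_commands(model_path, audio_dir, extra_args=None):
    """Build one `th Predict.lua` command per .wav file in audio_dir.

    Assumption: Predict.lua transcribes a single file per -audioPath,
    so a directory is handled by invoking it once per file.
    """
    extra_args = list(extra_args or [])
    commands = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        commands.append(
            ["th", "Predict.lua", "-modelPath", str(model_path),
             "-audioPath", str(wav)] + extra_args
        )
    return commands


def transcribe_directory(model_path, audio_dir, extra_args=None):
    """Run Predict.lua on each file and collect its stdout, keyed by file name."""
    results = {}
    for cmd in predict_commands(model_path, audio_dir, extra_args):
        # check=True raises CalledProcessError if Predict.lua exits non-zero,
        # e.g. when Torch throws an error like the View mismatch in this thread.
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results[Path(cmd[5]).name] = proc.stdout.strip()  # cmd[5] is the audio path
    return results
```

For example, `transcribe_directory("libri_deepspeech.t7", "/path/to/wavs", ["-nGPU", "1"])` would return a dict mapping each file name to Predict.lua's printed output. Passing an argument list rather than a shell string avoids quoting problems with paths that contain spaces.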

Thanks for the information!

I attempted to run the model, and it blew up with this message:

$ th Predict.lua -modelPath libri_deepspeech.t7 -audioPath '/home/devinbost/Downloads/speech_audio_files_sample/nameOfAudioFile.wav'
/home/devinbost/torch/install/bin/luajit: ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 2 module of nn.Sequential:
In 3 module of nn.Sequential:
In 1 module of cudnn.BatchBRNNReLU:
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: input view (5107x1x1x1760) and desired view (5107x-1) do not match
stack traceback:
	[C]: in function 'error'
	/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: in function 'batchsize'
	/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:79: in function </home/devinbost/torch/install/share/lua/5.1/nn/View.lua:77>
	[C]: in function 'xpcall'
	...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
	[C]: in function 'xpcall'
	...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
	[C]: in function 'xpcall'
	...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	Predict.lua:42: in main chunk
	[C]: in function 'dofile'
	...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
	[C]: in function 'error'
	...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
	.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	Predict.lua:42: in main chunk
	[C]: in function 'dofile'
	...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

Any ideas?

Is it expecting me to pass it a table or a directory with a collection of audio files?

I tried a different audio file, and then also changed the sampling rate. These are the error messages I got:

~/src/deepspeech.torch$ th Predict.lua -modelPath libri_deepspeech.t7 -audioPath '/home/devinbost/Downloads/speech_audio_files_sample/4402691.wav'
/home/devinbost/torch/install/bin/luajit: ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67:
In 2 module of nn.Sequential:
In 3 module of nn.Sequential:
In 1 module of cudnn.BatchBRNNReLU:
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: input view (3951x1x1x1760) and desired view (3951x-1) do not match
stack traceback:
[C]: in function 'error'
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: in function 'batchsize'
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:79: in function </home/devinbost/torch/install/share/lua/5.1/nn/View.lua:77>
[C]: in function 'xpcall'
...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
Predict.lua:42: in main chunk
[C]: in function 'dofile'
...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
Predict.lua:42: in main chunk
[C]: in function 'dofile'
...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

~/src/deepspeech.torch$ th Predict.lua -modelPath libri_deepspeech.t7 -audioPath '/home/devinbost/Downloads/speech_audio_files_sample/4402691.wav' -sampleRate 13000
/home/devinbost/torch/install/bin/luajit: ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 7 module of nn.Sequential:
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: input view (1x32x26x4864) and desired view (1312x-1) do not match
stack traceback:
[C]: in function 'error'
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: in function 'batchsize'
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:79: in function </home/devinbost/torch/install/share/lua/5.1/nn/View.lua:77>
[C]: in function 'xpcall'
...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
Predict.lua:42: in main chunk
[C]: in function 'dofile'
...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
Predict.lua:42: in main chunk
[C]: in function 'dofile'
...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

Make sure the file is a 16 kHz WAV file. Is this the case?
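The -sampleRate 13000 trace suggests why the sample rate matters: the feature size reaching the RNN depends on it. A rough sketch of that arithmetic in Python, under assumptions that are mine rather than read from the repo's code (chosen because they reproduce the 26 and 1312 in the trace): a 20 ms spectrogram window with `n_fft // 2 + 1` frequency bins, then two unpadded convolutions over frequency with kernel 41/stride 2 and kernel 21/stride 1, and 32 output channels. All function names are hypothetical.

```python
def spectrogram_bins(sample_rate, window_size=0.02):
    # A real-valued FFT over an n_fft-sample window yields n_fft // 2 + 1 bins.
    n_fft = int(round(sample_rate * window_size))
    return n_fft // 2 + 1


def conv_out(size, kernel, stride):
    # Output length of an unpadded convolution along one axis.
    return (size - kernel) // stride + 1


def rnn_input_size(sample_rate, channels=32):
    """Return (frequency rows after the convs, flattened feature size)."""
    freq = spectrogram_bins(sample_rate)        # 161 bins at 16 kHz
    freq = conv_out(freq, kernel=41, stride=2)  # assumed first conv
    freq = conv_out(freq, kernel=21, stride=1)  # assumed second conv
    return freq, channels * freq


for rate in (16000, 13000):
    print(rate, rnn_input_size(rate))
# 16000 (41, 1312)
# 13000 (26, 832)
```

Under these assumptions, 16 kHz audio gives 32 x 41 = 1312 features, the size in "desired view (1312x-1)", while 13 kHz leaves 26 rows, matching the "1x32x26x4864" input, so the reshape fails. The model is tied to the rate it was trained on, so resampling the audio is the fix rather than overriding -sampleRate. This does not explain the "...x1x1x1760" errors inside cudnn.BatchBRNNReLU, which occur at a later layer.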

I've also added documentation here.

I'm having the same problem. I downloaded the LibriSpeech pre-trained model and am launching it with th Predict.lua -modelPath libri_deepspeech.t7 -audioPath amy.out.wav -dictionaryPath ./dictionary -nGPU 1

I'm running this against a WAV file that I downsampled to 16 kHz mono with sox amy.wav amy.out.wav rate 16k channels 1. It is a 16-bit file, if that counts for anything.

I'm getting a very similar error when I try to run Predict.lua: View.lua:47: input view (241x1x1x1760) and desired view (241x-1) do not match

If I figure out what I'm doing wrong, I'd be happy to contribute some better documentation or strengthen the input file checking in Predict.lua so it throws actionable errors.
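As an illustration of the kind of input check that could produce actionable errors, here is a minimal Python sketch using the standard `wave` module. It is not part of the repo, and a real fix would live in Predict.lua itself. The 16 kHz requirement comes from this thread; the mono requirement is an assumption. `check_wav` is a hypothetical name.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of problems with a WAV file's header (empty if none).

    The 16 kHz expectation follows the advice in this thread; requiring
    mono is an assumption. Adjust for models trained on other audio.
    """
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        # wave.Error: not a RIFF/WAVE PCM file; EOFError: truncated header.
        return [f"not a readable PCM WAV file: {exc}"]
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono (assumed requirement)")
    return problems
```

Running `check_wav("amy.out.wav")` before Predict.lua would at least rule out a header mismatch; an empty list means the rate and channel count look as expected.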