Does not work "spoken numbers" example

Question

aspats opened this issue 9 years ago

I was happy to find an example like yours for audio classification. However, I think your code needs updating, because it has some problems.

For now I am trying to use the "training spoken numbers" example, and I ran into the following doubts/problems:

1) In "numbers_solver.prototxt" you use net: "numbers_net.autoencoder.prototxt". "numbers_net.autoencoder.prototxt" defines the training and testing list files ("train_index_256x256.txt", "test_index_256x256.txt"), but those files do not exist. I worked around this by changing "numbers_solver.prototxt" to net: "numbers_net.prototxt". After that I could start training the caffe model.
2) When I tried to run the backend server with "recognition-server.py", I got:
    ... net = caffe.Net(model, weights)
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    Boost.Python.ArgumentError: Python argument types in
        Net.__init__(Net, str, str)
    did not match C++ signature:
        __init__(boost::python::api::object, std::string, std::string, int)
        __init__(boost::python::api::object, std::string, int)
3) It is not clear which image size is expected: some of the code uses the original 512x512 images, while other code reduces them to 256x256. I used the original images to create the model, but "recognition-server.py" and "rocord.py" transform the image.
4) I would also like to get the original audio files of "spoken numbers", and I would like to know how you converted them from wav to png.
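The thread does not say how the PNGs were produced. As a hedged sketch only (not the author's pipeline), one common approach is a magnitude spectrogram (short-time Fourier transform) saved as a grayscale image. The function names, the 16-bit PCM assumption, the frame size, hop and log scaling are all my own assumptions; it uses only the Python standard library, with a naive DFT, so it is slow on long recordings.

```python
import cmath
import math
import struct
import wave
import zlib


def read_mono_wav(path):
    """Read a 16-bit PCM WAV file; return (samples of the first channel, sample rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = w.getnchannels()
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return list(values[::n_channels]), rate


def spectrogram(samples, frame_size=256, hop=128):
    """Hann-windowed magnitude STFT; one list of frame_size // 2 bins per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    twiddle = [cmath.exp(-2j * math.pi * m / frame_size) for m in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT over the positive-frequency bins (slow, but dependency-free).
        frames.append([abs(sum(chunk[n] * twiddle[(k * n) % frame_size]
                               for n in range(frame_size)))
                       for k in range(frame_size // 2)])
    return frames


def write_grayscale_png(path, frames):
    """Save frames as an 8-bit grayscale PNG: time on x, frequency on y (low at the bottom)."""
    if not frames:
        raise ValueError("audio is shorter than one frame")
    logs = [[math.log1p(v) for v in frame] for frame in frames]  # compress dynamic range
    peak = max(max(frame) for frame in logs) or 1.0
    width, height = len(logs), len(logs[0])
    raw = bytearray()
    for y in range(height):
        raw.append(0)  # filter type 0 (None) at the start of each scanline
        for x in range(width):
            raw.append(int(255 * logs[x][height - 1 - y] / peak))

    def png_chunk(tag, data):
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)

    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)  # 8-bit, grayscale
    with open(path, "wb") as fh:
        fh.write(b"\x89PNG\r\n\x1a\n" + png_chunk(b"IHDR", header)
                 + png_chunk(b"IDAT", zlib.compress(bytes(raw))) + png_chunk(b"IEND", b""))
```

Usage would be along the lines of `samples, rate = read_mono_wav("3_Princess_220.wav")` followed by `write_grayscale_png("3_Princess_220.wav.png", spectrogram(samples))`. With these defaults the image is 128 pixels high and its width depends on the clip length, so it will not be 512x512; a resize to whatever input size the net expects would still be needed.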

I would be happy to get an answer from you. I really like your audio classification example; I just think it needs an update.

Thanks!
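Regarding the Boost.Python error from recognition-server.py: both C++ signatures listed in the message take an extra int, which in pycaffe is the network phase (`caffe.TRAIN` or `caffe.TEST`). A minimal sketch of a fix, assuming the server only needs inference, is to pass `caffe.TEST` as the third argument. `load_net` is a hypothetical helper, not something in recognition-server.py; the optional `caffe_module` parameter only exists so the function can be exercised without caffe installed.

```python
def load_net(model, weights, caffe_module=None):
    """Load a trained net for inference (hypothetical helper).

    Newer pycaffe builds reject caffe.Net(model, weights); the matching C++
    signature is Net(prototxt_path, caffemodel_path, phase).
    """
    if caffe_module is None:
        import caffe as caffe_module  # imported lazily so a stand-in can be passed instead
    return caffe_module.Net(model, weights, caffe_module.TEST)
```

In the server, the failing line would then become `net = load_net(model, weights)`.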

Sebastian Lapuschkin · Answer 1 · Tue Aug 16 2016 20:54:04 GMT+0800 (China Standard Time)

Hi,

I have the same (or a similar) issue.
Yesterday I

- freshly cloned caffe and caffe-speech-recognition from git,
- built caffe,
- downloaded http://pannous.net/spoken_numbers.tar and extracted it into the caffe-speech-recognition root directory,
- started ./train.sh and stumbled across issue 1) of the previous poster.

After applying the fix above, I now get the error from this thread (#1):

[...]
I0816 14:41:04.538826 3856 layer_factory.hpp:77] Creating layer alpha
I0816 14:41:04.538861 3856 net.cpp:100] Creating Layer alpha
I0816 14:41:04.538871 3856 net.cpp:408] alpha -> data
I0816 14:41:04.538889 3856 net.cpp:408] alpha -> label
I0816 14:41:04.538908 3856 image_data_layer.cpp:38] Opening file train_index.txt
I0816 14:41:04.539526 3856 image_data_layer.cpp:58] A total of 2049 images.
E0816 14:41:04.539546 3856 io.cpp:80] Could not open or find file spoken_numbers/3_Princess_220.wav.png 3
F0816 14:41:04.539655 3856 image_data_layer.cpp:72] Check failed: cv_img.data Could not load spoken_numbers/3_Princess_220.wav.png 3
[...]

It looks to me as if the data/label line is not split properly: the error message shows the file name and the label "3" read together as a single path. The file is definitely there.
Is this because my version of caffe is too recent and handles the index file differently? If so, which version of caffe is known to work with your setup?
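If I read the current caffe sources correctly, ImageDataLayer splits each index line at its last space character (so that file names may contain spaces), rather than on any whitespace. A tab-separated line then contains no space to split on, which would explain why the error shows the file name and label glued together. Older releases such as rc2 appear to read `filename label` with stream extraction, which also treats tabs as separators. The function below is a Python re-implementation of that "last space" rule for illustration only, not caffe's code; the name `split_index_line` is hypothetical.

```python
def split_index_line(line):
    """Mimic the 'split at the last space' rule for one index-file line.

    Returns (file_name, label). With no space in the line, the whole line
    becomes the file name and this sketch returns label 0.
    """
    line = line.rstrip("\n")
    pos = line.rfind(" ")
    if pos == -1:
        return line, 0
    return line[:pos], int(line[pos + 1:])
```

A tab-separated line therefore yields a "file name" ending in a tab and the label, which is not a file on disk.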

Cheers,
Sebastian

pannous · Answer 2 · Tue Aug 16 2016 21:01:48 GMT+0800 (China Standard Time)

Hi, this demo code is two years old. Updating the code or data to the current caffe version and its requirements shouldn't be too hard, though.

Sebastian Lapuschkin · Answer 3 · Wed Aug 17 2016 21:52:26 GMT+0800 (China Standard Time)

Hi pannous,

First, let me thank you for your swift reply yesterday.

For now I went the lazy way: I am running caffe-rc2 from https://github.com/BVLC/caffe/archive/rc2.zip and modified numbers_solver.prototxt so that numbers_net.prototxt is used (just swap which of lines 2 and 3 is commented out).
The net configured by default is missing its training data and index files.
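The solver edit can also be scripted. This sketch assumes the solver has one `net: "..."` line per candidate network, with all but one commented out with `#` (as with lines 2 and 3 described above). `use_net` is a hypothetical helper: it uncomments the line naming the wanted file, comments out the other `net:` lines, and drops anything after the quoted file name on those lines.

```python
import re

# Matches an active or commented-out `net: "<file>"` line, capturing indent and file name.
NET_LINE = re.compile(r'^(\s*)#?\s*net:\s*"([^"]+)"')


def use_net(solver_text, wanted):
    """Return solver_text with only the `net: "<wanted>"` line active."""
    out = []
    found = False
    for line in solver_text.splitlines():
        match = NET_LINE.match(line)
        if match is None:
            out.append(line)  # leave every other solver setting untouched
            continue
        indent, name = match.groups()
        if name == wanted:
            found = True
            out.append('%snet: "%s"' % (indent, name))
        else:
            out.append('%s# net: "%s"' % (indent, name))
    if not found:
        raise ValueError("no net: line for %r" % wanted)
    return "\n".join(out) + "\n"
```

For example, read numbers_solver.prototxt, pass its text through `use_net(text, "numbers_net.prototxt")`, and write the result back.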

This seems to work (it is training).

Tome Vang · Answer 4 · Wed Aug 02 2017 23:29:50 GMT+0800 (China Standard Time)

I also found another way around the "3_Princess_220.wav.png file not found" error. First I did what Sebastian did: I edited numbers_solver.prototxt, swapping which of lines 2 and 3 is commented out, so that numbers_net.prototxt is used.

I also edited train_index.txt and test_index.txt, replacing every tab with a single space. The first line of train_index.txt is then "/spoken_numbers/3_Princess_220.wav.png 3", the next one "/spoken_numbers/6_Allison_60.wav.png 6", and so on.
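The tab-to-space edit can be done with a few lines of Python instead of by hand. This is a sketch: `fix_index_file` is a hypothetical name, and it rewrites the file in place, replacing every tab with a single space exactly as described above, so keep a copy if you want the original.

```python
def fix_index_file(path):
    """Replace every tab in an index file with a single space, in place.

    Returns the number of lines that contained at least one tab.
    """
    with open(path, encoding="utf-8") as fh:
        lines = fh.readlines()
    changed = sum("\t" in line for line in lines)
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(line.replace("\t", " ") for line in lines)
    return changed
```

Calling it once for train_index.txt and once for test_index.txt covers both files.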

After that everything seems to be working.