syl22-00 / pocketsphinx.js

Speech recognition in JavaScript and WebAssembly

Very low accuracy

dan960 opened this issue

I have tried to run recognition on a WAV file (16 kHz, mono), but accuracy is very low (~65%) compared to the pocketsphinx_continuous tool (~95%) with the same models, dictionary and pocketsphinx configuration options. The whole buffer (int16 vector) is fed to ps_process_raw in chunks of 2048 samples, basically mirroring what the pocketsphinx_continuous tool does.
Before passing the audio to the recognizer, I strip the WAV headers (using decodeAudioData) and then resample the result back to 16000 Hz, because AudioContext automatically resamples decoded audio to the context's sample rate.
The models and dictionary are raw and lazy-loaded.
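For reference, the preprocessing described above (decoded float samples, such as the Float32Array from `AudioBuffer.getChannelData(0)`, resampled to 16 kHz, converted to 16-bit integers, then fed in 2048-sample chunks) can be sketched as below. This is a minimal sketch, not pocketsphinx.js code: `resampleLinear`, `floatToInt16` and `chunks` are hypothetical helpers, and the resampler is plain linear interpolation with no anti-aliasing low-pass filter. Because nothing is filtered before downsampling, content above 8 kHz can alias into the 0-8 kHz band, so a resampler like this can make the browser-side audio differ from what the native tool reads.

```javascript
// Hypothetical helpers illustrating the pipeline described above;
// none of these names come from pocketsphinx.js.

// Naive linear-interpolation resampler (no anti-aliasing filter).
function resampleLinear(samples, fromRate, toRate) {
  if (fromRate === toRate) return Float32Array.from(samples);
  const ratio = fromRate / toRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;                          // fractional input position
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = pos - i0;
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac;
  }
  return out;
}

// Convert Web Audio floats in [-1, 1] to signed 16-bit PCM, clamping out-of-range values.
function floatToInt16(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return out;
}

// Split a buffer into consecutive chunks of at most `size` samples (views, no copy).
function* chunks(buffer, size) {
  for (let start = 0; start < buffer.length; start += size) {
    yield buffer.subarray(start, start + size);
  }
}

// Example: 1 s of a 440 Hz tone at 48 kHz, taken down to 16 kHz and fed 2048 samples at a time.
const input = new Float32Array(48000).map((_, i) => 0.5 * Math.sin(2 * Math.PI * 440 * i / 48000));
const pcm = floatToInt16(resampleLinear(input, 48000, 16000));
let fed = 0;
for (const chunk of chunks(pcm, 2048)) {
  fed += chunk.length; // a real application would hand `chunk` to the decoder here
}
console.log(pcm.length, fed); // 16000 16000
```

If the browser-side result differs noticeably from the native input, decoding directly at 16 kHz or using a filtered resampler would be worth trying before blaming the decoder.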

My theory is that the low accuracy could be caused by the browser's performance. Has anybody run into something similar?

On a separate note, I have also tried the WebAssembly version, reasoning that if the problem were insufficient resources, the faster build should improve accuracy. The compilation runs fine, but during processing it throws a runtime error:

Uncaught RuntimeError: integer result unrepresentable
    at _eval_topn (wasm-function[652]:477)
    at _ptm_mgau_codebook_eval (wasm-function[648]:66)
    at _ptm_mgau_frame_eval (wasm-function[646]:125)
    at _acmod_score (wasm-function[371]:234)
    at _phone_loop_search_step (wasm-function[578]:116)
    at _ps_search_forward (wasm-function[611]:109)
    at _ps_process_raw (wasm-function[609]:152)

Browser performance should not affect recognition accuracy: the computations are exactly the same as when compiled natively, although the decoder will certainly run slower in the browser.

If I were you, I would first compare the initialization parameters of both versions (for pocketsphinx.js, they are displayed in the JavaScript console) and make sure they are all the same.
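One low-effort way to do that comparison is to copy the configuration dump printed by each version into text and diff the values. The sketch below assumes each parameter line starts with the flag name (e.g. `-samprate`) followed by whitespace-separated default and current values; `parseConfig` and `diffConfigs` are hypothetical helpers, and taking the last token as the current value is a simplification that misreads parameters whose value is empty.

```javascript
// Hypothetical helpers: parse a pocketsphinx configuration dump and diff two of them.
// Assumes parameter lines look like "-name  default  value" (whitespace-separated);
// the last token is taken as the current value, which misreads empty values.
function parseConfig(text) {
  const params = new Map();
  for (const line of text.split(/\r?\n/)) {
    const tokens = line.trim().split(/\s+/);
    if (!/^-[a-z_]/i.test(tokens[0])) continue; // skip non-parameter lines
    const rest = tokens.slice(1);
    params.set(tokens[0], rest.length > 0 ? rest[rest.length - 1] : "");
  }
  return params;
}

// Return every parameter whose value differs or that appears in only one dump.
function diffConfigs(native, browser) {
  const diffs = [];
  const names = new Set([...native.keys(), ...browser.keys()]);
  for (const name of [...names].sort()) {
    const a = native.has(name) ? native.get(name) : "(missing)";
    const b = browser.has(name) ? browser.get(name) : "(missing)";
    if (a !== b) diffs.push({ name, native: a, browser: b });
  }
  return diffs;
}

// Example with made-up excerpts, not real logs.
const nativeDump = "-lw 6.5 6.5\n-nfft 512 512\n-samprate 1.600000e+04 1.600000e+04";
const browserDump = "-nfft 512 512\n-samprate 1.600000e+04 4.410000e+04";
console.log(diffConfigs(parseConfig(nativeDump), parseConfig(browserDump)));
```

Reporting parameters missing from one side matters as much as differing values, since a flag that is silently left at its default is an easy way for two setups to diverge.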

You should also find a way to make sure the audio data passed to the decoder is the same in both cases. I am not sure I understand what you describe about AudioContext resampling your file, but that could be something that affects the recognition rate.
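To check that the decoder receives the same audio in both cases, one option is to dump the int16 samples each side actually passes to ps_process_raw and compare them. The hypothetical `compareSamples` helper below reports whether the lengths match, the first index where the samples differ and the largest absolute sample difference; how the two buffers are captured is left as an assumption.

```javascript
// Hypothetical helper: compare two 16-bit PCM buffers sample by sample.
function compareSamples(expected, actual) {
  const n = Math.min(expected.length, actual.length);
  let firstDiff = -1;  // index of the first mismatching sample, -1 if none in the overlap
  let maxAbsDiff = 0;  // largest absolute difference over the overlapping samples
  for (let i = 0; i < n; i++) {
    const d = Math.abs(expected[i] - actual[i]);
    if (d !== 0 && firstDiff === -1) firstDiff = i;
    if (d > maxAbsDiff) maxAbsDiff = d;
  }
  return {
    sameLength: expected.length === actual.length,
    firstDiff,
    maxAbsDiff,
    identical: expected.length === actual.length && firstDiff === -1,
  };
}

// Example with made-up samples: one value differs and the second buffer has an extra sample.
const reference = Int16Array.from([0, 100, -200, 300]);
const received = Int16Array.from([0, 100, -190, 300, 5]);
console.log(compareSamples(reference, received));
```

A small maximum difference spread over the whole file points at resampling or float conversion, while a length mismatch or an early first difference points at header stripping or chunking.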

Same problem... help!!