Is it possible to modify this to a "continuous" speech recognition?

Question

edwardffs opened this issue 6 years ago

I've been trying for a while, but I'm not sure whether my solution is good in terms of performance.
BTW, is there a Spanish acoustic model?

Ognjen Todic · Answer 1 · Thu Mar 08 2018 23:49:59 GMT+0800 (China Standard Time)

Hi Eduardo: from your question it's not obvious to me what "this" is or what you mean by "continuous", but let me try to answer anyway.

The PoC ("this") uses our SDK for on-device speech recognition. The SDK has methods that let you build a language model/decoding graph on the device, but they are mainly intended for smaller tasks (small/medium vocabulary); that is what this PoC demonstrates. You could change the array of digits to an array of any other (meaningful) strings, i.e. words/phrases/sentences.
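As a rough illustration of swapping the digit array for arbitrary phrases, here is a minimal Python sketch. It does not call the SDK: `normalize_phrase` and `vocabulary` are hypothetical helpers, and the example phrases are made up. It only shows preparing a phrase list and checking how large the resulting vocabulary is before handing the phrases to whatever graph-building method your SDK version provides.

```python
import re

def normalize_phrase(phrase):
    """Lowercase a phrase and replace punctuation (except apostrophes) with spaces."""
    cleaned = re.sub(r"[^\w' ]+", " ", phrase.lower())
    return " ".join(cleaned.split())

def vocabulary(phrases):
    """Return the sorted list of distinct words across all phrases."""
    return sorted({word for p in phrases for word in normalize_phrase(p).split()})

# The PoC starts from an array of digits; any meaningful phrases can be
# prepared the same way (these phrases are illustrative assumptions).
digits = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
phrases = ["Turn on the lights", "Turn off the lights", "What time is it?"]

digit_vocab = vocabulary(digits)
phrase_vocab = vocabulary(phrases)
```

Keeping an eye on the vocabulary size is a cheap way to judge whether the task still counts as "small/medium" or whether the graph should be built ahead of time instead.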

For a larger vocabulary, the decoding graph (and the underlying language model) would need to be built ahead of time (on a development machine, server, etc.). This could support language models with tens of thousands of words, depending on the device's capabilities (CPU, memory).

All of this is for continuous speech, i.e. you can just talk and, as long as the words are in the language model, recognition should work.
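To make the "words are in the language model" condition concrete, here is a small Python sketch that lists the words of an utterance that fall outside a given vocabulary. The function name `out_of_vocabulary` and the sample vocabulary are assumptions for illustration; how a real recognizer behaves on such words depends on the SDK and is not modeled here.

```python
def out_of_vocabulary(utterance, vocab):
    """Return the words in `utterance` that are not in `vocab`, in spoken order."""
    known = set(vocab)
    return [w for w in utterance.lower().split() if w not in known]

# Hypothetical vocabulary derived from the phrases used to build the graph.
vocab = {"turn", "on", "off", "the", "lights"}

# "kitchen" is not covered, so it cannot be recognized as that word.
missing = out_of_vocabulary("turn on the kitchen lights", vocab)
```

An empty result means every word is covered by the vocabulary; anything returned would have to be added to the phrase list before the graph is built.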

We don't have a Spanish acoustic model yet, but it would be pretty straightforward to add one (we have all the necessary resources).

I will close this issue now; if you have any additional questions feel free to email me at ogi@keenresearch.com. If the questions are specific to this code, then feel free to open other issues.