k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter

Home Page: https://k2-fsa.github.io/sherpa/onnx/index.html

Getting end timestamp in result(s)

rohithkodali opened this issue

Is it possible to get both the start and end timestamps for every token? Currently we get only the start time of each token.

A scenario we commonly face: many of our speech samples contain long silences (from over 0.5 seconds up to 3 seconds). Because we have only start times, a word followed by such a silence appears to have been spoken for up to 3 seconds, which is wrong when we run duration analysis.
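
To make the duration problem concrete, here is a minimal C++ sketch. It does not use the sherpa-onnx API; the function names, the `audio_end` parameter, and the sample values are hypothetical and exist only for illustration. It compares durations estimated from start times alone (each token is assumed to last until the next one starts) with durations computed from start and end times.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: estimate durations when only start times are known.
// Each token is assumed to last until the next token starts, so any silence
// that follows a token is counted as part of that token. The last token has
// no successor, so the end of the audio (audio_end, in seconds) is used.
std::vector<float> DurationsFromStartsOnly(const std::vector<float> &starts,
                                           float audio_end) {
  std::vector<float> durations(starts.size());
  for (std::size_t i = 0; i < starts.size(); ++i) {
    float next = (i + 1 < starts.size()) ? starts[i + 1] : audio_end;
    durations[i] = next - starts[i];
  }
  return durations;
}

// Hypothetical helper: exact durations when both start and end times exist.
std::vector<float> DurationsFromStartsAndEnds(const std::vector<float> &starts,
                                              const std::vector<float> &ends) {
  assert(starts.size() == ends.size());
  std::vector<float> durations(starts.size());
  for (std::size_t i = 0; i < starts.size(); ++i) {
    durations[i] = ends[i] - starts[i];
  }
  return durations;
}
```

With start times {0.0, 0.5, 3.5} and an audio length of 4.0 seconds, the first function assigns the second token a duration of about 3 seconds, even though, given end times {0.4, 0.9, 3.9}, it was spoken for only about 0.4 seconds.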

Another question, is it possible to get logits in the result(s) along with timestamps so that we can apply some other algorithms on them?

Please see
#989

Currently, we can only get the stop timestamp of each token for CTC models.

Another question, is it possible to get logits in the result(s) along with timestamps so that we can apply some other algorithms on them?

Please see

std::vector<float> ys_probs; //< log-prob scores from ASR model
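
As a rough sketch of what could be done with such scores once they are available: the code below assumes that `ys_probs` holds one natural-log probability per token, which the comment above suggests but does not confirm. The helper names and the threshold are hypothetical and are not part of sherpa-onnx.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert per-token log-probabilities (assumed to be
// natural logs) into probabilities in [0, 1].
std::vector<float> ToProbabilities(const std::vector<float> &log_probs) {
  std::vector<float> probs;
  probs.reserve(log_probs.size());
  for (float lp : log_probs) {
    probs.push_back(std::exp(lp));
  }
  return probs;
}

// Hypothetical helper: return the indices of tokens whose probability is
// below the given threshold, e.g. to flag them for review or rescoring.
std::vector<std::size_t> LowConfidenceTokens(
    const std::vector<float> &log_probs, float threshold) {
  assert(threshold >= 0.0f && threshold <= 1.0f);
  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < log_probs.size(); ++i) {
    if (std::exp(log_probs[i]) < threshold) {
      indices.push_back(i);
    }
  }
  return indices;
}
```

Since each score would pair with one token, the same index could be used to look up the token's text and timestamp when applying further algorithms.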