Howl for different languages

Question

codeghees opened this issue 4 years ago

I am currently building a pipeline for a research project that requires keyword spotting (KWS), and I am unsure which one would be the better fit.

In our use case, we want to identify keywords in streams of audio data, not in a wake-word setting. Can I use Howl for that purpose?
The model will be served via an API, and since it is supervised learning, we want to be able to readily add new words over time as well.

Brandon Lee · Answer 1 · Sun Sep 13 2020 00:22:28 GMT+0800 (China Standard Time)

I think Howl should be sufficient for that.

Honk mainly targets keyword classification, while Howl supports keyword spotting over streams of audio with an extra inference (filtering) mechanism.
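
To make the distinction concrete, here is a minimal Python sketch of what a sequence-based filter over a stream of per-window predictions could look like. It is not Howl's actual inference code: the function name `spot_sequence`, the `max_gap` parameter, and the input format (time-ordered `(time_seconds, label_index)` pairs) are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def spot_sequence(
    predictions: Iterable[Tuple[float, int]],
    sequence: List[int],
    max_gap: float = 1.0,
) -> List[float]:
    """Return the times at which `sequence` was completed in the stream.

    predictions: time-ordered (time_seconds, label_index) pairs, one per
        classified audio window (hypothetical format, not Howl's).
    sequence: label indices that must be observed in this order.
    max_gap: maximum seconds allowed between two consecutive matches.
    """
    detections: List[float] = []
    position = 0        # index in `sequence` of the label we wait for next
    last_match = 0.0    # time of the most recent matched label

    for time, label in predictions:
        # Drop a partial match if the next label took too long to arrive.
        if position > 0 and time - last_match > max_gap:
            position = 0

        if label == sequence[position]:
            position += 1
            last_match = time
            if position == len(sequence):
                detections.append(time)
                position = 0
        elif position > 0 and label == sequence[0]:
            # The first label was seen again: restart the match from here.
            position = 1
            last_match = time
        # Any other label (e.g. background or unknown) is ignored.

    return detections
```

A plain classifier would stop at the per-window labels; the filter is what turns them into detections over a continuous stream.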

codeghees · Answer 2 · Mon Sep 21 2020 17:09:14 GMT+0800 (China Standard Time)

Thank you. What steps would I need to change for Urdu keywords (our local language)?

Brandon Lee · Answer 3 · Tue Sep 22 2020 06:34:27 GMT+0800 (China Standard Time)

Hmm, that's an interesting direction.

Adding a new dataset to the system can be done with a change similar to https://github.com/castorini/howl/pull/31/files

However, I don't think other languages are currently supported by Howl.
The main limitation is the missing frame-level transcription.

@daemon do you know how one could support another language?

codeghees · Answer 4 · Tue Sep 29 2020 17:34:27 GMT+0800 (China Standard Time)

@ljj7975 we can generate our own pronunciation dictionary using a method we developed in our lab. Would that help?

Brandon Lee · Answer 5 · Sun Oct 04 2020 00:51:46 GMT+0800 (China Standard Time)

I am not that familiar with how the MFA aligner actually works in such cases.
This would be something you will need to dig into.

As long as you can generate data in the right format with the corresponding frame-level transcription, I don't see why not.

codeghees · Answer 6 · Tue Nov 03 2020 19:19:31 GMT+0800 (China Standard Time)

Hi, I think I was able to figure out MFA for Urdu. How do I go about supporting it?
@ljj7975
Any help is appreciated.

https://github.com/castorini/howl#preparing-a-dataset

The instructions there cover only one word. How do I support multiple?

Brandon Lee · Answer 7 · Sun Jan 10 2021 04:17:10 GMT+0800 (China Standard Time)

As instructed in the README, you will first need to preprocess your raw datasets using create_raw_dataset.
You should generate one dataset for the positive audio and one for the negative audio.
Depending on how your raw dataset is structured, you might need to modify some files (just like the change in #31).

Then, using MFA with the Urdu dictionary, you can align the dataset to get the right datasets for Howl.

The instructions show just one keyword, but it works for many keywords. Just specify VOCAB='["fire"]' INFERENCE_SEQUENCE=[0] accordingly.
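
For several keywords, the two values could be generated with a small helper like the sketch below. The helper name `build_howl_env` is hypothetical, and it assumes that INFERENCE_SEQUENCE lists indices into VOCAB in the order the words should be detected. That reading is inferred from the single-keyword example above, so please check it against the README before relying on it. Whether non-ASCII (e.g. Urdu script) entries in VOCAB are handled correctly is also untested here.

```python
import json
from typing import Dict, List


def build_howl_env(keywords: List[str]) -> Dict[str, str]:
    """Build VOCAB and INFERENCE_SEQUENCE strings for a list of keywords.

    Assumption: INFERENCE_SEQUENCE holds the index of each VOCAB entry,
    in the order the keywords are expected to be spoken.
    """
    # Compact JSON without spaces, matching the style VOCAB='["fire"]'.
    vocab = json.dumps(keywords, ensure_ascii=False, separators=(",", ":"))
    sequence = json.dumps(list(range(len(keywords))), separators=(",", ":"))
    return {"VOCAB": vocab, "INFERENCE_SEQUENCE": sequence}


if __name__ == "__main__":
    env = build_howl_env(["hey", "fire", "fox"])
    # Print the values as a prefix for the command you want to run.
    print(f"VOCAB='{env['VOCAB']}' INFERENCE_SEQUENCE={env['INFERENCE_SEQUENCE']}")
```

With a single keyword, the helper reproduces the values quoted above.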