Ant-Brain / EfficientWord-Net

OneShot Learning-based hotword detection.

Home Page: https://ant-brain.github.io/EfficientWord-Net/


hotword detection for a new language

nfaraji2002 opened this issue · comments

Hi

Is it possible to train this model for a language with a different alphabet than English, such as Persian?

Thanks,

Yes, you can do that. Please go through the training file.

Thanks.
I executed training.ipynb, but I ran into an error:

No file or directory found at /content/drive/MyDrive/Siamese/modelCheckpoints_old/model-8-01-0.96.h5

I think I need some pre-trained model checkpoints, but I could not find them in your GitHub repository. Could you upload them so that everyone can access them?
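In case it helps others hitting the same error, here is a minimal sketch of one possible workaround: skip the checkpoint when it is missing and fall back to training from scratch. This assumes the notebook loads the checkpoint with Keras; the path is the one from the error message above.

```python
import os
import tensorflow as tf

CHECKPOINT = "/content/drive/MyDrive/Siamese/modelCheckpoints_old/model-8-01-0.96.h5"

model = None
if os.path.exists(CHECKPOINT):
    # Resume from the old checkpoint if it happens to be on the mounted Drive.
    model = tf.keras.models.load_model(CHECKPOINT)
else:
    # No pre-trained checkpoint available: continue with the model-building
    # cells in training.ipynb and train the siamese network from scratch.
    print("Checkpoint not found, building the siamese model from scratch instead.")
```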

My other question is:
I found that there are lots of English single-word audio files in the "dataset_format_fixed" directory. Do I need a new single-word audio dataset to train for a new language, or can I use the model trained on your English dataset and customize it for my hotwords, which use a completely different alphabet and letters, such as Arabic:
آ ب ث د ر ز م س ش ح ض
Thanks in advance

For your first question: training again with Arabic words will give better performance than going with the pre-trained English model, since the audio window frame will be different (guessing this because Arabic words tend to be longer than 1 second).

Do I require a new single-word audio dataset to train for a new language? Yes, if you want high accuracy. Our model gives the best accuracy on words shorter than 1.5 seconds.
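To check whether your Arabic recordings fit that window, something like the sketch below should work. It is only illustrative, with assumed values (16 kHz mono audio, a 1.5 s window), not the exact preprocessing used in this repo.

```python
import numpy as np
import librosa

SAMPLE_RATE = 16000                              # assumed sample rate
WINDOW_SECS = 1.5                                # the ~1.5 s window mentioned above
WINDOW_SAMPLES = int(SAMPLE_RATE * WINDOW_SECS)

def load_fixed_window(path: str) -> np.ndarray:
    """Load a single-word recording and pad/trim it to the fixed window length."""
    audio, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True)
    # Trim leading/trailing silence so longer Arabic words still fit the window.
    audio, _ = librosa.effects.trim(audio, top_db=25)
    if len(audio) >= WINDOW_SAMPLES:
        return audio[:WINDOW_SAMPLES]
    # Zero-pad shorter words to the full window.
    return np.pad(audio, (0, WINDOW_SAMPLES - len(audio)))
```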

As @aman-17 pointed out, it can be better to train the model from scratch, as there are very few to no similarities in pronunciation between Arabic and English.

Secondly, a more polished version of the code with PyTorch and a ResNet is currently in the works. We will share it soon, so stay tuned!

The new model is out; can you test it with Arabic and let us know? The newer model has only been trained on English words, but its performance is much better than the old one.
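For anyone who wants to try it with an Arabic word in the meantime, the rough flow with the new resnet-based release looks like the sketch below. The parameter names follow the current README as far as I recall and may differ slightly between versions; "marhaba_ref.json" is a hypothetical reference file you would first generate from a few recordings of your own hotword (see the README for the reference-generation step).

```python
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net.audio_processing import Resnet50_Arc_loss

base_model = Resnet50_Arc_loss()

# "marhaba_ref.json" is a hypothetical reference file generated beforehand
# from a few of your own recordings of the Arabic hotword.
arabic_hw = HotwordDetector(
    hotword="marhaba",
    model=base_model,
    reference_file="marhaba_ref.json",
    threshold=0.7,
    relaxation_time=2,
)

mic_stream = SimpleMicStream(window_length_secs=1.5, sliding_window_secs=0.75)
mic_stream.start_stream()

while True:
    frame = mic_stream.getFrame()
    result = arabic_hw.scoreFrame(frame)
    if result is None:
        continue  # no voice activity in this frame
    if result["match"]:
        print("Hotword detected, confidence:", result["confidence"])
```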

We will soon share the training code for the newer model as well.