Is it possible to add noise or reverb into sample label to increase the recall?

Leeviber opened this issue a year ago

Hi,
The model performs really well when the voice is clean. However, when the background contains noise or room reverb, the recall rate is really low. Is it possible to add some background noise or reverb to the keyword audio samples to increase the detection rate in complex scenes? Will it affect the recognition success rate of the model? Is such data augmentation done during training?

Chidhambararajan · Answer 1 · Mon Aug 07 2023 17:50:40 GMT+0800 (China Standard Time)

We have added some augmentations during training, but reverb was not included.

In high-noise situations, the hotword detector may struggle because it is trained to look for all vocal patterns and match them against the samples the user provided.

One possible solution is to treat the base model as a foundation model and fine-tune it on around 5-10 user-provided samples for a specific word. However, this may cripple the model's ability to identify new words it was not fine-tuned on.

As you suggested, you can add some samples with noise, and with accent variations, for the pronunciation of each word you want to detect.

After this, you can increase the accuracy threshold.
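The suggestion to add noisy variants of a keyword sample can be sketched roughly as follows. This is only a minimal illustration using NumPy, not the augmentation used to train the model: the function names `add_noise` and `add_reverb`, the target SNR, and the synthetic impulse response (exponentially decaying random noise) are all assumptions made for the example. A recorded room impulse response and real background noise would normally be better.

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix `noise` into `clean` at the requested signal-to-noise ratio (dB)."""
    noise = np.resize(noise, clean.shape)  # repeat or trim noise to match length
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def add_reverb(clean, sample_rate, decay_s=0.3, seed=0):
    """Convolve with a synthetic impulse response (hypothetical stand-in for a room)."""
    rng = np.random.default_rng(seed)
    n = int(decay_s * sample_rate)
    t = np.arange(n) / sample_rate
    # Random noise whose amplitude falls by about 60 dB over `decay_s` seconds.
    ir = rng.standard_normal(n) * np.exp(-6.9 * t / decay_s)
    ir[0] = 1.0  # direct, unreflected path
    wet = np.convolve(clean, ir)[: len(clean)]
    # Rescale so the peak level matches the original sample.
    return wet * (np.max(np.abs(clean)) / (np.max(np.abs(wet)) + 1e-12))

# Example with a synthetic tone standing in for a recorded keyword sample.
sample_rate = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
background = np.random.default_rng(1).standard_normal(sample_rate // 2)
noisy_sample = add_noise(clean, background, snr_db=10.0)
reverb_sample = add_reverb(clean, sample_rate)
```

The augmented arrays could then be written out as extra reference samples for the same word, alongside the clean recordings.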

Chidhambararajan · Answer 2 · Mon Aug 07 2023 17:51:36 GMT+0800 (China Standard Time)

I am currently looking at a CLIP-like architecture to further boost the performance of the system.

Damian · Answer 3 · Thu Aug 10 2023 03:43:36 GMT+0800 (China Standard Time)

I use a cardioid directional mic and get good performance in general with a Shure SV 1000 or the legendary SM58. These are heavy and will give you strain within 4 hours if held, but on a stand, if you are doing hands-free work such as commanding your computer, they are the best. I have a small room with fans running, etc. I use a preamp; it's about $200 total. There might be a cheaper karaoke mic, but look for a dynamic mic, not a condenser (though some might work OK), and make sure its pattern is very directional. Even mic arrays on laptops don't work well for this; a singer's mic is the best, IMO.

Chidhambararajan · Answer 4 · Fri Aug 11 2023 23:43:15 GMT+0800 (China Standard Time)

As @damian-666 pointed out, voice assistants employ directional mics to combat the same problem. The idea is that the noise heard by all the mics is roughly uniform, while the volume of the voice heard by each mic is not; they use some simple math on this difference to reduce the noise considerably. This is difficult to do with a single mic. It would help if you could share a video recording of the issue.
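To make the "simple math" idea concrete, here is a deliberately idealized toy sketch. It assumes two mics that hear exactly the same noise but the voice at different volumes, so subtracting one signal from the other cancels the noise and leaves a scaled copy of the voice. The gains `1.0` and `0.3` and the function name `cancel_common_noise` are made up for illustration; real devices have to deal with delays and noise that differs between mics, so this is not how any particular product works.

```python
import numpy as np

def cancel_common_noise(mic_near, mic_far):
    """Subtract two mic signals; noise common to both cancels out."""
    return mic_near - mic_far

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
voice = np.sin(2 * np.pi * 200 * t)  # stand-in for speech
noise = np.random.default_rng(0).standard_normal(sample_rate)

# Assumed model: identical noise at both mics, voice louder at the near mic.
mic_near = 1.0 * voice + noise
mic_far = 0.3 * voice + noise

# What remains is the voice scaled by the gain difference (1.0 - 0.3).
recovered = cancel_common_noise(mic_near, mic_far)
```

With a single mic there is no second signal to subtract, which is why this kind of cancellation is hard to reproduce in that setup.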