Works fine with the reference file but not with a custom recorded file

Question

warichet opened this issue 2 years ago

Hello,
I am working on Google Colab, so I don't have access to a microphone.
The workaround is to use an mp3 or wav file.
To do that, I added this class:

from streams import CustomAudioStream
from pydub import AudioSegment

import numpy as np
import wave

RATE = 16000  # sample rate in Hz used for the converted audio


class SimpleFileStream(CustomAudioStream):
    """
    Implements a stream with a sliding window that reads from a file,
    implemented by inheriting CustomAudioStream.
    """

    def __init__(self, sliding_window_secs: float = 1/8):
        # Number of samples returned per call to get_next_frame
        self.CHUNK = int(sliding_window_secs * RATE)
        # Running count of samples requested so far (was a module-level global)
        self.frame_index = 0

        CustomAudioStream.__init__(
            self,
            open_stream=self.open_stream,
            close_stream=self.close_stream,
            get_next_frame=self.get_next_frame,
        )

    def open_stream(self, src, mp3):
        if mp3:
            dst = "Data/sample.wav"
            # Convert the mp3 to a wav file resampled to RATE
            sound = AudioSegment.from_mp3(src).set_frame_rate(RATE)
            sound.export(dst, format="wav")
            self.wf = wave.open(dst, 'rb')
        else:
            # A wav file is opened as-is; it is NOT resampled to RATE here
            print("Not an mp3")
            self.wf = wave.open(src, 'rb')
        print("Get params of wav file " + str(self.wf.getparams()))

    def close_stream(self):
        self.wf.close()

    def get_next_frame(self):
        print("Index ", self.frame_index)
        self.frame_index += self.CHUNK
        # Interpret the raw bytes as 16-bit signed samples
        return np.frombuffer(self.wf.readframes(self.CHUNK), dtype=np.int16)

It seems to work if I use the reference file from GitHub.
But if I record a custom file using Audacity, the wake word is not detected.
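
One possible cause, not confirmed anywhere in this thread: the wav branch of open_stream opens the file as-is, while the mp3 branch resamples to 16000 Hz, so a recording exported at another sample rate or in stereo would be read with the wrong layout. Below is a minimal sketch of a format check using only the standard wave module. The helper name describe_wav_mismatch is hypothetical, and the expected format (16 kHz, mono, 16-bit) is an assumption inferred from RATE = 16000 and dtype=np.int16 in the class above.

```python
import wave

TARGET_RATE = 16000  # assumed to match RATE in the class above


def describe_wav_mismatch(path, rate=TARGET_RATE):
    """Return a list of reasons why the WAV at `path` is not 16-bit mono at `rate`.

    Hypothetical helper for illustration; an empty list means the file
    matches the assumed format.
    """
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != rate:
            problems.append(
                f"sample rate is {wf.getframerate()} Hz, expected {rate} Hz"
            )
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected 1 (mono)")
        if wf.getsampwidth() != 2:
            problems.append(
                f"{8 * wf.getsampwidth()}-bit samples, expected 16-bit"
            )
    return problems
```

If the list is not empty, the file could be converted before opening it, for example with pydub's set_frame_rate, set_channels and set_sample_width, or by changing the project rate and exporting as mono 16-bit PCM in Audacity.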

If I change the threshold to 0.7 and the activation count to 2, it works better, but it will increase the chance of false positives.
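
The trade-off can be illustrated with a small sketch. It assumes that a detection fires once a given number of consecutive frames score at or above the threshold. This is a hypothetical model for illustration only, not the library's actual detection logic, and the scores are made-up values.

```python
def count_activations(scores, threshold, activation_count):
    """Count detections under an assumed rule: a detection fires when
    `activation_count` consecutive scores reach `threshold`."""
    detections = 0
    streak = 0
    for score in scores:
        if score >= threshold:
            streak += 1
            # Fire exactly once per streak, when it first reaches the required length
            if streak == activation_count:
                detections += 1
        else:
            streak = 0
    return detections


# Made-up similarity scores: a weak near-match followed by a strong match
scores = [0.72, 0.75, 0.40, 0.96, 0.97, 0.98, 0.30]

strict = count_activations(scores, threshold=0.95, activation_count=3)
loose = count_activations(scores, threshold=0.70, activation_count=2)
print(strict, loose)  # prints "1 2"
```

With the looser settings the weak near-match at the start also counts as a detection, which is the false-positive risk described above.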

Is it mandatory to have a custom reference file for each user?

Best regards
Sebastien

Chidhambararajan · Answer 1 · Sat Feb 26 2022 04:38:08 GMT+0800 (China Standard Time)

It is not required to have a custom reference file for each user.

When generating the reference file, instead of simply increasing the sample count, one could increase the variety of the sample audio and then set the threshold to around 95% or more.

This will reduce the chances of false positives to a great extent.

Chidhambararajan · Answer 2 · Sat Apr 16 2022 01:57:06 GMT+0800 (China Standard Time)

Closing the issue due to inactivity; feel free to reopen if required.