dscripka / openWakeWord

An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.

model.reset() doesn't appear to be working properly

greerviau opened this issue · comments

I'm having an issue where, after I detect a wake word from a microphone stream, my program ends the stream, executes some other logic, then comes back to listen again. This time, however, it immediately detects another wake word even though none was said, and the cycle repeats. It repeats an unpredictable number of times; sometimes it doesn't happen at all, but it usually does. I call model.reset() before the stream is created, assuming that it clears any stored audio data, but that doesn't fix the problem. I've written the following test program to illustrate what I'm describing:

import sounddevice as sd
import numpy as np
from openwakeword.model import Model
import time

# Get microphone stream
CHANNELS = 1
RATE = 16000
CHUNK = 1280

owwModel = Model(inference_framework="tflite")

n_models = len(owwModel.models.keys())

# Run capture loop continuously, checking for wakewords
if __name__ == "__main__":
    # Generate output string header
    print("\n\n")
    print("#"*100)
    print("Listening for wakewords...")
    print("#"*100)
    
    while True:    
        owwModel.reset()
        with sd.InputStream(samplerate=RATE, 
                            channels=CHANNELS, 
                            blocksize=CHUNK,
                            dtype="int16") as stream:
            while True:
                # Get audio
                chunk, _ = stream.read(CHUNK)
                audio = np.frombuffer(chunk, dtype=np.int16)

                # Feed to openWakeWord model
                prediction = owwModel.predict(audio)

                if prediction["alexa"] > 0.8:
                    print("Wake Word!")
                    break
        print("Go somewhere else")
        time.sleep(1)

Running this, I say a wake word such as "Alexa" once and get the following output, even though I never repeat the word:

####################################################################################################
Listening for wakewords...
####################################################################################################
Wake Word!
Go somewhere else
Wake Word!
Go somewhere else
Wake Word!
Go somewhere else

I thought this might be an issue with sounddevice, but I tried the same thing with the pyaudio implementation described in the detect_from_microphone.py example and had the same issue. What did fix it was reinitializing owwModel before each stream, but I don't want to have to do that, and I don't think you should have to. I'm guessing that there's some data influencing the predictions that isn't being cleared by reset. Any help would be really appreciated.
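To make the guess about leftover data concrete, here is a toy sketch of how state carried between predict calls could keep a score high after the audio has gone quiet. `ToyDetector` and its rolling window are hypothetical stand-ins invented for illustration; they are not openWakeWord's internals, and the real cause may differ.

```python
from collections import deque


class ToyDetector:
    """Hypothetical streaming detector; NOT openWakeWord's implementation."""

    def __init__(self, window=3):
        # Rolling state that survives across predict() calls.
        self.scores = deque(maxlen=window)

    def predict(self, frame_score):
        # The output depends on buffered frames, not only the newest one.
        self.scores.append(frame_score)
        return max(self.scores)

    def reset(self):
        # A complete reset has to clear every piece of carried-over state.
        self.scores.clear()


detector = ToyDetector()
detector.predict(0.9)                   # frame containing the wake word
without_reset = detector.predict(0.0)   # silence, but the old frame is still buffered
detector.reset()
with_reset = detector.predict(0.0)      # silence after the buffer was cleared
print(without_reset, with_reset)
```

In this toy model, a silent frame still scores above a 0.8 threshold until the buffer is cleared, which resembles the repeated "Wake Word!" lines in the output above.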

Thanks for pointing this out. I just merged a related PR and released a new version, so model.reset() should function as expected now.

Though in general, I would recommend continuing to process audio after an activation (perhaps in combination with the new debounce_time argument in model.predict to control multiple activations) so that the buffer can clear naturally without having to be reset manually.
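As a rough illustration of that idea, the sketch below keeps consuming scores after an activation and suppresses repeat triggers inside a time window. The `Debouncer` class, its parameters, and the score list are hypothetical and hand-rolled for this example; the sketch does not call openWakeWord and is not how the library's debounce_time argument is implemented.

```python
class Debouncer:
    """Hypothetical helper: report at most one activation per debounce window."""

    def __init__(self, threshold=0.8, debounce_time=1.0):
        self.threshold = threshold
        self.debounce_time = debounce_time  # seconds
        self.last_activation = None

    def update(self, score, now):
        # Scores below the threshold never activate.
        if score < self.threshold:
            return False
        # Suppress activations that fall too close to the previous one.
        if (self.last_activation is not None
                and now - self.last_activation < self.debounce_time):
            return False
        self.last_activation = now
        return True


# 1280 samples at 16 kHz, matching CHUNK and RATE in the test program.
FRAME_SECONDS = 1280 / 16000

# Made-up per-frame scores: one utterance spanning several frames,
# a stretch of silence, then a second utterance.
scores = [0.1, 0.9, 0.95, 0.9, 0.2, 0.1] + [0.0] * 20 + [0.9]

debouncer = Debouncer()
activations = [
    i for i, score in enumerate(scores)
    if debouncer.update(score, now=i * FRAME_SECONDS)
]
print(activations)
```

Because every frame is still processed, nothing has to be reset by hand; the consecutive high-scoring frames of one utterance collapse into a single activation, and the later utterance triggers again once the window has elapsed.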