dscripka / openWakeWord

An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.


Predictions just add up?

clapann opened this issue

Hello! I'm new to openWakeWord, and I wrote my own script based on the streaming_server.py snippet in the examples folder. I created an API endpoint with Flask, following openWakeWord's usage guidelines, but I don't get the desired outputs. My model works perfectly with all of the snippets in the examples folder, so I'm wondering whether this is an issue on my end or an openWakeWord issue. Also, just to note, I am new to Python.

Steps:

  1. I say "Hey, Grape" and get no predictions >= 0.5.
  2. I say "Hey, Grape" and get no predictions >= 0.5.
  3. I say "Hey, Grape" and get no predictions >= 0.5.
  4. Usually on the 4th or 5th attempt I do get a prediction, and it's very short. As you can see in the code below, I log the predictions, and after the wake word is finally detected it only logs one prediction.
  5. Everything I say after getting a prediction >= 0.5 returns the exact same score as that first >= 0.5 prediction.

What can I do?

from flask import Flask, request, jsonify
import numpy as np
from openwakeword import Model
import resampy
import wave
import io

app = Flask(__name__)

# Load the custom wake word model with the ONNX runtime
model_path = "C:/Users/clap/Desktop/wake/hey_grape.onnx"
owwModel = Model(wakeword_models=[model_path], inference_framework='onnx')

@app.route('/detect', methods=['POST'])
def detect_wakeword():
    try:
        audio_data = request.get_data()

        with io.BytesIO(audio_data) as audio_stream:
            with wave.open(audio_stream, 'rb') as wf:
                sample_rate = wf.getframerate()
                chunk_size = 1280
                detected = False

                while True:
                    chunk = wf.readframes(chunk_size)
                    if not chunk:
                        break

                    # Pad to a whole number of 16-bit samples
                    if len(chunk) % 2 == 1:
                        chunk += b'\x00'

                    data = np.frombuffer(chunk, dtype=np.int16)
                    # Resample to the 16 kHz rate openWakeWord expects
                    if sample_rate != 16000:
                        data = resampy.resample(data, sample_rate, 16000)
                    predictions = owwModel.predict(data)
                    for key in predictions:
                        print(predictions[key])
                        if predictions[key] >= 0.5:
                            detected = True
                            break

                    # Stop processing as soon as a positive prediction is seen
                    if detected:
                        break

        return jsonify({"detected": detected})
    except Exception as e:
        print(f"Error: {e}")
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)

Just to make sure I'm understanding your use-case here:

  1. You have the Flask server above which receives data via POST requests
  2. You have a client somewhere which sends audio to the Flask server (and this audio can be an arbitrary length)
  3. The Flask server processes the entire audio chunk by chunk, and then breaks after it detects an activation

That approach should work overall, but I'm not quite sure what you mean by "Everything I say after getting a prediction >= 0.5 returns the exact same score as that first >= 0.5 prediction". Are you saying that after one positive activation, the next time the client POSTs audio to the server it immediately returns a positive prediction?

Besides that, I did notice another issue. By default, openWakeWord takes as input a chunk of 1280 samples at 16 kHz. Your code currently reads a chunk of 1280 frames at an unknown sample rate and then resamples it to 16 kHz, which changes the total number of samples in the chunk. openWakeWord can handle different-sized chunks, but that may lead to somewhat odd performance in some cases, so while troubleshooting I would stick with 1280 samples.
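
To make the mismatch concrete, here is a small illustration (the 44.1 kHz source rate is assumed purely for the example):

# Illustration: resampling after chunking changes the chunk's sample count
import numpy as np
import resampy

source_rate = 44100                        # assumed client sample rate
chunk = np.zeros(1280, dtype=np.float32)   # 1280 frames read at 44.1 kHz

resampled = resampy.resample(chunk, source_rate, 16000)
print(len(resampled))  # ~465 samples, not the 1280 openWakeWord expects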

Yes, you're correct about my use case and my issue. When I get an activation, I really do get the same prediction: say the scores go 0.00, 0.001, 0.002, and so on until the first positive prediction, for example 0.56883993. The next POST request then instantly returns that exact same score of 0.56883993, no matter what I say, and this continues for the next few requests. I also failed to mention that the first 2-3 requests don't recognize the wake word, for some odd reason.

Regarding the chunk size issue, what chunk size do you recommend I use? Thank you for your swift response.

Ah, I think I see the issue.

openWakeWord maintains an internal buffer of audio data as you pass it to the predict function. The models are also trained to activate when the wake word is at various positions within a ~2 second window, which means in practice you often get multiple predictions > 0.5 in a row when the audio contains the wake word. This normally isn't an issue, as within ~0.5 seconds enough new audio has been processed that the model no longer sees the wake word and predictions return to the baseline (e.g., ~0).

However, with your implementation you break out of the loop as soon as there is a positive prediction. Then, when you POST another audio file and submit another chunk of frames to openWakeWord, most of the audio from the last activation is still in the buffer, so it will predict > 0.5 again. This happens a few more times until the buffer clears, just as you observed.

To address this in your code, the simplest approach might be to not break immediately on an activation, and instead let the rest of the WAV file process before you return the result to the client.

(As for chunk size, 1280 samples is the recommended default.)
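
Another option is to explicitly clear the model's state between independent requests. The sketch below assumes a recent openWakeWord version that exposes a Model.reset() method for clearing the internal buffers; if your installed version doesn't have it, re-creating the Model object between requests has the same effect. The helper name, model path, and threshold are illustrative only.

# Sketch: clear openWakeWord's internal buffers between independent
# requests so one file's audio cannot influence the next request's scores
from openwakeword import Model

owwModel = Model(wakeword_models=["hey_grape.onnx"], inference_framework='onnx')

def detect_in_clip(samples_16khz):
    # Assumes Model.reset() exists (recent openWakeWord versions);
    # otherwise, re-instantiate Model here instead
    owwModel.reset()

    max_score = 0.0
    # Feed fixed 1280-sample chunks (the recommended default at 16 kHz)
    for i in range(0, len(samples_16khz), 1280):
        predictions = owwModel.predict(samples_16khz[i:i + 1280])
        max_score = max(max_score, max(predictions.values()))
    return max_score >= 0.5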

Hm, even with that fix in place I still get issues. I even tried one of the pre-made models, but I'm still having the same problem. Also, using my model in the pre-made web app works completely fine.

  1. It takes 4 received messages before I get a prediction >= 0.5.
  2. When I finally do get a prediction >= 0.5, it's at the very beginning of that request's predictions, which I'm guessing carries over from the previous request, so this goes back to the original issue.

Testing:

0.0
0.0
0.0
0.0
0.0
0.0007340908
0.0007340908
0.0007340908
0.0007829368
0.0007829368
0.0007829368
0.00077676773
0.00077676773
0.0008150041
0.0008150041
0.0008150041
0.0007728338
0.0007728338
0.0007728338
0.00077164173
0.00077164173
0.00077164173
0.0007877648
0.0007877648
0.0007619858
0.0007619858
0.0007619858
0.00080129504
0.00080129504
0.00080129504
0.0007739961
0.0007739961
0.0007739961
0.000772655
0.000772655
0.00078463554
0.00078463554
0.00078463554
0.0007878244
0.0007878244
0.0007878244
0.00078877807
0.00078877807
0.00078877807
0.0007748306
0.0007748306
0.00077560544
0.00077560544
0.00077560544
0.00078850985
0.00078850985
0.00078850985
0.0007597506
0.0007597506
0.0007597506
0.00074478984
0.00074478984
0.0007365644
0.0007365644
0.0007365644
0.0007303357
0.0007303357
0.0007303357
0.0007221997
0.0007221997
0.0007221997
0.000739038
0.000739038
0.0007838011
0.0007838011
127.0.0.1 - - [07/Feb/2024 12:40:04] "POST /detect HTTP/1.1" 200 -
0.0007838011
0.0007838011
0.003303349
0.003303349
0.45600992
0.45600992
0.45600992
0.19614744
0.19614744
0.19614744
0.27887672
0.27887672
0.27887672
0.2797197
0.2797197
0.0009841323
0.0009841323
0.0009841323
0.00078991055
0.00078991055
0.00078991055
0.00077563524
0.00077563524
0.00077563524
0.0007599592
0.0007599592
0.00074383616
0.00074383616
0.00074383616
0.00073599815
0.00073599815
0.00073599815
0.0007354319
0.0007354319
0.0007354319
0.0007224977
0.0007224977
0.0007124245
0.0007124245
0.0007124245
0.0007314086
0.0007314086
0.0007314086
0.0007634759
0.0007634759
0.0007634759
0.00078341365
0.00078341365
0.00078341365
0.00078198314
0.00078198314
0.00077587366
0.00077587366
0.00077587366
0.0007610321
0.0007610321
0.0007610321
0.0007484853
0.0007484853
0.0007484853
0.0007415116
0.0007415116
0.0007393956
0.0007393956
0.0007393956
0.00071921945
0.00071921945
0.00071921945
0.00073233247
0.00073233247
0.00073233247
0.0013214648
0.0013214648
0.07529798
0.07529798
0.07529798
0.07529798
127.0.0.1 - - [07/Feb/2024 12:40:15] "POST /detect HTTP/1.1" 200 -
0.0016382039
0.0016382039
0.02003643
0.02003643
0.02003643
0.0020616353
0.0020616353
0.0020616353
0.00091326237
0.00091326237
0.00091326237
0.0008201301
0.0008201301
0.0007724762
0.0007724762
0.0007724762
0.0007663667
0.0007663667
0.0007663667
0.0007453561
0.0007453561
0.0007453561
0.0007367432
0.0007367432
0.0007274747
0.0007274747
0.0007274747
0.0007349551
0.0007349551
0.0007349551
0.0007381141
0.0007381141
0.0007381141
0.00074359775
0.00074359775
0.000754714
0.000754714
0.000754714
0.00074994564
0.00074994564
0.00074994564
0.0007331073
0.0007331073
0.0007331073
0.0007442534
0.0007442534
0.0007442534
0.000765115
0.000765115
0.0007775724
0.0007775724
0.0007775724
0.0007559061
0.0007559061
0.0007559061
0.00075006485
0.00075006485
0.00075006485
0.0007354021
0.0007354021
0.0007338524
0.0007338524
0.0007338524
0.00072959065
0.00072959065
0.00072959065
0.0007646084
0.0007646084
0.0007646084
0.0008356273
127.0.0.1 - - [07/Feb/2024 12:40:22] "POST /detect HTTP/1.1" 200 -
0.0008356273
0.0008356273
0.13757348
0.13757348
0.13757348
0.5279213
0.5279213
0.7301192
0.7301192
0.7301192
0.6127951
0.6127951
0.6127951
0.008642048
0.008642048
0.008642048
0.0008507371
0.0008507371
0.0008009374
0.0008009374
0.0008009374
0.0007816851
0.0007816851
0.0007816851
0.0007581711
0.0007581711
0.0007581711
0.00074633956
0.00074633956
0.0007379949
0.0007379949
0.0007379949
0.0007303655
0.0007303655
0.0007303655
0.0007198453
0.0007198453
0.0007198453
0.0007285476
0.0007285476
0.0007426441
0.0007426441
0.0007426441
0.00076282024
0.00076282024
0.00076282024
0.000767082
0.000767082
0.000767082
0.0007534325
0.0007534325
0.0007568002
0.0007568002
0.0007568002
0.0007542074
0.0007542074
0.0007542074
0.0007469654
0.0007469654
0.0007469654
0.0007395148
0.0007395148
0.00073486567
0.00073486567
0.00073486567
0.0007363558
0.0007363558
0.0007363558
0.00071939826
0.00071939826
127.0.0.1 - - [07/Feb/2024 12:40:29] "POST /detect HTTP/1.1" 200 -

Code:

from flask import Flask, request, jsonify
import numpy as np
from openwakeword import Model
import resampy
import wave
import io

app = Flask(__name__)

# Load the model
model_path = "C:/Users/clap/Desktop/wake/hey_grape.onnx"
owwModel = Model(wakeword_models=[model_path], inference_framework='onnx')

@app.route('/detect', methods=['POST'])
def detect_wakeword():
    try:
        audio_data = request.get_data()

        with io.BytesIO(audio_data) as audio_stream:
            with wave.open(audio_stream, 'rb') as wf:
                sample_rate = wf.getframerate()
                chunk_size = 1280
                detected = False

                while True:
                    chunk = wf.readframes(chunk_size)
                    if not chunk:
                        break

                    if len(chunk) % 2 == 1:
                        chunk += b'\x00'

                    data = np.frombuffer(chunk, dtype=np.int16)

                    if sample_rate != 16000:
                        data = resampy.resample(data, sample_rate, 16000)

                    predictions = owwModel.predict(data)
                    for key in predictions:
                        print(predictions[key])
                        # No early break here: the whole file is processed
                        # before the result is returned, per the suggestion above
                        if predictions[key] >= 0.5:
                            detected = True

        return jsonify({"detected": detected})
    except Exception as e:
        print(f"Error: {e}")
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)

If using the web app example works correctly, it's likely that there is still an issue with your server implementation. I notice that you are still reading a fixed chunk of 1280 frames and resampling afterwards, when you should set the chunk size after resampling so that the model always receives 1280 samples at 16 kHz.

Beyond that, though, it's difficult to tell what might be wrong. I would recommend adapting either the web app code or the detect_from_microphone.py script, since those seem to work correctly with your model.
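
For reference, here is a minimal sketch of that ordering: resample the full clip once, then split it into fixed 1280-sample chunks. The helper name and the float32 round-trip for resampy are assumptions for illustration, not part of the original code.

# Sketch: resample the whole WAV first, then chunk at 16 kHz
import io
import wave

import numpy as np
import resampy

def wav_bytes_to_16khz_chunks(audio_data, chunk_size=1280):
    # Read all frames from the in-memory WAV file
    with wave.open(io.BytesIO(audio_data), 'rb') as wf:
        sample_rate = wf.getframerate()
        samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)

    if sample_rate != 16000:
        # Resample the entire signal once (float32 in/out for resampy),
        # then convert back to the int16 format openWakeWord expects
        resampled = resampy.resample(samples.astype(np.float32), sample_rate, 16000)
        samples = resampled.astype(np.int16)

    # Every chunk except possibly the last is now exactly 1280 samples
    for i in range(0, len(samples), chunk_size):
        yield samples[i:i + chunk_size]

Each chunk from this generator can then be passed to owwModel.predict() exactly as in the code above.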