AlexxIT / StreamAssist

Home Assistant custom component that allows you to turn almost any camera and almost any speaker into a local voice assistant

Add wake word support

edurenye opened this issue · comments

Now that core has added the wake word integration (home-assistant/core#96380) this component could make use of it after running the Voice Activity Detector.

It could be integrated using the Wyoming protocol, since that core integration also provides a Wyoming implementation that can be used with an openWakeWord container.

+1 on this. @AlexxIT, any plans to add support for this? It seems like the missing piece.

Without having looked at the code, you can probably test it out by changing this line locally to PipelineStage.WAKE_WORD:

start_stage=PipelineStage.STT,

See the updated Assist docs here for wake word detection: https://developers.home-assistant.io/docs/voice/pipelines/#wake-word-detection
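To make the suggested one-line change concrete, here's a toy model of the stage order (the `PipelineStage` enum below is an illustrative stand-in for the real one in `homeassistant.components.assist_pipeline`, not the actual class) showing what starting at `WAKE_WORD` instead of `STT` buys you:

```python
from enum import Enum

# Toy stand-in for homeassistant.components.assist_pipeline.PipelineStage;
# illustrates why starting at WAKE_WORD runs wake word detection before STT.
class PipelineStage(Enum):
    WAKE_WORD = "wake_word"
    STT = "stt"
    INTENT = "intent"
    TTS = "tts"

ORDER = list(PipelineStage)  # Enum iteration preserves definition order

def stages_from(start: PipelineStage) -> list[PipelineStage]:
    """Stages a pipeline run will execute, beginning at `start`."""
    return ORDER[ORDER.index(start):]

# start_stage=PipelineStage.STT skips wake word; WAKE_WORD includes it
print([s.value for s in stages_from(PipelineStage.STT)])
print([s.value for s in stages_from(PipelineStage.WAKE_WORD)])
```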

You will also have the option of passing in parameters for audio enhancement, so HA can clean up noise and boost the volume if needed.
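For illustration only (the real processing happens inside HA, and the actual parameter names are in the docs linked above): the volume-boost part of audio enhancement amounts to multiplying and clamping 16-bit PCM samples, roughly like this sketch.

```python
import array

# Toy illustration of one kind of audio clean-up HA can apply (volume
# boosting); this is NOT the real HA implementation, just the idea.
def boost_volume(pcm: bytes, multiplier: float) -> bytes:
    """Scale 16-bit signed PCM samples, clamping to the valid range."""
    samples = array.array("h", pcm)  # "h" = signed 16-bit
    for i, s in enumerate(samples):
        samples[i] = max(-32768, min(32767, int(s * multiplier)))
    return samples.tobytes()

quiet = array.array("h", [100, -200, 300]).tobytes()
louder = boost_volume(quiet, 2.0)
print(array.array("h", louder).tolist())  # [200, -400, 600]
```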

So I made a fork here, https://github.com/starsoccer/StreamAssist, and tried to get this working, but did not have much luck. I'll preface all of the following by saying that while I am a developer, I don't really know Python; anyway, theory/details below.

So I made the change mentioned by @balloob to use WAKE_WORD, but it seems the parameters for vad.process have changed in Home Assistant. Previously the code just sent a chunk of audio, but now it expects a second parameter, is_speech, which appears to be a boolean. I tried setting it to both True and False, but neither works; both generate the error below, which I'm a bit lost on:

2023-10-19 10:22:53.907 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
File "/config/custom_components/stream_assist/switch.py", line 152, in async_process_audio_stream
async for _ in self.audio_stream(self.close):
File "/config/custom_components/stream_assist/switch.py", line 110, in audio_stream
if not self.vad.process(chunk, False):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/vad.py", line 155, in process
self._timeout_seconds_left -= chunk_seconds
TypeError: unsupported operand type(s) for -=: 'float' and 'bytes'
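To see why that traceback happens, here's a self-contained toy that mimics the new timeout logic (`SilenceTimeout` is an illustrative stand-in, not the real HA class): `process()` now wants the chunk *duration in seconds* (a float) plus the is_speech flag, so passing the raw bytes in the old call style hits exactly this TypeError.

```python
# Illustrative stand-in for the new VAD timeout logic, not HA's real class.
class SilenceTimeout:
    def __init__(self, timeout_seconds: float = 2.0) -> None:
        self._timeout_seconds_left = timeout_seconds

    def process(self, chunk_seconds: float, is_speech: bool) -> bool:
        """Return False once enough silence has accumulated."""
        if is_speech:
            return True
        self._timeout_seconds_left -= chunk_seconds  # bytes here -> TypeError
        return self._timeout_seconds_left > 0

vad = SilenceTimeout()
try:
    vad.process(b"\x00" * 1024, False)   # old-style call: raw audio chunk
except TypeError as err:
    print(err)  # unsupported operand type(s) for -=: 'float' and 'bytes'

chunk_seconds = 1024 / (16000 * 2)       # bytes / (rate * sample width)
print(vad.process(chunk_seconds, False)) # True: still within the timeout
```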

@synesthesiam As long as you're on this thread: are there any plans to support something like this natively in HA? I think a lot of people likely have cameras/microphones, and just having a simple UI that lets you make an Assist satellite, consisting of an input stream (ideally a microphone or video stream) and a media player to play voice back through, would cover it.

I guess something like these changes need to be done: home-assistant/core@7856189#diff-4ff817d7964242e3c079f2f2799985713b8a4983de705b6fcf620542fe5897ff

Oh, good find. Yeah, that shows it previously took a chunk of audio, but now it wants a float for the chunk seconds. Honestly, I'm not sure this is even the right function to use anymore. I don't really understand the order things should happen in, but my thinking is that maybe I should instead be calling process_with_vad; then I can continue to pass in the audio chunk and simply need to figure out how to create a `VoiceActivityDetector`.

What's not really clear to me is, if I continue to use this process function and instead pass in the chunk time, how it's actually going to get the audio chunk, as I don't see it being passed in anywhere else. Even the example test files seem to just call it without any audio, which I don't really get: https://github.com/home-assistant/core/blob/22c21fdc180fec24e3a45e038aba6fb685acd776/tests/components/assist_pipeline/test_vad.py#L33C48-L33C48

@starsoccer The process function is now used when an external VAD has already been used. This was done to avoid running VAD twice for the same audio chunk in a pipeline with wake and STT.

Let me know if you have any more questions, since I wrote the code 😄
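A minimal sketch of that split, with all names illustrative (`fake_external_vad` stands in for a real VAD such as webrtcvad, and `TimeoutSegmenter` for HA's segmenter): the external VAD classifies each audio chunk exactly once, and only the chunk's duration plus the speech flag reach `process()`, so VAD never runs twice per chunk.

```python
SAMPLE_RATE = 16000   # Hz; 16-bit mono PCM assumed
SAMPLE_WIDTH = 2      # bytes per sample

class TimeoutSegmenter:
    """Illustrative stand-in for HA's segmenter: counts down a silence timeout."""
    def __init__(self, timeout: float = 1.0) -> None:
        self.seconds_left = timeout

    def process(self, chunk_seconds: float, is_speech: bool) -> bool:
        if not is_speech:
            self.seconds_left -= chunk_seconds
        return self.seconds_left > 0

def fake_external_vad(chunk: bytes) -> bool:
    """Stand-in for a real VAD; treats any non-zero byte as speech."""
    return any(chunk)

def feed(segmenter: TimeoutSegmenter, chunk: bytes) -> bool:
    """Run VAD exactly once, then hand only duration + flag to process()."""
    chunk_seconds = len(chunk) / (SAMPLE_RATE * SAMPLE_WIDTH)
    return segmenter.process(chunk_seconds, fake_external_vad(chunk))

seg = TimeoutSegmenter()
silence = bytes(640)                # 20 ms of silence per chunk
listening = True
while listening:
    listening = feed(seg, silence)  # times out after ~1 s of silence
print(seg.seconds_left <= 1e-9)     # True
```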

Got it, do you have an example of how the audio is passed to the function that I can maybe work from?

Also, any plans to build this functionality into HA directly rather than needing this custom integration?

Any updates on this so far? It would be REALLY great to have the possibility of getting wall panels, which already have an internal mic, to work as an Assist device with wake word 😍

+1, would love to keep working on this, but it's honestly a bit outside my skill set. Hopefully HA adds/builds this feature in natively, letting users specify an input device (either video or microphone) and then allowing any speaker to function as an output.

I asked about this in the Year of the Voice chapter 5 live chat but got no response (or I missed it). I was thinking this could maybe also be integrated into Frigate, seeing as there's VAD going on in Frigate 0.13.

Yeah, I made an issue about putting it into Frigate but was told this isn't planned: blakeblackshear/frigate#8644

Hopefully someone more technical than me will get this working. I've asked in the Discord and seen someone else ask about it, but so far I don't know anyone who has this working.

Seems this can be a starting point! https://github.com/asmsaifs/StreamAssist

Cool, I tried using it but I'm getting this vague error:
User input malformed: two or more values in the same group of exclusion 'url' @ data[<url>]

Not sure if I'm missing something, but any value I put in the URL for the stream seems to give this error.

I'm testing the latest main/master version.

@AlexxIT Do you have any info on how we can use the new version and test/debug it? I gave it a try but can't seem to get anything to happen. I've tried using an RTSP URL as well as a camera entity. They both seem to just get stuck in the start phase for wake and then never change.

Reinstall via HACS, manually selecting the main version tag.

I already did that

It works flawlessly! Awesome! My tablets around the house just got superpowers!

Thanks. Unfortunately I don't have time to do a complete test. Also, it all works just horribly in my language.
So hopefully it's working fine for you. Let me know if you have any problems.

https://github.com/AlexxIT/StreamAssist/releases/tag/v2.0.0

Can you add some more troubleshooting info? For instance, how to ensure it's detecting voice and actually gets past the wake word stage, as that is where mine seems to be stuck.

You can enable debug logs.
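For reference, debug logging can be enabled in configuration.yaml via HA's standard logger integration; the `custom_components.stream_assist` path is assumed from the repo name:

```yaml
logger:
  default: warning
  logs:
    custom_components.stream_assist: debug
    homeassistant.components.assist_pipeline: debug
```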

This should get a shout out on year of the voice part 6!

I'm using this on Amazon Fire HD 10+ tablets running IP Webcam for the RTSP stream and Fully Kiosk for the media_player, and it works out of the box! (Mostly using the Extended OpenAI Conversation agent; while not local, the results are so impressive that working with intents seems puny by comparison.)

Small FR/Q: Should these show up as assist devices?

I have never seen what an assist device is.
I have ordered an M5Stack Official ATOM Echo smart speaker to check how the default pipeline works.