Stream Assist

Home Assistant custom component that allows you to turn almost any camera and almost any speaker into a local voice assistant.

The component uses:

Stream integration to receive audio from the camera (RTSP/HTTP/RTMP) and automatically transcode the audio codec into a format suitable for Speech-to-Text (STT)
Voice Activity Detector (VAD) library to automatically detect the beginning and end of speech
Assist pipeline integration to run: Speech-to-Text (STT) => Natural Language Processing (NLP) => Text-to-Speech (TTS)
Almost any Media player to play the audio response from Text-to-Speech (TTS)

Assist pipeline can use:

Whisper core Add-on for local STT
Piper core Add-on for local TTS
Faster Whisper custom integration for local STT
Google Translate core integration for cloud TTS

Important: the component does not support wake words. The recognition process must be started manually or by an automation (remote button, motion sensor, etc.).
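
As a rough illustration of starting recognition from an automation, the sketch below turns on the component's MIC switch when a motion sensor fires. Both entity IDs (`binary_sensor.hallway_motion` and `switch.stream_assist_mic`) are hypothetical placeholders; check the real names in your Home Assistant instance.

```yaml
# Sketch only: both entity IDs below are assumptions, not names from this component.
automation:
  - alias: "Start Stream Assist on motion"
    trigger:
      - platform: state
        entity_id: binary_sensor.hallway_motion  # hypothetical motion sensor
        to: "on"
    action:
      - service: switch.turn_on
        target:
          entity_id: switch.stream_assist_mic  # hypothetical name of the MIC switch
```

The same action can be attached to any other trigger, such as a remote button press.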

Installation

HACS > Integrations > 3 dots (upper right corner) > Custom repositories > URL: AlexxIT/StreamAssist, Category: Integration > Add > wait > Stream Assist > Install

Or manually copy the stream_assist folder from the latest release to the /config/custom_components folder.

Configuration

Configure local Speech-to-Text (STT)

Add the local Speech-to-Text Add-on:
Settings > Add-ons > Add-on Store > Whisper > Install
Configure the STT Add-on:
Whisper > Configuration
Add the STT Integration:
Settings > Integrations > Whisper > Configure

Configure local Text-to-Speech (TTS)

Add the local Text-to-Speech Add-on:
Settings > Add-ons > Add-on Store > Piper > Install
Configure the TTS Add-on:
Piper > Configuration
Add the TTS Integration:
Settings > Integrations > Piper > Configure

Configure cloud Text-to-Speech (TTS)

configuration.yaml

tts:
  - platform: google_translate

Configure local Voice assistant (NLP)

Configure the Voice assistant:
Settings > Voice assistants > Home Assistant > Select: STT and TTS

Configure Stream Assist

Add the Stream Assist Integration:
Settings > Integrations > Add Integration > Stream Assist
Configure the Stream Assist Integration:
Settings > Integrations > Stream Assist > Configure

You can select either a camera entity_id or a stream URL as the audio (MIC) source.

You can change the Voice Activity Detector (VAD) settings. The component waits for speech lasting "VAD speech seconds", followed by silence lasting "VAD silence seconds", and then starts speech recognition (STT). "VAD timeout seconds" sets the maximum time spent searching for a voice.

You can select the Voice Assistant Pipeline used for the recognition process: STT => NLP => TTS. By default, the component uses the default pipeline. You can create several Pipelines with different settings, and several Stream Assist components with different settings.

You can select the Pipeline end stage at which processing stops:

Use only the MIC => VAD stage to detect whether there is a voice in the place with the camera. You don't need any pipeline in this case.
Use only the MIC => VAD => STT stages and process the recognized text inside an automation.
Use only the MIC => VAD => STT => NLP stages and process the recognized intent inside an automation.
Use only the MIC => VAD => STT => NLP => TTS stages and process the response text or audio inside an automation.
Use all stages MIC => VAD => STT => NLP => TTS => SND and let the integration output audio to the speakers.

You can select one or more Media players (SND) to output the audio response. If your camera supports two-way audio, you can use the WebRTC Camera custom integration to add it as a Media player.

Using

The component has a MIC switch and multiple sensors: VAD, STT, NLP, TTS. There may be fewer sensors, depending on the "Pipeline end stage" setting.

You can create automations to activate the microphone, and to monitor changes in the state of the sensors and their attributes. The sensor attributes contain a lot of useful information about the results of each step of the assistant.
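
A minimal sketch of monitoring one of the sensors: the automation below fires on any state or attribute change of the STT sensor and shows the new state and its attributes in a persistent notification. The entity ID `sensor.stream_assist_stt` is a hypothetical placeholder, and no specific attribute names are assumed, so the whole attribute set is printed for inspection.

```yaml
# Sketch only: the sensor entity ID is an assumption; use the name shown in your instance.
automation:
  - alias: "Show Stream Assist STT result"
    trigger:
      # A state trigger without "to"/"from" fires on state and attribute changes.
      - platform: state
        entity_id: sensor.stream_assist_stt  # hypothetical name of the STT sensor
    action:
      - service: persistent_notification.create
        data:
          title: "Stream Assist STT"
          message: >-
            State: {{ trigger.to_state.state }}
            Attributes: {{ trigger.to_state.attributes }}
```

Once you know which attributes carry the data you need, you can replace the notification with conditions and actions that act on them.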

You can also view the pipeline run history in the Home Assistant interface:

Settings > Voice assistants > Pipeline > 3 dots > Debug

Tips

Recommended settings for Whisper:
- Model: small-int8 or medium-int8
- Beam size: 5
You can add remote Whisper/Piper installation from another server:
- First server: Settings > Add-ons > Whisper/Piper > Configuration > Network > Select port
- Second server: Settings > Integrations > Add integration > Wyoming Protocol > Select: first server IP, add-on port
Whisper supports many languages, but Piper supports far fewer. For TTS, you can use the Google Translate integration, which supports many languages, instead of Piper.
If your environment does not allow you to install add-ons, you can install the Faster Whisper custom integration for local STT.
