crackling sound on voice output using VoiceAssistant

Question

vaski666 opened this issue 5 months ago · comments

Hi,
I have successfully set up the voice assistant on a CoreS3 using the Yaml below.
https://github.com/m5stack/M5CoreS3-Esphome/blob/main/voice-assistant/m5stack-cores3.yaml
and I'm very happy to have found this repository 👍
The default settings give a lot of "crackling sounds" on top of the voice when the response is given (response produced by OpenAI Conversation).
Is there any way to adjust the sound output?

I have an ATOM Echo set up as a voice assistant too, and its output does not produce these noises.

krangchen commented 4 months ago

+1

海底撩 · Answer 1 · Sat Apr 27 2024 09:18:18 GMT+0800 (China Standard Time)

Hi, I'll check.

gunnm80 · Answer 2 · Sun May 19 2024 15:37:56 GMT+0800 (China Standard Time)

Hello
Is there any news here? I have the same problem: I cannot understand the voice output at all. It would be great if you could get it to work. With all the (additional) sensors that M5 offers, it would be a great voice base! Probably even the best!

Ludovic BOUÉ · Answer 3 · Wed May 22 2024 12:46:45 GMT+0800 (China Standard Time)

Maybe if I can get the assistant to work I'll be able to check. See #13.

Billy · Answer 4 · Wed May 22 2024 22:09:46 GMT+0800 (China Standard Time)

What changes are needed to output TTS through a media player?

gunnm80 · Answer 5 · Fri May 31 2024 11:29:25 GMT+0800 (China Standard Time)

I wanted to try it out, but the M5 does not appear as a media player for me. What does your configuration look like?

vampler · Answer 6 · Sat Jun 01 2024 02:07:13 GMT+0800 (China Standard Time)

Same problem here

vaski666 · Answer 7 · Wed Jun 05 2024 03:33:23 GMT+0800 (China Standard Time)

Any news on this?

gunnm80 · Answer 8 · Sat Jun 15 2024 13:35:34 GMT+0800 (China Standard Time)

Any news? Can I help?

Billy · Answer 9 · Sun Jun 16 2024 01:52:26 GMT+0800 (China Standard Time)

I chopped down the code dramatically, replaced speaker with media_player, and provided a name so it would appear within Home Assistant. However, all attempts to get something to play from the speaker have resulted in silence.

If you would like to try my code, you will need to add the following secrets to ESPHome. Ignore the "HA not found" warning on the display: I wanted to ensure the display and backlight are on while testing the speaker, a lesson learned from issues with the M5StickC+ display backlight and the SPK2.

cores3_address:
cores3_encryption:
cores3_ota:
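
For anyone unsure what to put in those entries, here is a rough sketch of a secrets.yaml. Every value is a placeholder assumption, not something taken from this thread. The two wifi entries are included because the configuration that follows also references `wifi_ssid` and `wifi_password`.

```yaml
# secrets.yaml -- placeholder values only, substitute your own
wifi_ssid: "YOUR_WIFI_SSID"
wifi_password: "YOUR_WIFI_PASSWORD"

# Used by wifi.use_address: an IP address or hostname at which the
# CoreS3 can be reached (the address below is hypothetical).
cores3_address: "192.168.1.50"

# Used by api.encryption.key: ESPHome expects a base64-encoded
# 32-byte key here; generate your own.
cores3_encryption: "REPLACE_WITH_BASE64_32_BYTE_KEY"

# Used by ota.password
cores3_ota: "YOUR_OTA_PASSWORD"
```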

substitutions:
  name: m5cores3
  friendly_name: M5CoreS3
  loading_illustration_file: https://github.com/esphome/firmware/raw/main/voice-assistant/casita/loading_320_240.png
  idle_illustration_file: https://github.com/esphome/firmware/raw/main/voice-assistant/casita/idle_320_240.png
  listening_illustration_file: https://github.com/esphome/firmware/raw/main/voice-assistant/casita/listening_320_240.png
  thinking_illustration_file: https://github.com/esphome/firmware/raw/main/voice-assistant/casita/thinking_320_240.png
  replying_illustration_file: https://github.com/esphome/firmware/raw/main/voice-assistant/casita/replying_320_240.png
  error_illustration_file: https://github.com/esphome/firmware/raw/main/voice-assistant/casita/error_320_240.png

  loading_illustration_background_color: '000000'
  idle_illustration_background_color: '000000'
  listening_illustration_background_color: 'FFFFFF'
  thinking_illustration_background_color: 'FFFFFF'
  replying_illustration_background_color: 'FFFFFF'
  error_illustration_background_color: '000000'

  voice_assist_idle_phase_id: '1'
  voice_assist_listening_phase_id: '2'
  voice_assist_thinking_phase_id: '3'
  voice_assist_replying_phase_id: '4'
  voice_assist_not_ready_phase_id: '10'
  voice_assist_error_phase_id: '11'  
  voice_assist_muted_phase_id: '12'


esphome:
  name: m5core-s3
  friendly_name: m5core-s3
  project:
    name: m5stack.cores3-voice-assistant
    version: "1.0"
  platformio_options:
    board_build.f_cpu : 240000000L
  libraries:
    - m5stack/M5GFX@^0.1.11
    - m5stack/M5Unified@^0.1.11
  on_boot:
      priority: 600
      then: 
        - script.execute: draw_display
        - delay: 30s
        - if:
            condition:
              lambda: return id(init_in_progress);
            then:
              - lambda: id(init_in_progress) = false;
              - script.execute: draw_display

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  framework:
    type: arduino

psram:
  mode: octal
  speed: 80MHz

external_components:
  - source:
      type: git
      url: https://github.com/m5stack/M5CoreS3-Esphome
    components: [ board_m5cores3, m5cores3_audio, m5cores3_display ]
    refresh: 0s


# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: !secret cores3_encryption
  on_client_connected:
    - script.execute: draw_display
  on_client_disconnected:
    - script.execute: draw_display

ota:
  password: !secret cores3_ota
  
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  use_address: !secret cores3_address
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
  on_connect:
    - script.execute: draw_display
    - delay: 5s # Gives time for improv results to be transmitted 
  on_disconnect:
    - script.execute: draw_display

captive_portal:
    

# 
# Globals
# 
globals:
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}
    

# 
# Display
# 
script:
  - id: draw_display
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                  wifi.connected:
                then:
                  - if:
                      condition:
                        api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(m5cores3_lcd).show_page(listening_page);
                                id(m5cores3_lcd).update();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(m5cores3_lcd).show_page(thinking_page);
                                id(m5cores3_lcd).update();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(m5cores3_lcd).show_page(replying_page);
                                id(m5cores3_lcd).update();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(m5cores3_lcd).show_page(error_page);
                                id(m5cores3_lcd).update();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(m5cores3_lcd).show_page(muted_page);
                                id(m5cores3_lcd).update();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(m5cores3_lcd).show_page(no_ha_page);
                                id(m5cores3_lcd).update();
                                break;
                              default:
                                id(m5cores3_lcd).show_page(idle_page);
                                id(m5cores3_lcd).update();
                            }
                      else:
                        - display.page.show: no_ha_page
                        - component.update: m5cores3_lcd
                else:
                  - display.page.show: no_wifi_page
                  - component.update: m5cores3_lcd
          else:
            - display.page.show: initializing_page
            - component.update: m5cores3_lcd

image:
  - file: ${error_illustration_file}
    id: casita_error
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${idle_illustration_file}
    id: casita_idle
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${listening_illustration_file}
    id: casita_listening
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${thinking_illustration_file}
    id: casita_thinking
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${replying_illustration_file}
    id: casita_replying
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${loading_illustration_file}
    id: casita_initializing
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: https://github.com/esphome/firmware/raw/main/voice-assistant/error_box_illustrations/error-no-wifi.png
    id: error_no_wifi
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: https://github.com/esphome/firmware/raw/main/voice-assistant/error_box_illustrations/error-no-ha.png
    id: error_no_ha
    resize: 320x240
    type: RGB24
    use_transparency: true

color:
  - id: idle_color
    hex: ${idle_illustration_background_color}
  - id: listening_color
    hex: ${listening_illustration_background_color}
  - id: thinking_color
    hex: ${thinking_illustration_background_color}
  - id: replying_color
    hex: ${replying_illustration_background_color}
  - id: loading_color
    hex: ${loading_illustration_background_color}
  - id: error_color
    hex: ${error_illustration_background_color}

display:
  - platform: m5cores3_display
    model: ILI9342
    dc_pin: 35
    update_interval: never
    id: m5cores3_lcd
    pages:
      - id: idle_page
        lambda: |-
          it.fill(id(idle_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_idle), ImageAlign::CENTER);
      - id: listening_page
        lambda: |-
          it.fill(id(listening_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_listening), ImageAlign::CENTER);
      - id: thinking_page
        lambda: |-
          it.fill(id(thinking_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_thinking), ImageAlign::CENTER);
      - id: replying_page
        lambda: |-
          it.fill(id(replying_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_replying), ImageAlign::CENTER);
      - id: error_page
        lambda: |-
          it.fill(id(error_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_error), ImageAlign::CENTER);
      - id: no_ha_page
        lambda: |-
          it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_ha), ImageAlign::CENTER);
      - id: no_wifi_page
        lambda: |-
          it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_wifi), ImageAlign::CENTER);
      - id: initializing_page
        lambda: |-
          it.fill(id(loading_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_initializing), ImageAlign::CENTER);
      - id: muted_page
        lambda: |-
          it.fill(Color::BLACK);


# 
# Audio
# 
board_m5cores3:
m5cores3_audio:
  id: m5cores3_audio_1

microphone:
  - platform: m5cores3_audio
    m5cores3_audio_id: m5cores3_audio_1
    id: m5cores3_mic
    adc_type: external
    i2s_din_pin: 14
    pdm: false

media_player:
  - platform: m5cores3_audio
    m5cores3_audio_id: m5cores3_audio_1
    id: media_out
    name: ${friendly_name}
    dac_type: external
    i2s_dout_pin: 13
    mode: mono

ginandbacon · Answer 10 · Wed Jul 03 2024 06:49:54 GMT+0800 (China Standard Time)

Either voice assistants can't be media players or something, but no, it won't show up. You can use other speakers, though; you just have to define them in the YAML. See the link below. I found this out while creating announcements, as I could never get them to go to my Korvo-1, which has annoying audio issues. In fact, I think everything but the S3 boxes does.

Here you go. Was the current YAML ever using microWakeWord instead of openWakeWord? It doesn't appear to be. I was looking into this device because my Espressif Korvo-1 makes annoying popping noises from the 3.5mm output jack. Also, has anyone tried the RCA module? I imagine code or YAML would need to be added in order to get its full potential, and the same goes for the Ethernet module (I'm assuming).

https://www.smarthomejunkie.net/enhancing-voice-assistant-integrate-an-external-speaker-using-esphome/

I imagine using something like this for audio out would require adding it to ESPHome, or at a minimum defining its pins, but I could be wrong; I'm not a developer.

https://shop.m5stack.com/products/rca-audio-video-composite-module-13-2

ginandbacon · Answer 11 · Wed Jul 03 2024 07:45:37 GMT+0800 (China Standard Time)

The above worked for me. It still played on my Korvo, but I simply unplugged the 3.5mm audio output, as he mentions in the video. I'm not sure whether you can just turn the M5 volume all the way down to accomplish the same thing. You have to click on ESPHome; if you click under it you won't have the below option (true for all integrations).

Also, you have to go to Devices, ESPHome, then click Configure on your voice assistant and check the checkbox that lets it make Home Assistant service calls. It won't work unless you do (it's covered in the video). You can also now make any Home Assistant service call from ESPHome on each device you do this for.

on_tts_end:
  - homeassistant.service:
      service: media_player.play_media
      data:
        entity_id: media_player.vlc_telnet
        media_content_id: !lambda 'return x;'
        media_content_type: music
        announce: "true"
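
To show where that trigger would sit, here is a minimal sketch of the surrounding `voice_assistant:` block. It assumes the microphone ID `m5cores3_mic` from the CoreS3 configuration posted earlier in this thread, and `media_player.vlc_telnet` is just the entity used above; replace it with whichever Home Assistant media player should speak the reply. No on-device `speaker:` is declared here, so the TTS audio would go only to the external player, which is an assumption about what you want.

```yaml
voice_assistant:
  # Microphone ID taken from the CoreS3 config earlier in the thread (assumption)
  microphone: m5cores3_mic
  on_tts_end:
    # x holds the URL of the generated TTS audio
    - homeassistant.service:
        service: media_player.play_media
        data:
          # Replace with your own media player entity
          entity_id: media_player.vlc_telnet
          media_content_id: !lambda 'return x;'
          media_content_type: music
          announce: "true"
```

The service call only goes through once the device has been allowed to make Home Assistant service calls, as described in the post above.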