toverainc / willow-application-server

Willow Application Server

Single ESP box appears to be getting duplicated

nikito opened this issue

When flashing the WAS willow build onto the ESP box, I am seeing that it is getting duplicated in WAS:
[image]

Additionally, there seem to be issues in the log, and the unit is stuck in a loop. I'm attaching a log export for reference from the ESP box with full debug enabled. 😃
willow-was-log.txt

Note: if this is due to things not yet being ready for WAS, please let me know. I wasn't sure if I was testing prematurely 😆

Looking closer at the logs, I'm thinking something goes wrong when it attempts to write the config? The Core Panic seems to happen right after it reads the config from WAS, so seems likely?

I did some debugging and testing. I think the issue is that when write_config is called it tries to call deinit_audio, which in turn calls audio_thread_cleanup(hdl_at). At this point the audio thread hasn't been initialized yet, so hdl_at is still a null pointer. As a test I commented out the deinit_audio call in config_write, and everything works fine now (well, relatively speaking, as I think we still want to deinit audio on updates 😆). This also got rid of the duplicates, though I'm not sure whether we still need to look into why they happened in the first place. I'm thinking we'd want to add a check here to see whether the audio pipeline has been initialized before attempting to deinit it?

On an unrelated note, it seems my replies are truncated again, but I assume that is just because issue 159 from the willow repo isn't merged yet 😄

I was just debugging this issue myself!

Using addr2line (see NOTES.md) I can confirm you have this exactly right. I've also noticed that if I include a willow.json in the SPIFFS user partition prior to flashing, this issue is avoided (makes sense).

I've also noticed occasional duplicates. While it figures itself out eventually it's also "not great" and potentially confusing to users.

From what I remember, checking whether an audio pipeline has been started isn't well supported in ESP-ADF, so we'll have to get creative if we address it via that route.

I've also noted (with all debug logging enabled) that deinit aggressively loops and spams the console, which is another "not great" thing that would likely be addressed by whatever we come up with to address this issue.

Gotcha. I was going to add a simple null check on hdl_at and hdl_ap in the deinit method, but I'm not sure that would be effective. (I don't have any knowledge of the ESP-ADF framework yet, so I'm not sure if that will cause some other sort of issue 😆)
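For illustration, the null check proposed here could look roughly like the sketch below. It is self-contained and does not use ESP-ADF: `fake_handle_t` and `fake_cleanup` are hypothetical stand-ins for the real handle types and cleanup calls, and `deinit_audio_guarded` is an invented name. Only `hdl_at` and `hdl_ap` come from the discussion. It is not necessarily what the eventual fix does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ESP-ADF handle type. */
typedef struct {
    int id;
} fake_handle_t;

/* Handle names from the discussion; both stay NULL until audio init has run. */
fake_handle_t *hdl_at = NULL; /* audio thread */
fake_handle_t *hdl_ap = NULL; /* audio player */

/* Counts cleanup calls so the behaviour can be observed. */
int cleanup_calls = 0;

/* Hypothetical stand-in for the real cleanup call. It dereferences its
 * argument, so calling it with NULL would crash, as in the report. */
static void fake_cleanup(fake_handle_t *h)
{
    h->id = -1;
    cleanup_calls++;
}

/* Tear down only what was actually initialized.
 * Returns true if any handle was cleaned up. */
bool deinit_audio_guarded(void)
{
    bool did_work = false;

    if (hdl_at != NULL) {
        fake_cleanup(hdl_at);
        hdl_at = NULL; /* makes a repeated call a no-op */
        did_work = true;
    }
    if (hdl_ap != NULL) {
        fake_cleanup(hdl_ap);
        hdl_ap = NULL;
        did_work = true;
    }
    return did_work;
}
```

Resetting each handle to NULL after cleanup means the guard also covers repeated deinit calls, not just the first-boot case where init never ran. Whether a plain pointer check is sufficient on real hardware depends on how ESP-ADF manages those handles, which this sketch does not model.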

I would want to consult with @stintel on how to best approach this - he's currently in really deep on a bunch of other WAS stuff but hopefully we can get to this later today.

We'll be deep in Willow to support dynamic configuration of wake word anyway so this lines up pretty well.

Gotcha, no problem! I have a temp workaround in place for now so I'll continue poking around other areas. I also tested OTA and that worked perfectly so far! 😃

Speaking of OTA - we've been really impressed with how FAST it is. Whether dynamic config updates or OTA the vast majority of time is spent on the reboot to connect to Wifi - especially if you have KRACK mitigation active on the AP(s).

In any case 10-15 seconds to update the config or firmware on all of your devices is very promising!

Indeed I noticed the same, downtime for an OTA/Config push is very minimal!

Can you test if toverainc/willow@27e6130 fixes the deinit_audio crash?

Confirmed it fixes the boot loop but initial apply looks kind of rough (not sure it matters):

W (17:25:44.706) WILLOW/CONFIG: key audio_response_type not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:44.753) WILLOW/CONFIG: key speech_rec_mode not found in config, use bogus value to avoid NULL pointer dereference
I (17:25:44.777) WILLOW/HASS: stopping WebSocket client
I (17:25:44.777) WILLOW/WAS: stopping WebSocket client
E (17:25:44.783) WILLOW/HASS: failed to stop WebSocket client: ESP_ERR_INVALID_ARG
W (17:25:44.797) WILLOW/CONFIG: key audio_response_type not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:44.809) WILLOW/CONFIG: key speech_rec_mode not found in config, use bogus value to avoid NULL pointer dereference

----------------------------- ESP Audio Platform -----------------------------
|                                                                            |
|                       ESP_AUDIO-v1.7.2-20e6bd0-b92a149                     |
|                     Compile date: Nov 30 2022-07:50:12                     |
------------------------------------------------------------------------------
E (8454) ESP_AUDIO_CTRL: Error input parameter. line:1163
I (17:25:44.865) WILLOW/AUDIO: audio player initialized
E (17:25:44.868) I2S: register I2S object to platform failed
W (17:25:44.880) WILLOW/CONFIG: key wake_mode not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:44.886) WILLOW/CONFIG: key wake_mode not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:44.897) WILLOW/CONFIG: key wake_mode not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:44.908) WILLOW/CONFIG: key wake_mode not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:44.922) WILLOW/CONFIG: key wake_mode not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:44.930) WILLOW/CONFIG: key wake_mode not found in config, use bogus value to avoid NULL pointer dereference
I (17:25:44.942) WILLOW/AUDIO: Using record buffer '-1'
W (17:25:44.949) WILLOW/CONFIG: key speech_rec_mode not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:44.960) WILLOW/CONFIG: key audio_codec not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:44.970) WILLOW/CONFIG: key audio_codec not found in config, use bogus value to avoid NULL pointer dereference
E (8579) AFE_SR: vad_mode is error, please modify it!

E (8580) AFE_SR: AFE config error!

E (17:25:44.992) RECORDER_SR: recorder_sr.c:562 (recorder_sr_create): Got NULL Pointer
W (17:25:45.006) WILLOW/CONFIG: key audio_codec not found in config, use bogus value to avoid NULL pointer dereference
W (17:25:45.012) WILLOW/CONFIG: key audio_codec not found in config, use bogus value to avoid NULL pointer dereference
I (17:25:45.044) WILLOW/AUDIO: app_main() - start_rec() finished
E (17:25:45.049) lcd_panel.io.i2c: panel_io_i2c_rx_buffer(128): i2c transaction failed
E (17:25:45.052) TT21100: esp_lcd_touch_tt21100_read_data(173): I2C read error!
E (17:25:45.057) TT21100: esp_lcd_touch_new_i2c_tt21100(103): TT21100 init failed
E (17:25:45.065) TT21100: Error (0xffffffff)! Touch controller TT21100 initialization failed!
E (17:25:45.076) WILLOW/LVGL: failed to initialize touch screen: ESP_FAIL
I (17:25:45.082) WILLOW/NETWORK: MAC address: 7c:df:a1:e8:20:58
I (17:25:45.088) WILLOW/MAIN: Startup complete! Version: 27e6130. Waiting for wake word.
I (17:25:45.124) WILLOW/CONFIG: /spiffs/user/config/willow.json updated, restarting
I (17:25:45.125) WILLOW/SYSTEM: restarting after 6 seconds

Coming back (ESP BOX Lite - so expected touch errors):

I (17:25:58.399) WILLOW/WAS: initializing WebSocket client
I (17:25:58.400) WILLOW/NETWORK: initializing SNTP client
I (12:25:58.403) WILLOW/NETWORK: Using DHCP SNTP server
I (12:25:58.408) WILLOW/HASS: HASS URL: http://hass:8123/api/components
I (12:25:58.432) WILLOW/WAS: WebSocket connected
I (12:25:58.440) WILLOW/HTTP: HTTP status='200' content_length='2279'
I (12:25:58.444) WILLOW/HASS: Home Assistant has Assist Pipeline support
I (12:25:58.445) WILLOW/HASS: HASS URL: ws://hass:8123/api/websocket
I (12:25:58.464) WILLOW/HASS: WebSocket connected
I (12:25:58.497) WILLOW/AUDIO: audio_hal_ctrl_codec: ESP_OK
I (12:25:58.500) WILLOW/AUDIO: audio_element_getinfo(hdl_ae_hs): sample_rate='44100' channels='2' bits='16' bps = '0'

----------------------------- ESP Audio Platform -----------------------------
|                                                                            |
|                       ESP_AUDIO-v1.7.2-20e6bd0-b92a149                     |
|                     Compile date: Nov 30 2022-07:50:12                     |
------------------------------------------------------------------------------
I (12:25:58.543) WILLOW/AUDIO: audio player initialized
E (12:25:58.545) I2S: register I2S object to platform failed
I (12:25:58.553) WILLOW/AUDIO: Using record buffer '6'
MC Quantized wakenet9: wakenet9l_v3h24_alexa_3_0.625_0.645, tigger:v3, mode:3, p:0, (Jun 14 2023 11:15:21)
I (12:25:58.731) WILLOW/AUDIO: app_main() - start_rec() finished
E (12:25:58.734) lcd_panel.io.i2c: panel_io_i2c_rx_buffer(128): i2c transaction failed
E (12:25:58.736) TT21100: esp_lcd_touch_tt21100_read_data(173): I2C read error!
E (12:25:58.744) TT21100: esp_lcd_touch_new_i2c_tt21100(103): TT21100 init failed
E (12:25:58.752) TT21100: Error (0xffffffff)! Touch controller TT21100 initialization failed!
E (12:25:58.762) WILLOW/LVGL: failed to initialize touch screen: ESP_FAIL
I (12:25:58.768) WILLOW/NETWORK: MAC address: 7c:df:a1:e8:20:58
I (12:25:58.775) WILLOW/MAIN: Startup complete! Version: 27e6130. Waiting for wake word.
I (12:26:08.784) WILLOW/TIMER: Wake LCD timeout, turning off LCD
I (12:26:23.326) WILLOW/AUDIO: AUDIO_REC_WAKEUP_START
I (12:26:23.759) WILLOW/AUDIO: AUDIO_REC_VAD_START
I (12:26:23.762) WILLOW/AUDIO: Using WIS URL 'http://wis:20001/api/willow?model=base'
I (12:26:23.764) WILLOW/AUDIO: WIS HTTP client starting stream, waiting for end of speech
I (12:26:25.445) WILLOW/AUDIO: AUDIO_REC_VAD_END
I (12:26:25.446) WILLOW/AUDIO: AUDIO_REC_WAKEUP_END
I (12:26:25.498) WILLOW/AUDIO: WIS HTTP client HTTP_STREAM_POST_REQUEST, write end chunked marker
I (12:26:25.561) WILLOW/AUDIO: WIS HTTP client HTTP_STREAM_FINISH_REQUEST
I (12:26:25.561) WILLOW/AUDIO: WIS HTTP Response = {"language":"en","text":"Turn off dining room."}
I (12:26:25.569) WILLOW/HASS: sending command to Home Assistant via WebSocket: {
	"end_stage":	"intent",
	"id":	1687800385,
	"input":	{
		"text":	"Turn off dining room."
	},
	"start_stage":	"intent",
	"type":	"assist_pipeline/run"
}
I (12:26:25.689) WILLOW/HASS: home assistant response_type: action_done
I (12:26:25.694) WILLOW/HASS: received run-end event on WebSocket: {
	"id":	1687800385,
	"type":	"event",
	"event":	{
		"type":	"run-end",
		"data":	null,
		"timestamp":	"2023-06-26T17:27:08.146712+00:00"
	}
}
I (12:26:25.707) WILLOW/AUDIO: Using WIS TTS URL 'http://wis:20001/api/tts?speaker=CLB&text=Turned off light'
I (12:26:26.826) WILLOW/AUDIO: WIS TTS playback finished

I can also confirm the core panic/boot loop is fixed with this commit!

> Confirmed it fixes the boot loop but initial apply looks kind of rough

Try toverainc/willow@9eba78e?

NICE - about as clean as it's going to get!

> When flashing the WAS willow build onto the ESP box, I am seeing that it is getting duplicated in WAS:

I don't think this is something we can avoid. When Willow crashes, there is no clean close/disconnect of the WebSocket. WAS will notice this after a while and clean up the dead client from ConnMgr.

I think this is minor enough to be (potentially) addressed after the initial stable release.

As one idea, in WAS when OTA update is initiated could we remove the connection (device) from WAS? It won't appear until it comes back.

Ideally when we have a better frontend UI we could also use websockets (or something) so the Clients "page" doesn't need to be manually refreshed - which I think itself would address most of this.

> As one idea, in WAS when OTA update is initiated could we remove the connection (device) from WAS? It won't appear until it comes back.

We already do that: https://github.com/toverainc/willow/blob/feature/was/main/ota.c#L176

I noticed as well that it will clear up after a bit, and OTA will also make them disappear if invalid, as @stintel said. I agree this isn't really a showstopper, but I wanted to share it as something that may be nice to work on later. 🙂

I was suggesting removing them from WAS (in WAS) as soon as the OTA command is sent.

In any case I should probably familiarize myself with the WAS code a bit!

In the wasng branch WAS makes sure the client devices returned via API are unique by MAC.
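
As a rough illustration of "unique by MAC", the sketch below de-duplicates a client list in place. It is not WAS code and the wasng implementation is not shown in this thread. C is used only to match the firmware side of the discussion, and `client_t`, its fields, and `unique_by_mac` are hypothetical names. It assumes MAC strings are already normalized to one case, and it keeps the most recently connected entry for each MAC.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical client record; field names are illustrative only. */
typedef struct {
    char mac[18];      /* e.g. "7c:df:a1:e8:20:58" plus terminating NUL */
    long connected_at; /* connection time in arbitrary units */
} client_t;

/* Compact clients[0..n) in place so that each MAC appears once,
 * keeping the entry with the largest connected_at for each MAC.
 * Returns the number of unique entries left at the front. */
size_t unique_by_mac(client_t *clients, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Look for this MAC among the entries already kept. */
        for (j = 0; j < out; j++) {
            if (strcmp(clients[j].mac, clients[i].mac) == 0) {
                if (clients[i].connected_at > clients[j].connected_at) {
                    clients[j] = clients[i]; /* newer connection wins */
                }
                break;
            }
        }

        /* Not seen before: keep it. */
        if (j == out) {
            clients[out++] = clients[i];
        }
    }
    return out;
}
```

With this approach a device that crashed and reconnected would show up once, represented by its newest connection, even before the stale entry has been cleaned up.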