greghesp / ha-bambulab

A Home Assistant Integration for Bambu Lab Printers

[Bug] Home Assistant general connection issue when this integration is installed

jjvelar opened this issue · comments

Describe the bug

I have Home Assistant running on a dedicated NUC with HAOS, and a Bambu Lab A1 with AMS Lite.
After installing the Bambu Lab integration, I get Home Assistant connection issues randomly after 6 to 24 hours: I couldn't reach either the HA IP address or the .local hostname from my local network, yet Observer showed "healthy" and "connected" status. All green.
After restarting HA in safe mode, there were no connection issues for a couple of days.
I restarted in normal mode and started disabling integrations, but the connection issue kept showing up.
Only when it was the Bambu Lab integration's turn and I disabled it did the connection issues stop, while all the other custom integrations were up and running.

To Reproduce

Just install Bambu Lab custom integration, connect your printer, and wait.

Expected Behaviour

No HA connection issues.

What device are you using?

A1

Diagnostic Output

{
  "home_assistant": {
    "installation_type": "Home Assistant OS",
    "version": "2024.2.0",
    "dev": false,
    "hassio": true,
    "virtualenv": false,
    "python_version": "3.12.1",
    "docker": true,
    "arch": "x86_64",
    "timezone": "Europe/Madrid",
    "os_name": "Linux",
    "os_version": "6.1.74-haos",
    "supervisor": "2024.01.1",
    "host_os": "Home Assistant OS 11.5",
    "docker_version": "24.0.7",
    "chassis": "embedded",
    "run_as_root": true
  },
  "custom_components": {
    "edata": {
      "version": "2023.06.3",
      "requirements": [
        "e-data==1.1.5",
        "python-dateutil>=2.8.2"
      ]
    },
    "anniversary": {
      "version": "0.3.0",
      "requirements": []
    },
    "cryptoinfo": {
      "version": "0.1.7",
      "requirements": []
    },
    "alexa_media": {
      "version": "4.9.0",
      "requirements": [
        "alexapy==1.27.10",
        "packaging>=20.3",
        "wrapt>=1.14.0"
      ]
    },
    "teamtracker": {
      "version": "0.1",
      "requirements": [
        "arrow",
        "aiofiles"
      ]
    },
    "personalcapital": {
      "version": "0.1.2",
      "requirements": [
        "personalcapital==1.0.1"
      ]
    },
    "openmediavault": {
      "version": "0.0.0",
      "requirements": []
    },
    "var": {
      "version": "0.15.0",
      "requirements": []
    },
    "kontomierz_sensor": {
      "version": "0.0.1",
      "requirements": []
    },
    "pvpc_hourly_pricing": {
      "version": "1.1.1",
      "requirements": [
        "aiopvpc>=4.2.1"
      ]
    },
    "localtuya": {
      "version": "5.2.1",
      "requirements": []
    },
    "temperature_feels_like": {
      "version": "0.3.8",
      "requirements": [
        "colorlog==6.7.0",
        "ruff==0.1.1"
      ]
    },
    "worlds_air_quality_index": {
      "version": "1.1.0",
      "requirements": []
    },
    "nodered": {
      "version": "3.1.3",
      "requirements": []
    },
    "tplink_deco": {
      "version": "3.6.0",
      "requirements": [
        "pycryptodome>=3.12.0"
      ]
    },
    "huawei_solar": {
      "version": "1.3.1",
      "requirements": [
        "huawei-solar==2.2.9"
      ]
    },
    "smartir": {
      "version": "1.17.9",
      "requirements": [
        "aiofiles>=0.6.0"
      ]
    },
    "scheduler": {
      "version": "v0.0.0",
      "requirements": []
    },
    "tapo_control": {
      "version": "5.4.13",
      "requirements": [
        "pytapo==3.3.18"
      ]
    },
    "yi_hack": {
      "version": "0.3.6",
      "requirements": []
    },
    "awox": {
      "version": "0.1.5",
      "requirements": [
        "pexpect>=4.6.0",
        "pycryptodome>=3.6.6",
        "pygatt[GATTTOOL]>=4.0.5"
      ]
    },
    "bambu_lab": {
      "version": "2.0.15",
      "requirements": []
    },
    "webrtc": {
      "version": "v3.5.1",
      "requirements": []
    },
    "cryptostate": {
      "version": "2.0.0",
      "requirements": []
    },
    "watchman": {
      "version": "0.5.1",
      "requirements": [
        "prettytable==3.0.0"
      ]
    },
    "gas_station_spain": {
      "version": "0.8.0",
      "requirements": []
    },
    "balance_neto": {
      "version": "0.1.0",
      "requirements": []
    },
    "govee": {
      "version": "2023.11.1",
      "requirements": [
        "govee-api-laggat==0.2.2",
        "dacite==1.8.0"
      ]
    },
    "sonoff": {
      "version": "3.5.4",
      "requirements": [
        "pycryptodome>=3.6.6"
      ]
    },
    "hacs": {
      "version": "1.34.0",
      "requirements": [
        "aiogithubapi>=22.10.1"
      ]
    },
    "frigate": {
      "version": "5.0.1",
      "requirements": [
        "pytz==2022.7"
      ]
    },
    "garmin_connect": {
      "version": "0.2.19",
      "requirements": [
        "garminconnect==0.2.12",
        "tzlocal"
      ]
    },
    "spotcast": {
      "version": "v3.6.30",
      "requirements": []
    },
    "delete": {
      "version": "1.8",
      "requirements": []
    },
    "freeds": {
      "version": "0.12.0",
      "requirements": [
        "websockets"
      ]
    },
    "floureon": {
      "version": "1.0.1",
      "requirements": [
        "pythoncrc",
        "broadlink>=0.18.3"
      ]
    }
  },
  "integration_manifest": {
    "domain": "bambu_lab",
    "name": "Bambu Lab",
    "codeowners": [
      "@greghesp",
      "@AdrianGarside"
    ],
    "config_flow": true,
    "dependencies": [
      "device_automation",
      "ffmpeg",
      "mqtt"
    ],
    "documentation": "https://github.com/greghesp/ha-bambulab",
    "iot_class": "local_push",
    "issue_tracker": "https://github.com/greghesp/ha-bambulab/issues",
    "ssdp": [
      {
        "st": "urn:bambulab-com:device:3dprinter:1"
      }
    ],
    "version": "2.0.15",
    "is_built_in": false
  },
  "data": {
    "config_entry": {
      "entry_id": "67ce294e50441fdead4213a030a0046b",
      "version": 2,
      "minor_version": 1,
      "domain": "bambu_lab",
      "title": "**REDACTED**",
      "data": {
        "device_type": "A1",
        "serial": "**REDACTED**"
      },
      "options": {
        "region": "Europe",
        "email": "**REDACTED**",
        "username": "**REDACTED**",
        "name": "Bambu Lab A1 Combo",
        "host": "192.168.1.88",
        "local_mqtt": false,
        "auth_token": "**REDACTED**",
        "access_code": "**REDACTED**",
        "usage_hours": 290.0047222222222
      },
      "pref_disable_new_entities": false,
      "pref_disable_polling": false,
      "source": "user",
      "unique_id": null,
      "disabled_by": null
    },
    "push_all": {
      "ipcam": {
        "ipcam_dev": "1",
        "ipcam_record": "enable",
        "timelapse": "disable",
        "resolution": "1080p",
        "tutk_server": "disable",
        "mode_bits": 3
      },
      "upload": {
        "status": "idle",
        "progress": 0,
        "message": ""
      },
      "nozzle_temper": 23.5625,
      "nozzle_target_temper": 0,
      "bed_temper": 23.40625,
      "bed_target_temper": 0,
      "chamber_temper": 5,
      "mc_print_stage": "1",
      "heatbreak_fan_speed": "0",
      "cooling_fan_speed": "0",
      "big_fan1_speed": "0",
      "big_fan2_speed": "0",
      "mc_percent": 100,
      "mc_remaining_time": 0,
      "ams_status": 0,
      "ams_rfid_status": 0,
      "hw_switch_state": 0,
      "spd_mag": 100,
      "spd_lvl": 2,
      "print_error": 0,
      "lifecycle": "product",
      "wifi_signal": "-33dBm",
      "gcode_state": "FINISH",
      "gcode_file_prepare_percent": "100",
      "queue_number": 0,
      "queue_total": 0,
      "queue_est": 0,
      "queue_sts": 0,
      "project_id": "49298284",
      "profile_id": "48508693",
      "task_id": "96471770",
      "subtask_id": "96471771",
      "subtask_name": "A1 Guia cable Eje X",
      "gcode_file": "",
      "stg": [],
      "stg_cur": 255,
      "print_type": "idle",
      "home_flag": 322454936,
      "mc_print_line_number": "81742",
      "mc_print_sub_stage": 0,
      "sdcard": true,
      "force_upgrade": false,
      "mess_production_state": "active",
      "layer_num": 444,
      "total_layer_num": 444,
      "s_obj": [],
      "filam_bak": [],
      "fan_gear": 0,
      "nozzle_diameter": "0.4",
      "nozzle_type": "stainless_steel",
      "upgrade_state": {
        "sequence_id": 0,
        "progress": "",
        "status": "",
        "consistency_request": false,
        "dis_state": 0,
        "err_code": 0,
        "force_upgrade": false,
        "message": "0%, 0B/s",
        "module": "",
        "new_version_state": 2,
        "cur_state_code": 0,
        "new_ver_list": []
      },
      "hms": [],
      "online": {
        "ahb": false,
        "rfid": false,
        "version": 942176693
      },
      "ams": {
        "ams": [
          {
            "id": "0",
            "humidity": "5",
            "temp": "0.0",
            "tray": [
              {
                "id": "0",
                "remain": 100,
                "k": 0.019999999552965164,
                "n": 1,
                "tag_uid": "0000000000000000",
                "tray_id_name": "",
                "tray_info_idx": "GFL99",
                "tray_type": "PLA",
                "tray_sub_brands": "",
                "tray_color": "010101FF",
                "tray_weight": "0",
                "tray_diameter": "0.00",
                "tray_temp": "0",
                "tray_time": "0",
                "bed_temp_type": "0",
                "bed_temp": "0",
                "nozzle_temp_max": "240",
                "nozzle_temp_min": "190",
                "xcam_info": "000000000000000000000000",
                "tray_uuid": "00000000000000000000000000000000"
              },
              {
                "id": "1",
                "remain": 100,
                "k": 0.019999999552965164,
                "n": 1,
                "tag_uid": "3277907000000100",
                "tray_id_name": "A00-W1",
                "tray_info_idx": "GFA00",
                "tray_type": "PLA",
                "tray_sub_brands": "PLA Basic",
                "tray_color": "FFFFFFFF",
                "tray_weight": "1000",
                "tray_diameter": "1.75",
                "tray_temp": "55",
                "tray_time": "8",
                "bed_temp_type": "1",
                "bed_temp": "35",
                "nozzle_temp_max": "230",
                "nozzle_temp_min": "190",
                "xcam_info": "34218813F401E8030000003F",
                "tray_uuid": "675F3FB8059E4887B3D7EB8ECF854B5D"
              },
              {
                "id": "2",
                "remain": 100,
                "k": 0.019999999552965164,
                "n": 1,
                "tag_uid": "5C5E973900000100",
                "tray_id_name": "A00-B8",
                "tray_info_idx": "GFA00",
                "tray_type": "PLA",
                "tray_sub_brands": "PLA Basic",
                "tray_color": "0086D6FF",
                "tray_weight": "1000",
                "tray_diameter": "1.75",
                "tray_temp": "55",
                "tray_time": "8",
                "bed_temp_type": "1",
                "bed_temp": "35",
                "nozzle_temp_max": "230",
                "nozzle_temp_min": "190",
                "xcam_info": "8813AC0DE803E8039A99193F",
                "tray_uuid": "797878CAF7DB4EA9A1AFB1B83C4E064D"
              },
              {
                "id": "3",
                "remain": 100,
                "k": 0.019999999552965164,
                "n": 1,
                "tag_uid": "0000000000000000",
                "tray_id_name": "",
                "tray_info_idx": "GFL99",
                "tray_type": "PLA",
                "tray_sub_brands": "",
                "tray_color": "C12E1EFF",
                "tray_weight": "0",
                "tray_diameter": "0.00",
                "tray_temp": "0",
                "tray_time": "0",
                "bed_temp_type": "0",
                "bed_temp": "0",
                "nozzle_temp_max": "240",
                "nozzle_temp_min": "190",
                "xcam_info": "000000000000000000000000",
                "tray_uuid": "00000000000000000000000000000000"
              }
            ]
          }
        ],
        "ams_exist_bits": "1",
        "tray_exist_bits": "f",
        "tray_is_bbl_bits": "f",
        "tray_tar": "255",
        "tray_now": "255",
        "tray_pre": "255",
        "tray_read_done_bits": "f",
        "tray_reading_bits": "0",
        "version": 20,
        "insert_flag": true,
        "power_on_flag": true
      },
      "xcam": {
        "buildplate_marker_detector": true
      },
      "vt_tray": {
        "id": "254",
        "tag_uid": "0000000000000000",
        "tray_id_name": "",
        "tray_info_idx": "GFL99",
        "tray_type": "PLA",
        "tray_sub_brands": "",
        "tray_color": "FFF734FF",
        "tray_weight": "0",
        "tray_diameter": "0.00",
        "tray_temp": "0",
        "tray_time": "0",
        "bed_temp_type": "0",
        "bed_temp": "0",
        "nozzle_temp_max": "240",
        "nozzle_temp_min": "190",
        "xcam_info": "000000000000000000000000",
        "tray_uuid": "00000000000000000000000000000000",
        "remain": 0,
        "k": 0.019999999552965164,
        "n": 1
      },
      "lights_report": [
        {
          "node": "chamber_light",
          "mode": "off"
        }
      ],
      "command": "push_status",
      "msg": 0,
      "sequence_id": "22242"
    },
    "get_version": {
      "command": "get_version",
      "sequence_id": "0",
      "module": [
        {
          "name": "ota",
          "project_name": "N2S",
          "sw_ver": "01.02.00.01",
          "hw_ver": "OTA",
          "sn": "**REDACTED**",
          "flag": 3
        },
        {
          "name": "esp32",
          "project_name": "N2S",
          "sw_ver": "01.08.25.64",
          "hw_ver": "AP05",
          "sn": "**REDACTED**",
          "flag": 0
        },
        {
          "name": "mc",
          "project_name": "N2S",
          "sw_ver": "00.00.23.46",
          "loader_ver": "00.00.00.32",
          "hw_ver": "MC02",
          "sn": "**REDACTED**",
          "flag": 0
        },
        {
          "name": "th",
          "project_name": "N2S",
          "sw_ver": "00.00.07.69",
          "loader_ver": "00.00.00.26",
          "hw_ver": "TH01",
          "sn": "**REDACTED**",
          "flag": 0
        },
        {
          "name": "ams_f1/0",
          "project_name": "",
          "sw_ver": "00.00.07.92",
          "loader_ver": "00.00.00.00",
          "ota_ver": "00.00.00.00",
          "hw_ver": "AMS_F102",
          "sn": "**REDACTED**",
          "flag": 0
        }
      ],
      "result": "success",
      "reason": ""
    }
  }
}

Log Extracts

No response

Other Information

No response

Please can you check the logs for anything that would shed light on this. If nothing is present initially please enable debug logging for the integration and see if that shows anything interesting.

Please can you check the logs for anything that would shed light on this. If nothing is present initially please enable debug logging for the integration and see if that shows anything interesting.

Will do that.

I am also experiencing frequent problems with an unresponsive Home Assistant front end. I run Hass OS in a VM. I can ping the VM and I can get into the console through the VM's host machine. Today was the first time in a while that the Home Assistant web UI froze up while I had some time to look into it. I looked at my logs, and at the end I see this:

2024-02-15 14:50:09.280 ERROR (Thread-17) [custom_components.bambu_lab.pybambu] P1P: A Chamber Image thread outer exception occurred:
2024-02-15 14:50:09.281 ERROR (Thread-17) [custom_components.bambu_lab.pybambu] P1P: Exception. Type: <class 'RuntimeError'> Args: Event loop is closed
2024-02-15 14:50:09.582 WARNING (Thread-26 (_do_shutdown)) [homeassistant.util.executor] Thread[SyncWorker_27] is still running at shutdown: File "/usr/local/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/wake_on_lan/switch.py", line 153, in update
    status = sp.call(ping_cmd, stdout=sp.DEVNULL, stderr=sp.DEVNULL)
  File "/usr/local/lib/python3.12/subprocess.py", line 391, in call
    return p.wait(timeout=timeout)
  File "/usr/local/lib/python3.12/subprocess.py", line 1264, in wait
    return self._wait(timeout=timeout)
  File "/usr/local/lib/python3.12/subprocess.py", line 2046, in _wait
    (pid, sts) = self._try_wait(0)
  File "/usr/local/lib/python3.12/subprocess.py", line 2004, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
2024-02-15 14:50:09.749 WARNING (MainThread) [homeassistant.util.executor] Thread[SyncWorker_27] is still running at shutdown: File "/usr/local/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/wake_on_lan/switch.py", line 153, in update
    status = sp.call(ping_cmd, stdout=sp.DEVNULL, stderr=sp.DEVNULL)
  File "/usr/local/lib/python3.12/subprocess.py", line 391, in call
    return p.wait(timeout=timeout)
  File "/usr/local/lib/python3.12/subprocess.py", line 1264, in wait
    return self._wait(timeout=timeout)
  File "/usr/local/lib/python3.12/subprocess.py", line 2046, in _wait
    (pid, sts) = self._try_wait(0)
  File "/usr/local/lib/python3.12/subprocess.py", line 2004, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
2024-02-15 14:50:10.662 ERROR (Thread-14 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] An exception occurred processing a message:
2024-02-15 14:50:10.663 ERROR (Thread-14 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Exception type: <class 'RuntimeError'>
2024-02-15 14:50:10.663 ERROR (Thread-14 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Exception data: Event loop is closed
2024-02-15 14:50:13.213 ERROR (Thread-17) [custom_components.bambu_lab.pybambu] P1P: A Chamber Image thread outer exception occurred:
2024-02-15 14:50:13.213 ERROR (Thread-17) [custom_components.bambu_lab.pybambu] P1P: Exception. Type: <class 'RuntimeError'> Args: Event loop is closed
2024-02-15 14:50:16.127 ERROR (Thread-17) [custom_components.bambu_lab.pybambu] P1P: A Chamber Image thread outer exception occurred:
2024-02-15 14:50:16.128 ERROR (Thread-17) [custom_components.bambu_lab.pybambu] P1P: Exception. Type: <class 'RuntimeError'> Args: Event loop is closed
2024-02-15 14:50:16.763 ERROR (Thread-14 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] An exception occurred processing a message:
2024-02-15 14:50:16.763 ERROR (Thread-14 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Exception type: <class 'RuntimeError'>
2024-02-15 14:50:16.763 ERROR (Thread-14 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Exception data: Event loop is closed
2024-02-15 14:50:19.022 ERROR (Thread-17) [custom_components.bambu_lab.pybambu] P1P: A Chamber Image thread outer exception occurred:
2024-02-15 14:50:19.022 ERROR (Thread-17) [custom_components.bambu_lab.pybambu] P1P: Exception. Type: <class 'RuntimeError'> Args: Event loop is closed

I have a full log file, but it is 87 MB (1 MB when 7-zipped). I have a Bambu P1P printer with the P1S conversion kit. I bought my P1P well after launch, so it came with the initially-optional camera pre-installed, for whatever that is worth.

Did you restart HA? Some of those errors are expected, as I don't correctly handle cleanly shutting down when HA is restarting, but they're benign since HA is going away anyway. Multiple errors from the P1P chamber image thread might also be expected, as I recently allowed it to retry the connection up to 10 times with a short sleep between each attempt.
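
For reference, the retry behaviour described above amounts to roughly the following - a minimal sketch with assumed host, port and timing parameters, not the integration's actual code:

import socket
import time

def connect_chamber_image(host: str, port: int = 6000, retries: int = 10):
    # Try to open the chamber image socket, retrying up to `retries`
    # times with a short sleep between attempts.
    for attempt in range(retries):
        try:
            return socket.create_connection((host, port), timeout=5.0)
        except OSError:
            time.sleep(1.0)
    raise ConnectionError(f"Could not reach chamber image service at {host}:{port}")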

If you can connect to your HA host when HA itself is not being responsive, getting the top CPU-consuming processes from the 'top' command might help shed light on things.

I did not restart Home Assistant. I ran the command to restart the Home Assistant process well after I grabbed this log file. Something about core (edit: it was "ha core start"; I'm not at the computer right now).

OK, that probably means those errors are very relevant then. When you get a chance, could you drop the logs on OneDrive or similar and send me a link?

I had two more incidents where my Home Assistant web UI became unavailable. By the time I noticed I think any CPU use spike was over. When I looked in top, there was essentially no CPU use. I run Home Assistant OS in a VM on an x86 host with lots of cores (20?) and lots of RAM (128 GB). I allocate 4 GB RAM and four cores to the Home Assistant OS VM. Here are links to the home-assistant.log files:

https://www.dropbox.com/scl/fi/gucw4x2etxm1ymydpox6w/home-assistant.log.01.7z?rlkey=3dzklj0kd0vx9xziv6n6n9uzc&dl=0
https://www.dropbox.com/scl/fi/zb9u9bxuwxud4amr92wjg/home-assistant.log.02.7z?rlkey=mfplcxhbrygpbvlkhvn4xmt87&dl=0
https://www.dropbox.com/scl/fi/zews4prypmmtym9ks6etc/home-assistant.log.03.7z?rlkey=0gtwj86jrknvbsx6wyixgsr7m&dl=0

All of these logs are from times when Home Assistant's web UI became inaccessible over the last couple of days. They don't all end exactly the same way, but they do all finish with error messages about Bambu, paho, and mqtt in the last 100 lines. In all cases, Home Assistant OS was still running, so I could copy out the log file before attempting to restart Home Assistant.

Edit to add: I have a Bambu P1P with P1S conversion kit and multiple Klipper printers, all with at least one camera. I have a Home Assistant Lovelace page set up to be able to monitor all of them. This is just to say that I give the Home Assistant Bambu integration a decent workout.

Second edit: My printer is connected to the integration via LAN mode, if that matters.

Ah, my old friend 'list changed size during iteration'. It rarely happens with this integration, and I still have no idea what's causing it, since it happens when HA calls into the integration to retrieve sensor data, and I haven't been able to work out which sensor data is changing underneath HA as it's being read. But I did have to make a fix for a deadlock it caused.

RuntimeError: list changed size during iteration
During handling of the above exception, another exception occurred:

I think this is putting HA into a tight loop, but it doesn't appear to be due to this integration. The other exception it's complaining about is from 'zigpy', which is Zigbee.

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy/zcl/clusters/general.py", line 2187, in _handle_cluster_request
    await self._handle_query_next_image(
  File "/usr/local/lib/python3.12/site-packages/zigpy/zcl/clusters/general.py", line 2251, in _handle_query_next_image
    await self.query_next_image_response(
  File "/usr/local/lib/python3.12/site-packages/zigpy/zcl/__init__.py", line 415, in reply
    return await self._endpoint.reply(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/endpoint.py", line 278, in reply
    return await self.device.reply(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 488, in reply
    return await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 322, in request
    await send_request()
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 833, in request
    await self.send_packet(
  File "/usr/local/lib/python3.12/site-packages/zigpy_znp/zigbee/application.py", line 934, in send_packet
    async with self._limit_concurrency():
  File "/usr/local/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 761, in _limit_concurrency
    async with self._concurrent_requests_semaphore:
  File "/usr/local/lib/python3.12/site-packages/zigpy/datastructures.py", line 145, in __aenter__
    await self.acquire()
  File "/usr/local/lib/python3.12/site-packages/zigpy/datastructures.py", line 111, in acquire
    await fut
asyncio.exceptions.CancelledError: Final process shutdown

I can confirm that once that's happening, the main HA thread is in a tight loop, which would absolutely cause the symptoms you're seeing.

So in your case I think the errors from this integration are just the benign ones that kick in at integration unload because HA has finally tried to shut down due to the zigpy errors.

Hi.
Still having the original issue.
Not sure which kind of logs or additional information I can provide.
Thanks in advance for your help.
Best regards,

José

I need a full debug log capturing the repro.

2024-02-29 17:14:19.472 ERROR (SyncWorker_19) [homeassistant.components.influxdb] Cannot connect to InfluxDB due to 'HTTPConnectionPool(host='192.168.1.144', port=8086): Max retries exceeded with url: /write?db=home_assistant (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff802dfa0f0>: Failed to establish a new connection: [Errno 111] Connection refused'))'. Please check that the provided connection details (host, port, etc.) are correct and that your InfluxDB server is running and accessible. Retrying in 60 seconds.

Is InfluxDB the database you use for HA or is that specific to some other integration?

Do the instructions here, for configuring HA to stop spamming the database with image update events, make any difference?
#423 (comment)

Thanks for your reply, @AdrianGarside.
My database is SQLite. The InfluxDB error is weird, as I uninstalled the InfluxDB add-on a while ago; when I checked the configuration.yaml file I found I had forgotten to remove the "influxdb:" section. I've commented it out and will see if this works.
I have also excluded the image, wifi, etc. sensors from my "recorder:" section and will see how it goes.
I will keep you posted.

Other than those suspicious DB errors, which probably aren't related, I didn't see anything standing out. But you don't have DEBUG logging turned on for this integration, so there's virtually no logging to help work out whether it's doing something unexpected (specifically, getting into a tight loop).

To get full debug logging requires an edit to configuration.yaml:

logger:
  default: info
  logs:
    custom_components.bambu_lab: debug

Note that this will generate a lot more logging output. So if, for example, you're running on a Raspberry Pi off an SD card, leaving it on permanently will shorten the card's life a lot, but a couple of days is fine. Proper SSDs are not significantly affected, as they have vastly larger write endurance; even endurance-rated SD cards have a tiny write capacity in comparison.

Ok. Will do it if it hangs again.
Thanks once more for your support. Much appreciated, @AdrianGarside

I'll do my best to help you work out how the integration is misbehaving, but since I cannot repro it, my only mechanism is going to be debug logs. So the hope is to find something in those that either immediately tells me where the problem lies, or at least gives an indication of the general area so I can potentially add more logging to narrow it down.

One thing that could be useful when you start seeing performance problems: if you can connect to the HA host/container, run the 'top' command to see if there's a process consuming 100% CPU, to help determine whether this is a CPU tight-loop issue or some other kind of hang.

Will do that. Thanks again

Hi @AdrianGarside,

I had the issue again last night and, as you anticipated, fixing the InfluxDB error and removing the wifi, image and other Bambu Lab sensors from the recorder didn't help.

I have activated the logging as you suggested, so we'll see what we get from there the next time the issue happens.

What I couldn't do was run the "top" command. I have Home Assistant OS installed, so I can run commands from the terminal add-on, but that option was not available during the issue, when the connection was down.

I then tried attaching a monitor and a keyboard directly to the NUC that is running Home Assistant OS, but although I could run many other commands, "top" was not recognized.

Is there a way to run "top" when connected directly to the NUC?

Thanks.

Hi @AdrianGarside

I got the issue again. Attached are the log files with the debug logging activated, as you requested.

Thanks in advance for your help.

home-assistant.log.zip

I have also been able to run the "top" command via a PuTTY connection, so here is the screen capture:

[screenshot: 'top' output]

Do you know how long it was in a bad state before you restarted Home Assistant?

The logs don't seem to show anything obvious, but the previous instance shut down at 8:04:16 and the next log starts at 8:56:40. So apparently your HA instance wasn't running for over 50 minutes? When you restarted it just before you got the logs, what did you observe and what actions did you take to get things working again? Do you know why the HA instance shut down, or is that an unexpected shutdown and itself the reason the HA instance is becoming unresponsive (since it's not running!)?

Another thing to keep an eye on is memory usage - does top show a steady increase in memory usage while HA is running normally?

My current, not well supported, theory is that something caused your HA instance to restart and somehow the integration caused that shutdown to hang, blocking a timely restart. The errors from the integration at shutdown are somewhat normal - I'm planning to see if I can make it handle shutdown more gracefully, without the errors about the mqtt connection event loop being closed, but AFAIK those were basically benign. Perhaps they aren't as benign as I thought in all scenarios. And right up until the shutdown starts, the integration is still receiving data from your printer.

Actually, there is an uncaught thread exception on _thread_main from the mqtt library once the shutdown starts, which isn't something I've seen as part of the normal and benign connection-closed errors that I catch during HA shutdown. Given the thread involved, it's definitely plausible that error breaks the shutdown and leaves HA in a zombie state.

TLDR; I'd like anything more you can tell me about that shutdown, to see if I can reproduce the scenario. My shutdowns only ever have two causes - a restart via the HA UI on my production instance (rarely, when I update the integration via HACS) or a restart of the docker instance (production and test instances). The latter would never hide any permanent shutdown hangs.

It was probably disconnected for about 40-50 minutes, so that is aligned with what you see in the logs.
The host reboot is done manually by me, as it is the only way to connect to Home Assistant again. By "connect again" I mean that when it's in this situation and I try to open the local Home Assistant IP, I get "err_connection" in the browser (or a red banner with a similar message in the Home Assistant Companion app). The only way to regain access to Home Assistant is to manually reboot the host (I have Home Assistant OS installed on a dedicated NUC).
I do the manual reboot by connecting a keyboard to the NUC and typing "host reboot".
Regarding memory usage, I have a sensor in Home Assistant that records it. The graph is attached below; I have highlighted when it hung and when I manually rebooted.
[screenshot: memory usage graph with the hang and the manual reboot highlighted]

So it looks like there's a slow memory leak in HA, but probably not enough to explain it seemingly trying to restart itself. And when it does try to restart, that seems to trigger an exception from the mqtt client we use, which seems to hang/block HA.

I've been planning to try to handle the restart scenario more gracefully, but I'm not convinced I can actually prevent the mqtt client throwing an exception on the main thread. It's not supposed to be doing anything on that thread…
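
For illustration, the kind of graceful-shutdown handling being discussed might look roughly like this - a hedged sketch with assumed names (the setup function, port, and omitted TLS setup are not from the integration):

import paho.mqtt.client as mqtt
from homeassistant.const import EVENT_HOMEASSISTANT_STOP
from homeassistant.core import Event, HomeAssistant

def setup_printer_connection(hass: HomeAssistant, host: str) -> mqtt.Client:
    client = mqtt.Client()  # paho-mqtt 1.x style constructor
    client.connect_async(host, 8883)  # Bambu printers use MQTTS on 8883; TLS setup omitted
    client.loop_start()  # paho runs its network loop on its own thread

    def _on_ha_stop(event: Event) -> None:
        # Tear the MQTT connection down before HA closes its event loop,
        # so paho callbacks stop firing into a loop that no longer exists.
        client.disconnect()
        client.loop_stop()

    hass.bus.async_listen_once(EVENT_HOMEASSISTANT_STOP, _on_ha_stop)
    return client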

In case it helps, I have attached the memory usage sensor chart for the last couple of days.
I marked when the connection error happened yesterday, and the same time today.
As you can see, the slow memory leak is still happening but, as I disabled your integration yesterday, there has been no connection error today.

[screenshot: memory usage chart for the last couple of days]

Do you have anything scheduled to happen in HA at 8am?

No big deal: just rebooting all the cameras as a housekeeping measure and switching the heater off.

I'll work on trying to improve the integration's handling of HA restart, but I'd love to know why HA tried to restart by itself, since that's very odd to me.

That's very much appreciated, @AdrianGarside.
But there is something I don't understand: you say "handling of HA restart", but my HA is not restarting. It just refuses the connection to the local or external (I use Nabu Casa) address.
In any case, I am more than happy to beta test.

Per your logs, it was shutting down at 8am when it became unresponsive. The integration appears to have caused it to hang while shutting down, due to the mqtt client throwing an exception on the main HA thread. That's why I'm asking what might have caused your HA instance to decide to restart. A manual restart triggers a few caught exceptions from the mqtt client in the integration, but I've never seen the uncaught one on the main thread that was in your logs.

Happy to be special :-)
Let me know how I can help: debugging, testing...

Please give this version a try: https://github.com/greghesp/ha-bambulab/releases/tag/v2.0.16-dev1

Done. I have also enabled full debug logging for the integration.
I'll report how it goes.
Thanks for your support. Very much appreciated.

Hi @AdrianGarside,
I got a connection error today at 2:40pm CET.
The log files are attached:
logs.zip
Let me know if there is anything else I can do to help you solve this issue.
Thanks!

*** UPDATE ***
I have Uptime Kuma installed, so I could trace when I lost connection to Home Assistant both locally and through the Remote UI provided by Nabu Casa, and it looks like the last active connection happened just before a print ended.
That print finished at 2:16pm CET.
I hope this helps, @AdrianGarside

This is definitely related to the other issue with similar symptoms:

2024-03-11 14:20:29.516 ERROR (MainThread) [homeassistant] Error doing job: an error occurred during closing of asynchronous generator <async_generator object BoundAsyncStream.__aiter__ at 0x7fc529187850>
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/runner.py", line 188, in run
    return loop.run_until_complete(setup_and_run_hass(runtime_config))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 672, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 639, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1936, in _run_once
    handle = heapq.heappop(self._scheduled)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: list changed size during iteration

During handling of the above exception, another exception occurred:

In this case it's less clear what's triggering it, but it appears that HA is calling into the integration off the main thread to get data, hitting the 'list changed size' case, and that's taking out the main thread.

My only guess right now is that the list in question is the HMS error list. But maybe hitting this as a print was ending will suggest something else to me.

I'm going to add some more logging to see if I can narrow down what data is being requested that leads to that 'list changed size' exception. The infrequency of the repro is another potential hint that it's the HMS error list.
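
To illustrate the failure mode (a contrived, single-threaded stand-in for the cross-thread race, not the integration's code): under CPython, heapq raises exactly this error when the heap list changes size while heappop is comparing elements - which is what happens when another thread appends to a list the event loop is popping from.

import heapq

class Chatty:
    # A value whose comparison mutates the heap, standing in for another
    # thread appending to the list mid-heappop.
    def __lt__(self, other):
        heap.append(Chatty())
        return True

heap = [Chatty() for _ in range(5)]
heapq.heappop(heap)  # RuntimeError: list changed size during iteration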

New release with a bit more logging around the HMS error list setting/retrieval to see if that proves it's involved in this:
https://github.com/greghesp/ha-bambulab/releases/tag/v2.0.16-dev2

Also - can you tell me more about these sensors?

2024-03-10 21:20:56.511 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Unknown
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Info
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Common
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Serious
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Fatal
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS

New release with a bit more logging around the HMS error list setting/retrieval to see if that proves it's involved in this: https://github.com/greghesp/ha-bambulab/releases/tag/v2.0.16-dev2

Done. Will keep you updated.

Also - can you tell me more about these sensors?

2024-03-10 21:20:56.511 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Unknown
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Info
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Common
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Serious
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS_Fatal
2024-03-10 21:20:56.512 INFO (MainThread) [homeassistant.components.mqtt.discovery] Found new component: sensor A1_A1 HMS

These are automatically discovered by my MQTT server and shown as a collection of sensors under the main "homeassistant" topic.

[screenshot: MQTT Explorer view of the auto-discovered sensors]

The first time I saw it, I thought it was a new feature of the Bambu Lab MQTT implementation, but when I looked into the content, it is just the configuration of the sensor, without real data. Below is an example of one of the automatically created sensors, for the printer's IP address.

{"name":"IP Address","entity_category":"diagnostic","device":{"identifiers":["{\"bambu_lab\":\"A1_A1\",\"type\":\"printer\",\"serial\":\"xxx\",\"model\":\"A1\"}","A1_A1","xxx"],"manufacturer":"Bambu Labs","model":"A1","name":"A1_A1"},"unique_id":"A1_A1_IP_Address","object_id":"A1_A1_IP_Address","has_entity_name":false,"state_topic":"homeassistant/sensor/A1_A1/IP_Address/state","json_attributes_topic":"homeassistant/sensor/A1_A1/IP_Address/attr","availability_topic":"homeassistant/sensor/A1_A1/status/state"}

I have no idea how that's getting autodiscovered - this integration doesn't push the payload I see in your screenshot. Do you also have the Node-RED bambu integration enabled? The HMS severity levels are something its developer added to both integrations.

I tried WolfwithSword's Node-RED integration, but it was too complex.

Then I tried your integration and, as it was hanging my Home Assistant, I set up an MQTT connection in Node-RED to my printer's IP to get some "easy" topics from there: print progress, nozzle and bed temperatures, etc.

No idea where this "A1_A1" is coming from, as I never created anything called "A1_A1". As you can see below, there are two devices, one for the printer and another for the AMS, but all sensors are either not available or "unknown".

[screenshots: the A1_A1 and AMS_0_A1 devices, with all sensors unavailable or unknown]

In case it helps, I've attached the diagnostic downloads of these two "ghost" A1_A1 and AMS_0_A1 devices:

mqtt-58c861beb264424cec95a0ddd430941a-A1_A1-268220bf89501616db9d13a5c42d8c47.json
mqtt-58c861beb264424cec95a0ddd430941a-AMS_0_A1-fb1825581399299642c7c66d49bbdfe7.json

In any case, I had the same issue again last night: Home Assistant got disconnected at around 2am. Logs are attached below:

logs_2.zip

This latest repro looks different. The last uncaught thread exception isn't on the MainThread as in your prior repros / the other issue, and isn't due to the 'list changed size' exception.

This is the thread that died:
2024-03-13 00:28:21.246 INFO (Thread-4 (_thread_main)) [homeassistant.components.mqtt.client] Connected to MQTT server core-mosquitto:1883 (0)
...
2024-03-13 01:49:21.654 ERROR (Thread-4 (_thread_main)) [paho.mqtt.client] Caught exception in on_message: Event loop is closed
2024-03-13 01:49:21.658 ERROR (Thread-4 (_thread_main)) [root] Uncaught thread exception

I do then see this integration fall over, but it's still receiving data from the printer fine - it's just that when it calls into HA to report updated sensor data, it hits the same repeated exception that took out the thread above, by the looks of it.

2024-03-13 01:49:21.852 DEBUG (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Received data from: A1
2024-03-13 01:49:21.854 DEBUG (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab] Manually updated bambu_lab data
2024-03-13 01:49:21.854 ERROR (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab] An exception occurred calling async_set_updated_data():
2024-03-13 01:49:21.854 ERROR (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab] Exception type: <class 'RuntimeError'>
2024-03-13 01:49:21.855 ERROR (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab] Exception data: Event loop is closed
2024-03-13 01:49:22.998 DEBUG (Thread-7) [custom_components.bambu_lab] Manually updated bambu_lab data
2024-03-13 01:49:22.998 ERROR (Thread-7) [custom_components.bambu_lab] An exception occurred calling async_set_updated_data():
2024-03-13 01:49:22.999 ERROR (Thread-7) [custom_components.bambu_lab] Exception type: <class 'RuntimeError'>
2024-03-13 01:49:22.999 ERROR (Thread-7) [custom_components.bambu_lab] Exception data: Event loop is closed
2024-03-13 01:49:23.890 DEBUG (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Received data from: A1
2024-03-13 01:49:25.820 DEBUG (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Received data from: A1
2024-03-13 01:49:27.701 DEBUG (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Received data from: A1
2024-03-13 01:49:29.339 DEBUG (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Received data from: A1
2024-03-13 01:49:29.341 DEBUG (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab] Manually updated bambu_lab data
2024-03-13 01:49:29.342 ERROR (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab] An exception occurred calling async_set_updated_data():
2024-03-13 01:49:29.342 ERROR (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab] Exception type: <class 'RuntimeError'>
2024-03-13 01:49:29.343 ERROR (Thread-5 (mqtt_listen_thread)) [custom_components.bambu_lab] Exception data: Event loop is closed
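
For context: async_set_updated_data() is loop-bound, so a worker thread (like the MQTT listener here) normally has to marshal onto the event loop rather than call it directly, and skip the call once the loop is gone. A rough sketch of that pattern, with assumed names - not necessarily what the integration does:

def notify_ha_of_new_data(hass, coordinator, data):
    # async_set_updated_data must run on the event loop; a worker thread
    # schedules it across with call_soon_threadsafe instead of calling it.
    if hass.loop.is_closed():
        return  # HA is shutting down; drop the update rather than raise
    # There is still a narrow race with shutdown here, so callers may
    # also want to catch the resulting RuntimeError.
    hass.loop.call_soon_threadsafe(coordinator.async_set_updated_data, data)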

Looking back at your previous logs with this fresh info, I see a similar pattern in the prior repro - it wasn't the HA MainThread that died but the same Thread-4 (_thread_main), and from that point onwards this integration was still receiving printer data but hitting an error every time it tried to tell HA there was new data.

And the repro before that was the same.

So now I feel like I understand what's going on here even less. :( Eliminating the remnants of the Node-RED integration might be worth a try, if it's really the connection to your mosquitto server that's dying and triggering the complete failure.

I have disabled my Node-RED MQTT broker node and re-enabled your integration.
Let's see what happens. I will keep you updated.

Hi,
New "connection refused" error.
This time, only your integration was enabled as I have disabled my Node-RED MQTT broker node.
Attached the log files. I have downloaded them via putty before rebooting the host.

home-assistant.log.zip

Wow, tons of errors in this log. I do see that Home Assistant, having hit some errors, starts to shut down, and the new code to report this and handle it gracefully runs and closes the connection to the printer cleanly, without any errors:

2024-03-16 00:14:01.927 INFO (MainThread) [homeassistant.components.automation.automation_123] Notificación - Home Assistant Shutdown: Running automation actions
2024-03-16 00:14:01.927 INFO (MainThread) [homeassistant.components.automation.automation_123] Notificación - Home Assistant Shutdown: Executing step call service
2024-03-16 00:14:01.933 DEBUG (MainThread) [custom_components.bambu_lab] HOME ASSISTANT IS SHUTTING DOWN
2024-03-16 00:14:01.933 DEBUG (MainThread) [custom_components.bambu_lab.pybambu] Disconnect: Client Disconnecting
2024-03-16 00:14:06.726 DEBUG (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Ended listen loop.
2024-03-16 00:14:06.727 INFO (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] MQTT listener thread exited.

After that point tons of errors hit the Home Assistant threads, but eventually it shuts down. Prior to the shutdown I see widespread connectivity problems seemingly hitting HA and all your integrations.

Prior to that shutdown, the last log from the integration was some 7 minutes earlier:

2024-03-16 00:05:04.477 INFO (MainThread) [homeassistant.components.automation.notificacion_bambu_lab_a1_print_progress] Notificación - Bambu Lab A1 Print Progress: Choose at step 3: choice 3: Running automation actions

And before that, it seems some automation kicked in right as the integration lost connection to the printer, and a force refresh was executed (this isn't something the integration ever does by itself):

2024-03-16 00:04:59.462 WARNING (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] On Disconnect: Disconnected from Broker: 16
2024-03-16 00:04:59.462 DEBUG (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab] Manually updated bambu_lab data
2024-03-16 00:04:59.463 DEBUG (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] PROPERTYCALL: get_hms_errors
2024-03-16 00:04:59.472 INFO (Thread-7) [custom_components.bambu_lab.pybambu] Watchdog thread exited.
2024-03-16 00:04:59.473 INFO (MainThread) [homeassistant.components.automation.notificacion_bambu_lab_a1_print_progress] Notificación - Bambu Lab A1 Print Progress: Running automation actions
2024-03-16 00:04:59.474 INFO (MainThread) [homeassistant.components.automation.notificacion_bambu_lab_a1_print_progress] Notificación - Bambu Lab A1 Print Progress: Executing step call service
2024-03-16 00:04:59.475 DEBUG (MainThread) [custom_components.bambu_lab.pybambu] Force Refresh: Getting Version Info
2024-03-16 00:04:59.475 ERROR (MainThread) [custom_components.bambu_lab.pybambu] Failed to send message to topic device/03919A3A2200275/request
2024-03-16 00:04:59.475 DEBUG (MainThread) [custom_components.bambu_lab.pybambu] Force Refresh: Request Push All
2024-03-16 00:04:59.475 ERROR (MainThread) [custom_components.bambu_lab.pybambu] Failed to send message to topic device/03919A3A2200275/request
2024-03-16 00:04:59.476 INFO (MainThread) [homeassistant.components.automation.notificacion_bambu_lab_a1_print_progress] Notificación - Bambu Lab A1 Print Progress: Executing step delay 0:00:05
2024-03-16 00:05:00.084 INFO (MainThread) [homeassistant.components.automation.inmp441_matrix_on_off] WLED INMP441 Matrix - ON at evening and OFF at night: Running automation actions
2024-03-16 00:05:00.135 INFO (Thread-8) [custom_components.bambu_lab.pybambu] A1: Chamber image thread exited.
2024-03-16 00:05:00.135 WARNING (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] On Disconnect: Disconnected from Broker: 16

2024-03-16 00:05:00.387 INFO (MainThread) [homeassistant.components.automation.automation_120] OctoPrint - Nozzle & Bed Temperature Target Alignment: Running automation actions

2024-03-16 00:05:04.477 INFO (MainThread) [homeassistant.components.automation.notificacion_bambu_lab_a1_print_progress] Notificación - Bambu Lab A1 Print Progress: Choose at step 3: choice 3: Running automation actions

And it's from that point onwards that it looks like all of your integrations start to have connectivity issues, and there's no more bambu lab integration logging until the HA shutdown starts.

Some questions:

  • What does your step delay automation that seems to fire right as things go south actually do?
  • And what might have triggered the forced refresh?
  • And do you know why HA was shutting down in this repro? My bet is still on some kind of attempt to recover from the broken state it's getting into.

TLDR; In this log I see widespread connectivity issues for all your integrations starting around 12.05am, resulting in tons of errors, and then HA attempting to shut down around 12.14am and seemingly succeeding.

I have disabled my Node-RED MQTT broker node and re-enabled your integration. Let's see what happens. I will keep you updated.

We're still seeing the bambu lab entities being mqtt auto discovered - what's causing that? I'd expect that to add quite a bit of extra (and probably pointless) load to your system.

Hi, thanks a lot for taking the time to get to the bottom of this. Much appreciated.
Answers to your comments/questions are below.

Wow, tons of errors in this log. I do see that Home Assistant, having hit some errors, starts to shut down, and the new code to report this and handle it gracefully runs and closes the connection to the printer cleanly, without any errors:

2024-03-16 00:05:04.477 INFO (MainThread) [homeassistant.components.automation.notificacion_bambu_lab_a1_print_progress] Notificación - Bambu Lab A1 Print Progress: Choose at step 3: choice 3: Running automation actions

-- That's an automation that is triggered by state changes of the print status and print_progress sensors and sends me a Telegram notification.

And before that, it seems some automation kicked in right as the integration lost connection to the printer, and a force refresh was executed (this isn't something the integration ever does by itself):

-- That same automation presses the force_refresh_data button to make sure I get updated information right before sending the Telegram notification. I have now disabled it.

Some questions:

  • What does your step delay automation that seems to fire right as things go south actually do?

-- I believe you are referring to the automation that notifies me via Telegram.

  • And what might have triggered the forced refresh?

-- That same automation, but I have disabled it now.

  • And do you know why HA was shutting down in this repro? My bet is still on some kind of attempt to recover from the broken state it's getting into.

-- I am afraid I don't know.

TLDR; In this log I see widespread connectivity issues for all your integrations starting around 12.05am, resulting in tons of errors, and then HA attempting to shut down around 12.14am and seemingly succeeding.

-- I had some connectivity issues yesterday and had to reboot the router twice, but on the previous days when I had the same "connection refused" error, the local network and router were OK.

I have disabled my Node-RED MQTT broker node and re-enabled your integration. Let's see what happens. I will keep you updated.

We're still seeing the bambu lab entities being mqtt auto discovered - what's causing that? I'd expect that to add quite a bit of extra (and probably pointless) load to your system.

I really don't understand where they are coming from, and I'm unsure why they keep showing up, as I disabled both the one for the A1 and the one for the AMS. Anyway, I have now deleted them.

I will re-enable your integration after the changes I have made (deleting the MQTT auto-discovered A1 and AMS devices, and removing the force_refresh_data button press), and we'll see how it goes...

In the latest logs it appears that the shutdown completed, but the preceding errors could mean it only mostly succeeded:

File "/usr/local/lib/python3.12/asyncio/base_events.py", line 544, in _check_default_executor
raise RuntimeError('Executor shutdown has been called')
RuntimeError: Executor shutdown has been called
2024-03-16 00:17:14.548 ERROR (MainThread) [custom_components.tapo_control] Executor shutdown has been called

The 'mostly' part might be that it didn't restart automatically. Or maybe that would always be the case for this abnormal shutdown.
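
For reference, that error class is easy to reproduce standalone. Here's a minimal sketch (my illustration, not HA's actual shutdown code): once the loop's default executor has been shut down, any later run_in_executor(None, ...) call raises the same RuntimeError seen above.

import asyncio

async def main():
    loop = asyncio.get_running_loop()
    # HA shuts the default executor down while stopping; simulate that here.
    await loop.shutdown_default_executor()
    # Any later hop onto the default executor now raises
    # RuntimeError: Executor shutdown has been called
    await loop.run_in_executor(None, print, "too late")

asyncio.run(main())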

The force refresh button press isn't needed when you have the configuration running normally (i.e. you haven't opted into the manual refresh mode). Removing it won't degrade your automation data quality.
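
For concreteness, that "button press" is just a button.press service call against the integration's force-refresh entity. A hypothetical equivalent via HA's REST API (the token and entity id below are placeholder assumptions, not real values):

import requests

# Sketch of what the automation's button press effectively does: a
# button.press service call. Entity id and token are placeholders.
requests.post(
    "http://homeassistant.local:8123/api/services/button/press",
    headers={"Authorization": "Bearer LONG_LIVED_ACCESS_TOKEN"},
    json={"entity_id": "button.a1_force_refresh_data"},
    timeout=10,
)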


all this "default_executor" sounds like Klingon to me :-)

Re-reading that part of the log, I think my initial take was wrong - the shutdown started but I don't think it ever completes.

From the logs I can't see any evidence of the bambu integration causing the shutdown to start, or causing it to fail to complete. But if it is indeed the case that this problem doesn't materialize without the integration installed (vs. perhaps just hitting it sooner), then I'd love to know whether the shutdown is happening regardless but successfully auto-restarting without you ever noticing.


The auto-discovered entities are from the node red integration. Per the developer, they will persist until deleted manually:
"they can just delete in mqtt explorer everything under A1_A1 and AMS_0_A1"

At this point I don't think that's involved in your issues but it would be useful to do the cleanup to eliminate it as a possibility.
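
If you'd rather script that cleanup than click through MQTT Explorer, here's a sketch. Publishing an empty retained payload is the standard MQTT way to delete a retained message; the broker address, credentials, and exact topic paths below are placeholder assumptions, not the real layout.

import paho.mqtt.publish as publish

# Sketch only: delete retained discovery topics by publishing empty retained
# payloads. The real sub-topics under A1_A1 and AMS_0_A1 would need to be
# enumerated first (e.g. via a wildcard subscription); these paths are
# illustrative assumptions.
msgs = [
    {"topic": "homeassistant/sensor/A1_A1/config", "payload": None, "retain": True},
    {"topic": "homeassistant/sensor/AMS_0_A1/config", "payload": None, "retain": True},
]
publish.multiple(
    msgs,
    hostname="homeassistant.local",  # placeholder broker address
    auth={"username": "mqtt_user", "password": "mqtt_password"},  # placeholders
)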

Comparing your shutdown to a clean shutdown locally, I do see that in your case the watchdog and chamber image threads never exit. Those exits are triggered by the mqtt client reacting to the MQTT disconnect via a callback that would call back into the main thread:

Yours:
2024-03-16 00:14:01.933 DEBUG (MainThread) [custom_components.bambu_lab] HOME ASSISTANT IS SHUTTING DOWN
2024-03-16 00:14:01.933 DEBUG (MainThread) [custom_components.bambu_lab.pybambu] Disconnect: Client Disconnecting
2024-03-16 00:14:06.726 DEBUG (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] Ended listen loop.
2024-03-16 00:14:06.727 INFO (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] MQTT listener thread exited.

Mine (extra printer type logging added since I have 3):

2024-03-16 15:06:01.613 DEBUG (MainThread) [custom_components.bambu_lab] HOME ASSISTANT IS SHUTTING DOWN
2024-03-16 15:06:01.613 DEBUG (MainThread) [custom_components.bambu_lab.pybambu] P1S Disconnect: Client Disconnecting
2024-03-16 15:06:01.614 WARNING (MainThread) [custom_components.bambu_lab.pybambu] P1S On Disconnect: Disconnected from Broker: 0
2024-03-16 15:06:01.615 INFO (Thread-8) [custom_components.bambu_lab.pybambu] P1S Watchdog thread exited.
2024-03-16 15:06:01.630 INFO (Thread-9) [custom_components.bambu_lab.pybambu] P1S: Chamber image thread exited.
2024-03-16 15:06:01.632 WARNING (Thread-4 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] P1S On Disconnect: Disconnected from Broker: 0
2024-03-16 15:06:01.632 DEBUG (Thread-4 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] P1S Ended listen loop.
2024-03-16 15:06:01.632 INFO (Thread-4 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] P1S MQTT listener thread exited.

So that suggests the callback into the main thread couldn't complete. The main thread isn't completely stuck, as there's plenty of activity - but it's almost entirely errors.
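
To make those mechanics concrete, here's a minimal sketch of the pattern described above (not the integration's actual code; the callback signature assumes paho-mqtt's v1 API): the on_disconnect callback sets an event that the worker threads wait on before logging their "exited" lines.

import threading

shutdown = threading.Event()

def watchdog():
    # Wake every 5s for periodic checks until the disconnect callback fires.
    while not shutdown.wait(timeout=5):
        pass  # periodic watchdog work would go here
    print("Watchdog thread exited.")

def on_disconnect(client, userdata, rc):
    # rc 0 is a clean client-initiated disconnect; the rc 16 in these logs
    # is an unexpected broker-side drop.
    print(f"On Disconnect: Disconnected from Broker: {rc}")
    shutdown.set()  # signal worker threads to wind down

t = threading.Thread(target=watchdog)
t.start()
# The paho-mqtt client would normally invoke this callback; call it directly
# here so the sketch runs standalone, simulating the rc 16 drop.
on_disconnect(None, None, 16)
t.join()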

Ah no, that extra activity already happened in your case before things went completely south, so it couldn't repeat - there was no active connection to the printer at the point the shutdown occurred.

2024-03-16 00:04:59.462 WARNING (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] On Disconnect: Disconnected from Broker: 16
2024-03-16 00:04:59.472 INFO (Thread-7) [custom_components.bambu_lab.pybambu] Watchdog thread exited.
2024-03-16 00:04:59.473 INFO (MainThread) [homeassistant.components.automation.notificacion_bambu_lab_a1_print_progress] Notificación - Bambu Lab A1 Print Progress: Running automation actions
2024-03-16 00:04:59.474 INFO (MainThread) [homeassistant.components.automation.notificacion_bambu_lab_a1_print_progress] Notificación - Bambu Lab A1 Print Progress: Executing step call service
2024-03-16 00:04:59.475 DEBUG (MainThread) [custom_components.bambu_lab.pybambu] Force Refresh: Getting Version Info
2024-03-16 00:04:59.475 ERROR (MainThread) [custom_components.bambu_lab.pybambu] Failed to send message to topic device/03919A3A2200275/request
2024-03-16 00:04:59.475 DEBUG (MainThread) [custom_components.bambu_lab.pybambu] Force Refresh: Request Push All
2024-03-16 00:04:59.475 ERROR (MainThread) [custom_components.bambu_lab.pybambu] Failed to send message to topic device/03919A3A2200275/request

2024-03-16 00:04:59.476 INFO (MainThread) [homeassistant.components.automation.notificacion_bambu_lab_a1_print_progress] Notificación - Bambu Lab A1 Print Progress: Executing step delay 0:00:05
2024-03-16 00:05:00.084 INFO (MainThread) [homeassistant.components.automation.inmp441_matrix_on_off] WLED INMP441 Matrix - ON at evening and OFF at night: Running automation actions
2024-03-16 00:05:00.135 INFO (Thread-8) [custom_components.bambu_lab.pybambu] A1: Chamber image thread exited.
2024-03-16 00:05:00.135 WARNING (Thread-6 (mqtt_listen_thread)) [custom_components.bambu_lab.pybambu] On Disconnect: Disconnected from Broker: 16

If your automation just does a force refresh button press, that wouldn't explain the initial disconnect. The force refresh attempt failed in this case because the connection to the printer had just dropped right before that automation fired.

One observation, no idea if it's related in any way, but all of this started just after a database backup completed:

2024-03-16 00:04:43.489 INFO (MainThread) [homeassistant.components.recorder.backup] Backup end notification, releasing write lock
2024-03-16 00:04:43.489 INFO (Recorder) [homeassistant.components.recorder.core] Database queue backlog reached 2428 entries during backup

So I'm still stumped. I can't find anything that suggests the integration is involved in causing the connectivity problems or in making HA start to shut down. And there's no evidence the integration doesn't fully clean up when that shutdown occurs, and no evidence it's causing the shutdown to not complete.

Closing this as resolved. Please open a new issue (with debug logs) if you hit any further instability, so I can keep repros from before this fix separate from those after it.