Entire Modbus hub hangs if one of slave devices stops responding

FrenkK opened this issue 2 years ago

The problem

If a single slave device on a Modbus hub stops working, all Modbus sensors for that hub stop updating until the offending slave device starts working again. If there are multiple hubs, the other ones still seem to work.
It seems the same problem can prevent Home Assistant startup from finishing as well, but I did not research that further.
This is a new problem: the same setup (except for the new data types) worked before the latest batch of Modbus changes.

What version of Home Assistant Core has the issue?

2022.5.5 (the problem has been present since at least 2022.4.7)

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Modbus

Link to integration documentation on our website

https://www.home-assistant.io/integrations/modbus/

Diagnostics information

home-assistant.log
This is the log with debug logging enabled.
It is redacted: I deleted everything up to the point where Home Assistant startup completed (there were no errors before that). I also deleted the entries for upnp, ssdp and some other unrelated integrations, since the original log file was over 2 megabytes in size.

Example YAML snippet

```yaml
# Loads default set of integrations. Do not remove.
default_config:

# Text to speech
#tts:
#  - platform: google_translate

automation: !include automations.yaml
script: !include scripts.yaml
scene: !include scenes.yaml

logger:
  default: debug
  logs:
    homeassistant.components.modbus: debug
    pymodbus.*: debug

modbus:
  - name: lopa
    type: tcp
    host: 192.168.1.7
    port: 502
    timeout: 14
    delay: 1
    close_comm_on_error: false
    retries: 10
    retry_on_empty: true
    sensors:
    - name: Faktor moci iz omrezja
      slave: 2
      input_type: input
      address: 62
      count: 2
      data_type: float32
      unit_of_measurement: /1
      precision: 2

    - name: Faktor moci v hiso
      slave: 3
      input_type: input
      address: 62
      count: 2
      data_type: float32
      unit_of_measurement: /1
      precision: 2

    - name: Napetost niza 1
      slave: 1
      input_type: holding
      address: 234
      count: 1
      data_type: int16
      scale: 0.1
      unit_of_measurement: V
      precision: 1
```
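For context on the sensor options above: `count: 2` with `data_type: float32` combines two consecutive 16-bit registers into one IEEE 754 float, while the `int16` sensor reads a single signed register, multiplies it by `scale` and rounds to `precision` digits. The sketch below illustrates that decoding in Python. It assumes big-endian byte order with the high word first (no `swap` option configured), and `decode_float32` / `decode_int16_scaled` are hypothetical helper names, not the integration's own code.

```python
import struct

# Hypothetical helpers illustrating the sensor options; assumes big-endian
# byte order and high word first, i.e. no `swap` option in the config.

def decode_float32(registers):
    """Combine two 16-bit registers (count: 2) into one float32 value."""
    if len(registers) != 2:
        raise ValueError("float32 needs exactly two registers")
    raw = struct.pack(">HH", *registers)  # 4 bytes, high word first
    return struct.unpack(">f", raw)[0]

def decode_int16_scaled(registers, scale, precision):
    """Read one signed 16-bit register, apply `scale`, round to `precision`."""
    if len(registers) != 1:
        raise ValueError("int16 needs exactly one register")
    raw = struct.pack(">H", registers[0])  # reinterpret as signed below
    value = struct.unpack(">h", raw)[0]
    return round(value * scale, precision)

if __name__ == "__main__":
    print(decode_float32([0x3F80, 0x0000]))  # 1.0
    print(decode_int16_scaled([2301], scale=0.1, precision=1))  # 230.1
```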

Anything in the logs that might be useful for us?

To put the log in context:
* Until 11:57:30, the system was working normally.
* At 11:57:30, slave device 2 was turned off; it stopped responding until 12:02:30. The connected sensor is named "Faktor moci iz omrezja". During this time, the other two sensors (on slaves 1 and 3) stopped updating. This is the problem I'm trying to solve.
* At 12:02:30, the slave device was turned on again and all the sensors started working again.
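The symptom in this log is consistent with all sensors of one hub sharing a single, serialized connection. The following is a minimal Python model under that assumption, not Home Assistant's or pymodbus's actual code: `read_sensor`, `poll_hub` and `DEAD_SLAVES` are hypothetical names, and a dead slave is modeled as costing the full `timeout` on each of 1 + `retries` attempts while it holds the hub's lock.

```python
import asyncio
import time

# Hypothetical model of the symptom, NOT Home Assistant's or pymodbus's code.
# Assumption: all sensors on one hub share one connection guarded by a lock,
# so a request to a dead slave holds the lock while it waits for replies.

DEAD_SLAVES = {2}  # slave 2 switched off, as between 11:57:30 and 12:02:30

async def read_sensor(lock, slave, timeout, retries):
    """Return a reading, or None after 1 + `retries` timed-out attempts."""
    async with lock:
        for _attempt in range(retries + 1):
            if slave not in DEAD_SLAVES:
                await asyncio.sleep(0.01)  # a live slave answers quickly
                return f"reading from slave {slave}"
            await asyncio.sleep(timeout)  # wait for a reply that never comes
        return None

async def poll_hub(timeout, retries):
    """Poll slaves 2, 1 and 3 concurrently; record when each one finishes."""
    lock = asyncio.Lock()
    start = time.monotonic()
    finished = {}

    async def poll(slave):
        result = await read_sensor(lock, slave, timeout, retries)
        finished[slave] = (result, time.monotonic() - start)

    # Tasks start in this order, so slave 2 grabs the lock first.
    await asyncio.gather(poll(2), poll(1), poll(3))
    return finished

if __name__ == "__main__":
    results = asyncio.run(poll_hub(timeout=0.5, retries=2))
    for slave, (result, elapsed) in sorted(results.items()):
        print(f"slave {slave}: {result!r} after {elapsed:.2f} s")
```

In this model, slaves 1 and 3 are healthy but only return after slave 2 has used up all of its attempts (roughly 1.5 s with the toy values above), which mirrors the other two sensors freezing while slave 2 was off.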

Additional information

All the info, config and logs are from a freshly installed system; the YAML snippet is the whole configuration.yaml file.

probot-home-assistant · Answer 1 · Thu May 26 2022 00:51:54 GMT+0800 (China Standard Time)

modbus documentation
modbus source
(message by IssueLinks)

probot-home-assistant · Answer 2 · Thu May 26 2022 00:51:57 GMT+0800 (China Standard Time)

Hey there @adamchengtkc, @janiversen, @vzahradnik, mind taking a look at this issue as it has been labeled with an integration (modbus) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

jan iversen · Answer 3 · Thu May 26 2022 03:10:17 GMT+0800 (China Standard Time)

Sounds like a problem, I will try to reproduce it with the test suite.

Franci Kopač · Answer 4 · Sun Jun 05 2022 21:31:48 GMT+0800 (China Standard Time)

@janiversen, did you have any luck with this?
It's summer and I often lose power due to thunderstorms. This bug prevents me from detecting that reliably and from keeping the battery from being drained too deeply...

Franci Kopač · Answer 5 · Thu Jun 16 2022 02:11:35 GMT+0800 (China Standard Time)

So I found a partial solution.
The problem seems to be connected with some old settings that used to be needed for reliability but seem to cause problems with the new implementation.
Specifically, I deleted the following lines from my modbus hub config (it's likely that not all of them were causing a problem):

```yaml
timeout: 14
delay: 1
close_comm_on_error: false
retries: 10
retry_on_empty: true
```

My theory is that the retries against the unresponsive slave were preventing the hub from reading out data from the other slaves.
The setup seems to work acceptably now, but the startup of HA is still really slow if one of the Modbus devices is not working, so I think there is still a problem there that needs fixing.
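A rough estimate of why removing those options helped: if every attempt against a dead slave waits the full `timeout`, and a failed request is repeated `retries` more times, the shared connection is blocked for about (retries + 1) × timeout seconds per poll. The sketch below assumes exactly that counting (pymodbus's real retry semantics may differ), leaves `delay` out, and uses hypothetical helper names and a hypothetical 0.05 s reply time for live slaves.

```python
# Rough estimate under an ASSUMED retry model, not pymodbus's exact logic:
# each of the 1 + `retries` attempts waits the full `timeout` for a reply.

def blocked_seconds(timeout, retries):
    """Seconds one dead slave may hold the hub's shared connection per poll."""
    if timeout <= 0 or retries < 0:
        raise ValueError("timeout must be positive and retries non-negative")
    return (retries + 1) * timeout

def scan_seconds(slave_alive, timeout, retries, reply_time=0.05):
    """Time to read every slave once over one shared, serialized connection.

    `slave_alive` maps slave id -> True/False; `reply_time` is a hypothetical
    round-trip time for a slave that answers.
    """
    return sum(
        reply_time if alive else blocked_seconds(timeout, retries)
        for alive in slave_alive.values()
    )

if __name__ == "__main__":
    original = blocked_seconds(timeout=14, retries=10)  # reporter's old settings
    trimmed = blocked_seconds(timeout=3, retries=1)     # hypothetical values
    print(original, trimmed)  # 154 6
```

Under this model, the original settings could keep slaves 1 and 3 waiting for over two and a half minutes per scan while slave 2 is off.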