vectronic / homebridge-nut

Homebridge plugin for NUT (Network UPS Tools) Client

Nut client disconnecting and re-connecting every minute (after WLAN reconnects)

valeriansaliou opened this issue

Hello,

First of all, thanks for your Homebridge plugin.
I'm running Homebridge latest on a RPi Zero W (connected over Wi-Fi) + homebridge-nut fetching UPSd data every 15 seconds from another LAN IP, which is wired over ethernet.

Once every two weeks, my Wi-Fi AP restarts (due to an AP firmware issue out of my control that won't be fixed), and thus all Wi-Fi devices lose connectivity for ~30s or so, until they reconnect. The main LAN router does not restart, so all wired connections are still up.

When the Wi-Fi reconnection happens, my Homebridge starts reporting every minute the following logs:

[image: screenshot of the Homebridge logs]

Again, and again, and again. If I restart the Homebridge service from the Homebridge admin UI, then the homebridge-nut plugin starts up and stops flooding the logs w/ reconnection logs every minute... until the next Wi-Fi reconnection happens a few days/weeks later, and this resumes.

This is my homebridge-nut config:

{
    "name": "Nut",
    "host": "10.0.1.5",
    "port": 3493,
    "low_batt_threshold": 40,
    "poll_interval": 15,
    "connect_interval": 5,
    "command_interval": 2,
    "platform": "Nut"
}

Would there be any issue in this plugin, or in the underlying nut client library used?

Note that if I put my UPS power down, homebridge-nut correctly reports it as OB even if it is actively flooding the logs, so this doesn't break the functionality of the plugin, it only floods the logs in a weird way.

Thanks!

Valerian.

Hmm interesting, give me some time and will check it out on the weekend...

Hi there,

Looking into this I can't really explain it. The actual nut client is pretty simple and is really just a wrapper around a NodeJS socket instance.

I also can't see where the interval of approx 55s between close events comes from...

One thought: are you looking at the Homebridge error logs as well? These tend to be logged separately, and maybe you are getting some error there which could explain things.

Regardless, I have done a change (try v2.4.2) which completely discards the nut client and creates a new one when it connects. This may at least discard the old client when the issue occurs and the new one may no longer have the issue. Let me know if it helps...

@vectronic thanks, I'm updating the plugin right now. I'll keep you posted if this happens again.

Regarding the Homebridge error logs in /var/lib/homebridge/homebridge.log, they show exactly the same nut-related log lines as the Homebridge Web UI logs.

Okay, just had a Wi-Fi disconnection/reconnection and now this is happening again on the update you provided.

Logs:

[26/02/2022, 21:52:13] [Nut] nut client connected, reported devices: eaton=Eaton 3S 550
[26/02/2022, 21:53:12] [Nut] nutClose(hadError: undefined)
[26/02/2022, 21:53:12] [Nut] nutClose(hadError: undefined)
[26/02/2022, 21:53:16] [Nut] creating nut client for 10.0.1.5:3493
[26/02/2022, 21:53:16] [Nut] starting nut client for 10.0.1.5:3493
[26/02/2022, 21:53:18] [Nut] nut client connected, reported devices: eaton=Eaton 3S 550
[26/02/2022, 21:54:17] [Nut] nutClose(hadError: undefined)
[26/02/2022, 21:54:21] [Nut] creating nut client for 10.0.1.5:3493
[26/02/2022, 21:54:21] [Nut] starting nut client for 10.0.1.5:3493
[26/02/2022, 21:54:23] [Nut] nut client connected, reported devices: eaton=Eaton 3S 550

Following.

I have the same problem:

[02/03/2022, 15:17:56] [Nut] nutClose(hadError: undefined)
[02/03/2022, 15:17:58] [Nut] creating nut client for 192.168.2.96:3493
[02/03/2022, 15:17:58] [Nut] starting nut client for 192.168.2.96:3493
[02/03/2022, 15:18:01] [Nut] nut client connected, reported devices: APC1500=Description unavailable
[02/03/2022, 15:18:56] [Nut] nutClose(hadError: undefined)
[02/03/2022, 15:18:58] [Nut] creating nut client for 192.168.2.96:3493
[02/03/2022, 15:18:58] [Nut] starting nut client for 192.168.2.96:3493
[02/03/2022, 15:19:01] [Nut] nut client connected, reported devices: APC1500=Description unavailable
[02/03/2022, 15:19:56] [Nut] nutClose(hadError: undefined)
[02/03/2022, 15:19:58] [Nut] creating nut client for 192.168.2.96:3493
[02/03/2022, 15:19:58] [Nut] starting nut client for 192.168.2.96:3493
[02/03/2022, 15:20:01] [Nut] nut client connected, reported devices: APC1500=Description unavailable

Ok - thanks for the report.

I will look further when I get some time in the next week...

I've published a new version. If I understand my own code, there may have been a scenario where connection attempts occurred continually. No promises whether this fixes the issue, but it's worth trying...
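As a general illustration of how repeated connection attempts can pile up, and not a description of the plugin's actual fix: if a socket emits more than one close event (the logs above show `nutClose` twice in the same second), each event may schedule its own reconnect. The sketch below guards against that. All names (`createReconnector`, `Scheduler`, `onClose`) are hypothetical and do not come from homebridge-nut.

```typescript
// Hypothetical sketch: these names are illustrative, not the plugin's API.
type Scheduler = (fn: () => void, delayMs: number) => void;

interface Reconnector {
  onClose(): void;
}

function createReconnector(
  connect: () => void,
  schedule: Scheduler,
  delayMs: number,
): Reconnector {
  let pending = false; // true while a reconnect is queued but has not run yet

  return {
    onClose(): void {
      if (pending) {
        return; // a reconnect is already queued; ignore duplicate close events
      }
      pending = true;
      schedule(() => {
        pending = false;
        connect(); // create a brand-new client here rather than reusing the old one
      }, delayMs);
    },
  };
}
```

In a real process the scheduler could be `(fn, ms) => { setTimeout(fn, ms); }`, with `delayMs` derived from a setting such as `connect_interval`.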

I guess you understand your code! I updated the plugin and all seems well. It solved it for me. Thanks!

Thanks, updated and will keep you posted about any further issue / or fixed issue.

It may be me, or rather my setup, but I still get the re-connections:

[05/03/2022, 13:16:49] [Nut] nutClose()
[05/03/2022, 13:16:51] [Nut] creating nut client for 192.168.2.96:3493
[05/03/2022, 13:16:51] [Nut] starting nut client for 192.168.2.96:3493
[05/03/2022, 13:16:54] [Nut] nut client connected, reported devices: APC1500=Description unavailable
[05/03/2022, 13:17:54] [Nut] nutClose()
[05/03/2022, 13:17:56] [Nut] creating nut client for 192.168.2.96:3493
[05/03/2022, 13:17:56] [Nut] starting nut client for 192.168.2.96:3493
[05/03/2022, 13:17:59] [Nut] nut client connected, reported devices: APC1500=Description unavailable
[05/03/2022, 13:18:59] [Nut] nutClose()
[05/03/2022, 13:19:01] [Nut] creating nut client for 192.168.2.96:3493
[05/03/2022, 13:19:01] [Nut] starting nut client for 192.168.2.96:3493
[05/03/2022, 13:19:04] [Nut] nut client connected, reported devices: APC1500=Description unavailable
[05/03/2022, 13:20:04] [Nut] nutClose()

I will have a look in my setup and see what I can improve and do some digging.

It seems the client socket connection to the nut server works ok, then the socket connection is closed 60 seconds later.

And I think I have found the culprit on the UPSD server side here:

https://github.com/networkupstools/nut/blob/master/server/upsd.c#L1031

If you could tell me your configured polling interval (which I suspect is greater than 60 seconds) then this would confirm the scenario. You could try changing your polling interval to 30 seconds and see if the problem disappears.
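The suspected scenario can be made concrete with a small check. This is only a sketch: it assumes the server drops a client after roughly 60 seconds without hearing from it (as the linked `upsd.c` line is suspected to do), and the function name, constant, and safety margin are hypothetical.

```typescript
// Assumed value: the thread suggests upsd sheds clients after about 60 s of silence.
const ASSUMED_SERVER_IDLE_LIMIT_SEC = 60;

// Returns true when polls arrive often enough that the server should never
// see the client as idle. The margin allows for timer jitter and network delay.
function pollIntervalIsSafe(
  pollIntervalSec: number,
  idleLimitSec: number = ASSUMED_SERVER_IDLE_LIMIT_SEC,
  safetyMarginSec: number = 5,
): boolean {
  return pollIntervalSec + safetyMarginSec < idleLimitSec;
}

console.log(pollIntervalIsSafe(60)); // false: each poll lands right at the limit
console.log(pollIntervalIsSafe(30)); // true: two polls fit inside every 60 s window
```

Under this explanation alone, a 30 second interval leaves comfortable headroom, while 60 seconds sits exactly on the boundary.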

If this works, I might need to create an internal ping every 30 seconds to prevent the server from kicking us off

OR

Just handle being kicked off - does being disconnected every minute cause issues for the HomeKit accessory? Or does it just look odd in the logs?
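A minimal sketch of the first option, the internal ping, might look like the following. It is not the plugin's code: `createKeepAlive`, `noteActivity`, and `tick` are hypothetical names, the 30 second threshold is the value floated above, and using the NUT protocol's `VER` command as a cheap ping is an assumption.

```typescript
// Hypothetical sketch: these names are illustrative, not the plugin's API.
interface KeepAlive {
  noteActivity(nowMs: number): void;
  tick(nowMs: number): boolean;
}

function createKeepAlive(
  send: (line: string) => void,
  maxIdleMs: number,
  startMs: number,
): KeepAlive {
  let lastActivityMs = startMs;

  return {
    // Call whenever a real command (for example a poll) is written to the socket.
    noteActivity(nowMs: number): void {
      lastActivityMs = nowMs;
    },
    // Call from a timer. Sends a cheap command only if the link has been idle
    // for at least maxIdleMs, and returns true when a ping was sent.
    tick(nowMs: number): boolean {
      if (nowMs - lastActivityMs < maxIdleMs) {
        return false;
      }
      send("VER\n"); // assumption: VER (server version query) serves as a harmless ping
      lastActivityMs = nowMs;
      return true;
    },
  };
}
```

In practice `send` would wrap the socket's `write`, and a `setInterval` callback would call `tick(Date.now())` every few seconds. Passing the time in explicitly keeps the logic easy to test without real timers.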

My polling interval is exactly 60 seconds. I believe this is the default when one installs the plugin.
And I believe you are right, it seems to be something on the server side. When I restart the server, the disconnection stops.
I kinda suspected that since I have restarted the server and for a while I saw no mentions in the homebridge log. This is what I hinted at when I wrote earlier that I need to check my setup. I thought I was doing something wrong.
It is worth mentioning that I have the upsd server on another pi, not on the one where my homebridge runs.
And indeed, I don't see any adverse effects other than the pollution of the homebridge logs.
If I get the disconnection logs again I will play with the polling interval. It is weird that I only get it sporadically, not constantly.

Thanks for the info. I am thinking I might just reduce the plug-in logging to debug instead of info as everything is behaving as it should. Alternatively it might be neater if I disconnect each polling cycle rather than leave the server to close the connection. Will have a think…

Mine is 15 seconds (in homebridge-nut), with the actual UPS being polled by upsd/nut every 10s over USB. I've not seen any disconnection since the last update, but the Wi-Fi didn't cut off either, judging from the syslog.

I have reduced the logging level to avoid polluting the logs. I will close this issue as I believe the disconnect every minute is now an explained behaviour AND the original issue doesn't seem to be occurring again (at least I haven't heard anything).

I don't really want to modify the plugin to automatically disconnect after polling each time as there is a chance I will introduce new issues.

If the issue persists etc. let me know.