tobof / openhab-addons

The next-generation open Home Automation Bus (openHAB)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Binding not reconnecting to ethernet gateway

FredericMa opened this issue · comments

Expected Behavior

If the gateway goes offline it should automatically reconnect when it is back online.

Current Behavior

If I want to program my arduino with new software it gets disconnected from the network (obvious) but when I reconnect it to the network, the MySensors binding doesn't initiate a reconnect to the gateway.
It looks like it doesn't even know that the gateway went offline since its status remains online.
This is the configuration of the gateway:
image

Note: all my sensors are connected to the gateway itself but I don't think this could cause the issue.

Greetings,
Frederic

@FredericMa thanks for reporting the issue here on GitHub. This problem seems related to the NetworkSanityChecker that I've implemented last year. Personally I've never experienced issues with my serial gateway probably this behaviour affects only Ethernet gateway. I'll build one of it an try to reproduce your problem 👍

Found one possible cause of this issue. As I see in the configuration you have attached, Disable network sanity checker is not checked (so it is false). Unfortunately inside thing-types.xml that parameter is defined as:

<parameter name="networkSanCheckEnabled" type="boolean" required="false">
<label>Disable network sanity check</label>
<description>Network sanity check periodically ensure that gateway is up and running</description>
<default>false</default>
</parameter>

You see that its name is networkSanCheckEnabled but the description say something different. So I think that the problem is coming from an ambiguity. To be sure I look for its usage in the code and I found this:

/**
     * Startup the gateway
     */
    public void startup() {

        myCon.initialize();

        myEventRegister.addEventListener(this);
        
        if (myConf.getEnableNetworkSanCheck()) { <- HERE THE AMBIGUITY
            myNetSanCheck = new MySensorsNetworkSanityChecker(this, myEventRegister, myCon);
        }
    }

Temporary solution for @FredericMa is to enable the Disable network sanity checker and see if something changes. Keep in mind that NetworkSanityChecker (in your configuration) start every 1 minute and wait for 2 attempts before disconnection and try to re-connecting.

About the concrete solution we have to wait @tobof opinion.

Hi @andreacioni,

Thanks for the suggestion!
I can confirm that this fixed the problem in my case!

Thanks for the quick response!

Greetings,
Frederic

Hi @andreacioni, I've tried to set NetworSanityChecker to "false" in my .things file and nothing happened. Sanity Checker stopps but gateway did not reconnect after disconnecting and connecting network cable again.

Hi @nofoxgit. Thanks for your test.
Have you wait all attempts of the NetworkSanityChecker (accordingly to your gateway configuration inside .things file)? I ask you that because disconnections are not immediately recognized by the binding. So you have to wait since all remaining attempts expires.
Could you please,also post the log (DEBUG level) of your situation?

Ok, I have networkSanCheckConnectionFailAttempts set to 100, it seams that I missunderstood what the author have on his mind ? Now I set it to 2. After disconnecting network cable sanityChecker check for gateway 2 times and than i have this messages:

2017-08-22 16:52:00.538 [INFO ] [ateway.MySensorsNetworkSanityChecker] - Network Sanity Checker thread stopped 2017-08-22 16:52:00.544 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0,1,2,3,4,5,6,7,8] 2017-08-22 16:52:00.547 [ERROR] [al.protocol.ip.MySensorsIpConnection] - Failed connecting to bridge...next retry in 10 seconds (Retry No.:1) 2017-08-22 16:52:00.549 [DEBUG] [al.protocol.ip.MySensorsIpConnection] - Disconnecting from IP bridge ... 2017-08-22 16:52:00.552 [INFO ] [ateway.MySensorsNetworkSanityChecker] - Network Sanity Checker thread stopped 2017-08-22 16:52:00.560 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0,1,2,3,4,5,6,7,8] 2017-08-22 16:52:10.570 [DEBUG] [al.protocol.ip.MySensorsIpConnection] - Connecting to IP bridge [192.168.0.71:5003] 2017-08-22 16:52:13.732 [ERROR] [al.protocol.ip.MySensorsIpConnection] - Error while trying to connect InputStreamReader java.net.NoRouteToHostException: No route to host (Host unreachable) at java.net.PlainSocketImpl.socketConnect(Native Method)[:1.8.0_121] at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)[:1.8.0_121] at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)[:1.8.0_121] at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)[:1.8.0_121] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)[:1.8.0_121] at java.net.Socket.connect(Socket.java:589)[:1.8.0_121] at java.net.Socket.connect(Socket.java:538)[:1.8.0_121] at java.net.Socket.<init>(Socket.java:434)[:1.8.0_121] at java.net.Socket.<init>(Socket.java:211)[:1.8.0_121] at org.openhab.binding.mysensors.internal.protocol.ip.MySensorsIpConnection.establishConnection(MySensorsIpConnection.java:46)[181:org.openhab.binding.mysensors:2.2.0.201707241115] at org.openhab.binding.mysensors.internal.protocol.MySensorsAbstractConnection.connect(MySensorsAbstractConnection.java:150)[181:org.openhab.binding.mysensors:2.2.0.201707241115] at org.openhab.binding.mysensors.internal.protocol.MySensorsAbstractConnection.run(MySensorsAbstractConnection.java:127)[181:org.openhab.binding.mysensors:2.2.0.201707241115] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_121] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)[:1.8.0_121] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)[:1.8.0_121] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)[:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_121] at java.lang.Thread.run(Thread.java:745)[:1.8.0_121] 2017-08-22 16:52:13.738 [INFO ] [ateway.MySensorsNetworkSanityChecker] - Network Sanity Checker thread stopped 2017-08-22 16:52:13.742 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0,1,2,3,4,5,6,7,8] 2017-08-22 16:52:13.744 [ERROR] [al.protocol.ip.MySensorsIpConnection] - Failed connecting to bridge...next retry in 10 seconds (Retry No.:2) 2017-08-22 16:52:13.746 [DEBUG] [al.protocol.ip.MySensorsIpConnection] - Disconnecting from IP bridge ... 2017-08-22 16:52:13.747 [INFO ] [ateway.MySensorsNetworkSanityChecker] - Network Sanity Checker thread stopped 2017-08-22 16:52:13.752 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0,1,2,3,4,5,6,7,8] 2017-08-22 16:52:23.755 [DEBUG] [al.protocol.ip.MySensorsIpConnection] - Connecting to IP bridge [192.168.0.71:5003] 2017-08-22 16:52:26.852 [ERROR] [al.protocol.ip.MySensorsIpConnection] - Error while trying to connect InputStreamReader java.net.NoRouteToHostException: No route to host (Host unreachable)

Every 10 seconds Binding tries to reconnect with the bridge and when I connect network cable it reconnects!! :)

I have my gateway configured in the .things file and I have networkSanCheckEnabled="true" there. When I have this configured by PaperUI discovery function, SanityCheck didn't work in both states of DisableNetworkSanityCheck switch!
And now the most funny part! As I wrote earlier I have networkSanCheckEnabled="true" in my configuration file. And I have checked what is the state of DisableNetworkSanityCheck switch.. Guess what ? It is set to ON! So according to PaperUI now I have sanityCheck disabled but according to .thing sfile I have it enabled.

So my solution is to configure OpenHab2 binding with the .things file and set SanityCheck for true and not much amount of retries. I.e.
Bridge mysensors:bridge-eth:gateway [ ipAddress="192.168.0.71", tcpPort=5003, sendDelay=200, networkSanCheckEnabled=true, networkSanCheckInterval=2, networkSanCheckConnectionFailAttempts=2 ]

It seams that the biding must first set the bridge to OFFLINE state and than it attempts to reconnect the gateway.

commented

I've changed this configuration parameter to remove the double negative. Looks like I've forgotten to change the label. networkSanCheckEnabled=true enables the network sanity check. :)

Additionally:

@andreacioni : Have you played with setKeepAlive(true) on the Socket? Maybe this also would improve the chance to detect a lost connection.

commented

@andreacioni
Does not really belong to this issue, but while looking at setKeepAlive(true) I found setTcpNoDelay(true) for small content. Maybe we're able to tune the communication with that parameter.

Hi @tobof, I took a look inside documentation of Socket class in Java 8. Unfortunately I' think your suggested options doesn't help us to better detect disconnections 😞

  • setKeepAlive(true): is a system dependent feature that rely on TCP stack implementation of both gateway and controller. On some Linux distribution keep-alive is sent every 2 hours...so probably we can avoid using it 😉

  • setTcpNoDelay(true): is used to reduce network latency while sending TCP packet. In our situation I think is not critical the network throughput.

Maybe I don't understand why you're suggesting them. Did you have an idea on how to combine them to achieve a faster feedback on connection failure?

commented

Hi @andreacioni !

  • setKeepAlive(true): okay, 2 hours is of no help here. I've just looked at the parameter at my server and it was set to 75 seconds.

  • setTcpNoDelay(true): As I've written, this suggestion does not belong to this issue, but was an idea I've stumbled about while looking at the documentation of Socket.

The primary and best way to detect a lost connection is the sanity checker. Maybe the interval should be changed to seconds instead of minutes and with that would allow a heartbeat every 20s or so?

Yes, I had the same idea! But, we have a backward compatibility problem. If we change this settings now (and we maintain the same parameter name in things-type.xml) users that set 1 minute in current binding will have the NetworkSanityChecker that starts every seconds... 😨

In my opinion the benefit of changing minutes to seconds covers the risk (especially now that the binding is going to be a part of the official OH distribution).

Another solution is to add a parameter in things-type.xml to allow the user to select the time unit, but not sure it is the best way.

I am seeing similar issues using the serial gateway with OH2. When OH2 restarts, it never queries the gateway for the current sensors states. Is the intent of the heartbeat and/or network sanity check to periodically request node and sensor states in case of restart?

commented

@jechiasson Requesting the current state of a sensor is only possible, if the node is programmed that way, which is not default in MySensors. So the binding will not request status but will listen for updates.

@andreacioni An additional idea: add an (for a while) additional parameter networkSanCheckIntervalSec and mark the old one deprecated. We may then use some logic on initialization of the gateway like:

if networkSanCheckInterval is set (not null) => use it
otherwise check for networkSanCheckIntervalSec
if set => use it (in seconds)
otherwise => use default

Some time in the feature we may remove networkSanCheckInterval and the addiational logic.

@tobof If the default is changed will the OH2 binding then do the query on startup? What about the nodes - do they send their child states on startup to the gateway?

I cycled the SanityCheck off, and captured the log. It appears that the locally attached sensors do send their state:
==> /var/log/openhab2/events.log <==
2017-10-19 12:13:15.198 [ThingUpdatedEvent ] - Thing 'mysensors:bridge-ser:51656f7d' has been updated.

==> /var/log/openhab2/openhab.log <==
2017-10-19 12:13:15.212 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Cache file: given_ids.cached exist.
2017-10-19 12:13:15.217 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Cache (given_ids) content: [0]
2017-10-19 12:13:15.219 [DEBUG] [col.serial.MySensorsSerialConnection] - Enabling connection watchdog
2017-10-19 12:13:15.222 [DEBUG] [col.serial.MySensorsSerialConnection] - Connecting to /dev/ttyArduino [baudRate:115200]
2017-10-19 12:13:15.226 [DEBUG] [org.openhab.binding.mysensors ] - ServiceEvent REGISTERED - {org.eclipse.smarthome.config.discovery.DiscoveryService}={service.id=303, service.bundleid=186, service.scope=singleton} - org.openhab.binding.mysensors
2017-10-19 12:13:15.230 [DEBUG] [.discovery.MySensorsDiscoveryService] - Starting MySensors discovery scan
2017-10-19 12:13:15.230 [DEBUG] [nsors.handler.MySensorsBridgeHandler] - Initialization of the MySensors bridge DONE!
2017-10-19 12:13:15.236 [DEBUG] [col.serial.MySensorsSerialConnection] - Final port list: /dev/ttyArduino
2017-10-19 12:13:15.255 [DEBUG] [col.serial.MySensorsSerialConnection] - Successfully connected to serial port.
2017-10-19 12:13:15.255 [DEBUG] [col.serial.MySensorsSerialConnection] - Waiting 3 seconds to allow correct reset trigger on serial connection opening
2017-10-19 12:13:18.257 [DEBUG] [rsAbstractConnection$MySensorsWriter] - Sending to MySensors: 0;0;3;0;2;
2017-10-19 12:13:18.257 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;9;MCO:BGN:INIT GW,CP=R-NGA--,VER=2.1.1
2017-10-19 12:13:18.258 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;14;Gateway startup complete.
2017-10-19 12:13:18.259 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;0;0;17;2.1.1
2017-10-19 12:13:18.259 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Presentation Message received
2017-10-19 12:13:18.261 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0]
2017-10-19 12:13:18.262 [WARN ] [.discovery.MySensorsDiscoveryService] - Cannot automatic discover thing node: 0, child: 255 please insert it manually
2017-10-19 12:13:18.262 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;1;0;0;0;
2017-10-19 12:13:18.262 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Presentation Message received
2017-10-19 12:13:18.262 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0]
2017-10-19 12:13:18.262 [DEBUG] [.discovery.MySensorsDiscoveryService] - Preparing new thing for inbox: mysensors:door
2017-10-19 12:13:18.265 [DEBUG] [.discovery.MySensorsDiscoveryService] - Discovered device submitted
2017-10-19 12:13:18.265 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;3;0;0;0;
2017-10-19 12:13:18.265 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Presentation Message received
2017-10-19 12:13:18.265 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0]
2017-10-19 12:13:18.266 [DEBUG] [.discovery.MySensorsDiscoveryService] - Preparing new thing for inbox: mysensors:door
2017-10-19 12:13:18.266 [DEBUG] [.discovery.MySensorsDiscoveryService] - Discovered device submitted
2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;9;MCO:BGN:STP
2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;9;MCO:REG:NOT NEEDED
2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;9;MCO:BGN:INIT OK,TSP=NA
2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;1;1;0;16;0
2017-10-19 12:13:18.266 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Node 0 found in gateway
2017-10-19 12:13:18.266 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Child 1 found in node 0
2017-10-19 12:13:18.268 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;3;1;0;16;1
2017-10-19 12:13:18.268 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Node 0 found in gateway
2017-10-19 12:13:18.268 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Child 3 found in node 0
2017-10-19 12:13:18.271 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;2;2.1.1
2017-10-19 12:13:18.272 [DEBUG] [rsAbstractConnection$MySensorsReader] - Good,Gateway is up and running! (Ver:2.1.1)
2017-10-19 12:13:18.272 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0]
2017-10-19 12:13:18.273 [INFO ] [col.serial.MySensorsSerialConnection] - Successfully connected to MySensors Bridge.

Are these the initial state messages?
2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;1;1;0;16;0
2017-10-19 12:13:18.268 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;3;1;0;16;1