Z-Wave-Me / home-automation

Z-Way Home Automation engine

sockets.tcp() fails to callback .onclose() on socket close by remote ... and other issues!

ralphwetzel opened this issue

Dear all!

I was creating a UserModule to receive and display data from a server via sockets.tcp(). I finally got it running and was quite satisfied - until z-way-server stopped responding some hours later.
The immediate cause was not difficult to find: Debian had run out of sockets and was unable to open a new one.
What concerned me more was that the reason seemed to lie within my UserModule: the log told me that the .onclose() callback was never triggered, so all sockets were still open! The ss command showed a huge number of connections in state CLOSE-WAIT.

A Google search revealed a very old entry on forum.z-wave.me describing the same issue - yet the thread petered out without a solution.

For further investigation I wrote a UserModule that provided a vDev with a button.

On button press, a listening server and a client are created:

The client connects to the server and sends a message; the server receives the message, sends it back and then closes the connection. The correct behavior would be for the client to receive the message first, then the close signal, and thus close and free the socket:

OnCloseTester.prototype.test = function () {

    var self = this;
    var port_inc = 1;

    self.reconnect = false;

    if (self.client !== undefined) {
        debugPrint("*** OnCloseTester | Client: Forcing close()");
        self.client.close();
        self.client = undefined;
    }

    if (self.server !== undefined) {
        debugPrint("*** OnCloseTester | Server: Forcing close()");
        self.reconnect = true;
        self.server.close();
        self.server = undefined;
        if (self.increment === true) {
            self.port += port_inc;
        }
        debugPrint("*** OnCloseTester | Server: onclose() will restart with port " + self.port + " ...");
        return;
    }

    var server = new sockets.tcp();
    self.server = server;

    server.reusable();
    debugPrint("*** OnCloseTester | Server: bind() to port " + self.port + " ...");
    server.bind(self.ip, self.port);

    server.onconnect = function(remoteHost,remotePort,localHost,localPort) {
        debugPrint("*** OnCloseTester | Server: Client connection from remote port " + remotePort + ".");
    };

    server.onrecv = function(data) {
        debugPrint("*** OnCloseTester | Server: Received data...");
        var msg = String.fromCharCode.apply(null, new Uint8Array(data));
        debugPrint("*** OnCloseTester | Server: Echoing '" + msg + "'");
        this.send(msg);
        this.close();
        debugPrint("*** OnCloseTester | Server: Closed connection.");
    };

    server.onclose = function() {
        debugPrint("*** OnCloseTester | Server: socket.onclose() triggered!");
        debugPrint("*** OnCloseTester | Server: Restarting? => " + self.reconnect);
        if (self.reconnect === true) {
            debugPrint("*** OnCloseTester | Server: Restarting in 1000ms...");
            setTimeout(_.bind(self.test, self), 1000);
        }
    };

    debugPrint("*** OnCloseTester | Server: Listening @ " + self.port + " ...");
    server.listen();

    var client = new sockets.tcp();

    client.onconnect = function(remoteHost,remotePort,localHost,localPort) {
        debugPrint("*** OnCloseTester | Client: Connected using local port " + localPort);
        var msg = "This is the TEST message!";
        debugPrint("*** OnCloseTester | Client: Sending '" + msg + "'...");
        debugPrint("*** OnCloseTester | Client: Sending success: '" + this.send(msg) + "'");
    };

    client.onrecv = function(data) {
        var msg = String.fromCharCode.apply(null, new Uint8Array(data));
        debugPrint("*** OnCloseTester | Client: Received echo: '" + msg + "'");
    };

    client.onclose = function() {
        debugPrint("*** OnCloseTester | Client: socket.onclose() triggered!");
        this.close();
    };

    debugPrint("*** OnCloseTester | Client: Connecting to " + self.ip + ":" + self.port);
    client.connect(self.ip, self.port);

    self.client = client;

};

This is what happens:

[2018-06-16 17:17:00.253] [I] [core] ---  OnCloseTester24 performCommand processing: {"0":"on","1":{}}
[2018-06-16 17:17:00.257] [I] [core] *** OnCloseTester | Server: bind() to port 5555 ...
[2018-06-16 17:17:00.257] [I] [core] *** OnCloseTester | Server: Listening @ 5555 ...
[2018-06-16 17:17:00.258] [I] [core] *** OnCloseTester | Client: Connecting to 127.0.0.1:5555
[2018-06-16 17:17:00.272] [I] [core] *** OnCloseTester | Server: Client connection from remote port 58254.
[2018-06-16 17:17:00.273] [I] [core] *** OnCloseTester | Client: Connected using local port 58254
[2018-06-16 17:17:00.273] [I] [core] *** OnCloseTester | Client: Sending 'This is the TEST message!'...
[2018-06-16 17:17:00.273] [I] [core] *** OnCloseTester | Client: Sending success: 'true'
[2018-06-16 17:17:00.294] [I] [core] *** OnCloseTester | Server: Client connection from remote port 58254.
[2018-06-16 17:17:00.296] [I] [core] *** OnCloseTester | Server: Received data...
[2018-06-16 17:17:00.297] [I] [core] *** OnCloseTester | Server: Echoing 'This is the TEST message!'
[2018-06-16 17:17:00.298] [I] [core] *** OnCloseTester | Server: Closed connection.
[2018-06-16 17:17:00.319] [I] [core] *** OnCloseTester | Client: Received echo: 'This is the TEST message!'
[2018-06-16 17:17:00.321] [I] [core] *** OnCloseTester | Server: socket.onclose() triggered!
[2018-06-16 17:17:00.321] [I] [core] *** OnCloseTester | Server: Restarting? => false

As you can see, client.onclose() was not triggered. Does ss confirm?

$ ss
[...]
tcp    CLOSE-WAIT 0      0        127.0.0.1:58254        127.0.0.1:5555
[...]

Yes, it does. Port 58254 is still in state CLOSE-WAIT - which indicates that the remote end has closed the connection but the local application hasn't closed its socket yet.
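As a stopgap against sockets lingering in CLOSE-WAIT, a module could force close() itself when no onclose callback has arrived after some time. The following is only a minimal sketch under assumptions: closeAfterTimeout is a hypothetical helper (not part of the Z-Way API), it expects an object with close() and onclose like the sockets.tcp() instances above, and the scheduler is passed in so that setTimeout can be swapped for a fake.

```javascript
// Hypothetical helper, not part of the Z-Way API.
// socket:    object exposing close() and an onclose property
// timeoutMs: how long to wait for onclose before forcing close()
// schedule:  a setTimeout-compatible function
function closeAfterTimeout(socket, timeoutMs, schedule) {
    var state = { closed: false, forced: false };
    var userOnClose = socket.onclose;

    // Wrap the existing handler so we learn when onclose really fired.
    socket.onclose = function () {
        state.closed = true;
        if (typeof userOnClose === "function") {
            userOnClose.apply(this, arguments);
        }
    };

    // If onclose has not fired by the deadline, close the socket ourselves.
    schedule(function () {
        if (state.closed) {
            return;
        }
        state.forced = true;
        socket.close();
    }, timeoutMs);

    return state;
}
```

In the test module this might be called as closeAfterTimeout(client, 5000, setTimeout) after client.onclose has been assigned. It only masks a missing callback and also closes connections that are merely idle, so the timeout would have to fit the protocol in use.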

A leaked socket is bad - yet perhaps just a coincidence? Let's press the button once again. Here's what happens:

[2018-06-16 17:23:27.830] [I] [core] ---  OnCloseTester24 performCommand processing: {"0":"on","1":{}}
[2018-06-16 17:23:27.833] [I] [core] *** OnCloseTester | Client: Forcing close()
[2018-06-16 17:23:27.833] [I] [core] *** OnCloseTester | Server: Forcing close()
[2018-06-16 17:23:27.834] [I] [core] *** OnCloseTester | Server: onclose() will restart with port 5555 ...
[2018-06-16 17:23:27.836] [I] [core] *** OnCloseTester | Client: socket.onclose() triggered!
[2018-06-16 17:23:27.837] [I] [core] *** OnCloseTester | Server: socket.onclose() triggered!
[2018-06-16 17:23:27.837] [I] [core] *** OnCloseTester | Server: Restarting? => true
[2018-06-16 17:23:27.838] [I] [core] *** OnCloseTester | Server: Restarting in 1000ms...
[2018-06-16 17:23:28.844] [I] [core] *** OnCloseTester | Server: bind() to port 5555 ...
[2018-06-16 17:23:28.844] [I] [core] *** OnCloseTester | Server: Listening @ 5555 ...
[2018-06-16 17:23:28.845] [I] [core] *** OnCloseTester | Client: Connecting to 127.0.0.1:5555
[2018-06-16 17:23:28.867] [I] [core] *** OnCloseTester | Server: Client connection from remote port 58256.
[2018-06-16 17:23:28.868] [I] [core] *** OnCloseTester | Client: Connected using local port 58256
[2018-06-16 17:23:28.868] [I] [core] *** OnCloseTester | Client: Sending 'This is the TEST message!'...

At this stage z-way-server seems to hang: after some seconds, the UI states that it has lost the connection...

Summary: based on my investigations, sockets has some issues.
It would be great if you could fix this in a timely manner - or else show me the mistakes I made in using sockets.

BR, Ralph

Thanks, just fixed it. It will be part of the next release candidate.

Thank you for your fast reaction!
Testing with the latest rc (18.06.18 / 01:17) I can confirm that the .onclose() callback is triggered now:

[2018-06-19 21:11:33.986] [I] [core] ---  OnCloseTester24 performCommand processing: {"0":"on","1":{}}
[2018-06-19 21:11:33.990] [I] [core] *** OnCloseTester | Server: bind() to port 5555 ...
[2018-06-19 21:11:33.990] [I] [core] *** OnCloseTester | Server: Listening @ 5555 ...
[2018-06-19 21:11:33.991] [I] [core] *** OnCloseTester | Client: Connecting to 127.0.0.1:5555
[2018-06-19 21:11:34.004] [I] [core] *** OnCloseTester | Server: Client connection from remote port 54646.
[2018-06-19 21:11:34.005] [I] [core] *** OnCloseTester | Client: Connected using local port 54646
[2018-06-19 21:11:34.005] [I] [core] *** OnCloseTester | Client: Sending 'This is the TEST message!'...
[2018-06-19 21:11:34.006] [I] [core] *** OnCloseTester | Client: Sending success: 'true'
[2018-06-19 21:11:34.028] [I] [core] *** OnCloseTester | Server: Client connection from remote port 54646.
[2018-06-19 21:11:34.030] [I] [core] *** OnCloseTester | Server: Received data...
[2018-06-19 21:11:34.032] [I] [core] *** OnCloseTester | Server: Echoing 'This is the TEST message!'
[2018-06-19 21:11:34.033] [I] [core] *** OnCloseTester | Server: Closed connection.
[2018-06-19 21:11:34.062] [I] [core] *** OnCloseTester | Client: Received echo: 'This is the TEST message!'
[2018-06-19 21:11:34.066] [I] [core] *** OnCloseTester | Server: socket.onclose() triggered!
[2018-06-19 21:11:34.066] [I] [core] *** OnCloseTester | Server: Restarting? => false
[2018-06-19 21:11:35.063] [I] [core] *** OnCloseTester | Client: socket.onclose() triggered!

That's definitely great progress!
There is still the issue, though, that another run causes the server to hang:

[2018-06-19 21:17:52.082] [I] [core] ---  OnCloseTester24 performCommand processing: {"0":"on","1":{}}
[2018-06-19 21:17:52.083] [I] [core] *** OnCloseTester | Client: Forcing close()
[2018-06-19 21:17:52.083] [I] [core] *** OnCloseTester | Server: Forcing close()
[2018-06-19 21:17:52.084] [I] [core] *** OnCloseTester | Server: onclose() will restart with port 5555 ...
[2018-06-19 21:17:52.085] [I] [core] *** OnCloseTester | Server: socket.onclose() triggered!
[2018-06-19 21:17:52.085] [I] [core] *** OnCloseTester | Server: Restarting? => true
[2018-06-19 21:17:52.086] [I] [core] *** OnCloseTester | Server: Restarting in 1000ms...
[2018-06-19 21:17:53.097] [I] [core] *** OnCloseTester | Server: bind() to port 5555 ...
[2018-06-19 21:17:53.098] [I] [core] *** OnCloseTester | Server: Listening @ 5555 ...
[2018-06-19 21:17:53.098] [I] [core] *** OnCloseTester | Client: Connecting to 127.0.0.1:5555
[2018-06-19 21:17:53.120] [I] [core] *** OnCloseTester | Server: Client connection from remote port 54648.
[2018-06-19 21:17:53.121] [I] [core] *** OnCloseTester | Client: Connected using local port 54648
[2018-06-19 21:17:53.121] [I] [core] *** OnCloseTester | Client: Sending 'This is the TEST message!'...

According to ss, the socket is in state ESTAB (established):

tcp   ESTAB      0      0        127.0.0.1:5555        127.0.0.1:54648  

The log still fills with debug messages, yet the network interface doesn't respond anymore.

I guess there's an error happening (in .send()?) that affects the sockets yet is not propagated back to the JS interface. Could you please double-check?

Cannot reproduce it on Ubuntu. Are you on RPi?

Yes, I am.

Can confirm the issue still exists with the RPi 2.3.8-rc5 build (found via the referenced MQTT plugin bug).

I see this issue as well on 2.3.7. Let me know if I can help out with debugging.

I can report that rc6 plus the latest pulls in the referenced MQTT plugin bug cleared up my issues and I'm able to use everything normally. Thanks!

@ralphwetzel I checked this again. Your code hangs because your test code is sending a packet to the same engine via TCP. So one part of your code runs send and locks a mutex until the answer is received. Another part is trying to lock the same mutex to execute the receive callback. This makes a classic deadlock.
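The deadlock described here can be illustrated with a toy model. This is only a sketch under assumptions: NonReentrantLock and simulateSend are hypothetical stand-ins, not Z-Way internals, and the model ignores timing entirely. It merely contrasts a peer that shares the sender's lock (the same engine) with a peer that has its own (a separate process).

```javascript
// Toy model only: these names are hypothetical and do not mirror Z-Way code.
function NonReentrantLock() {
    this.owner = null;
}

// Returns true if the lock was free and is now held by `who`.
NonReentrantLock.prototype.tryAcquire = function (who) {
    if (this.owner !== null) {
        return false;
    }
    this.owner = who;
    return true;
};

NonReentrantLock.prototype.release = function (who) {
    if (this.owner === who) {
        this.owner = null;
    }
};

// peerInSameEngine: true if the receiving side runs inside the same engine
// and therefore needs the very lock that the sender is holding.
function simulateSend(peerInSameEngine) {
    var engineLock = new NonReentrantLock();
    var peerLock = peerInSameEngine ? engineLock : new NonReentrantLock();

    // The sender takes the lock and keeps it until an answer arrives.
    engineLock.tryAcquire("send");

    // The receive callback needs the peer's lock to run.
    if (!peerLock.tryAcquire("onrecv")) {
        // The callback can never run, so the answer never arrives.
        return "deadlock";
    }
    peerLock.release("onrecv");

    // The answer arrived, so the sender lets go of its lock.
    engineLock.release("send");
    return "completed";
}
```

In this model simulateSend(true) ends in "deadlock" and simulateSend(false) in "completed", which is why a client talking to a server in another process would not be expected to hit the same hang.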

@PoltoS Thank you for this explanation. I'm still unsure whether this describes what's happening - especially as it works properly in the first run, but fails when restarted. If your scenario fits, it should fail in both cases - according to my understanding.

Those are race conditions, and in rare cases it still works even the second time.

I'm closing this as not relevant. The test case is very synthetic. If you can reproduce it with a more realistic test, please re-open.