lipp / lua-websockets

Websockets for Lua.

Home Page: http://lipp.github.com/lua-websockets/

websocket does not close when wlan/lan is disconnected.

jjvbsag opened this issue

Hi. We are trying out websockets on a raspi and udoo boards. Most things work fine. But:

Given the following setup:

[Tablet] <=> WLAN <=> [Udoo-as-WLAN-Accesspoint] running a ws-server using server_ev.lua. Our server produces a data stream, which is sent over the socket on a regular schedule.

Now if the browser on the tablet opens a webpage with js accessing the websocket => Data is flowing.
If I close the browser, then ws:on_close() is called and everything is fine.

But (assuming the webpage is open and data is flowing) when I disconnect the tablet from the WLAN, the websocket Lua script begins to eat more and more memory and CPU time (up to 100%). The same is true if I use the LAN port and change the IP address of the udoo while a connection is active.

I have added some debug code:

:::
local ok=ws:send(state)
if type(ok)~="number" or ok~=0 then
    log("ws:send()=%s\n",vis(ok))
end
:::

and I get something like this (first number is timestamp HHMMSS)

051540:ws:send()=388
051540:ws:send()=832
051541:ws:send()=1246
051541:ws:send()=1660
051541:ws:send()=2074
051542:ws:send()=2488
051542:ws:send()=2902
:::

The number keeps increasing until memory is completely used up.

How do I catch this "connection lost" condition? Is it possible to set a timeout for the send? Or to set a limit on the amount of send data in the queue?

could you make a small test using basic tcp? using luasocket...
i don't know if luasocket forwards loss of network interfaces in any way...

Did as requested.
Server script simple-server.lua:

local socket=require"socket"
local server=assert(socket.bind("*", 23456))
print("waiting...",server:getsockname())io.flush()
for client in function() return server:accept() end do
    print("conn:",client:getsockname())io.flush()
    client:settimeout(10)
    while true do
        local msg, err = client:receive()
        if err then
            print("err:"..err)io.flush()
            break
        end
        print("rx:"..msg)io.flush()
        msg=msg+1
        print("tx:"..msg)io.flush()
        client:send(msg.."\n")
    end
    client:close()
    print("waiting...",server:getsockname())
end

Client script simple-client.lua

local socket=require"socket"
local client=assert(socket.connect("192.168.43.1", 23456))
client:settimeout(10)
local msg=100000
while true do
    print("tx:"..msg)io.flush()
    client:send(msg.."\n")
    local line, err = client:receive()
    if err then
        print("err:"..err)io.flush()
        break
    end
    print("rx:"..line)io.flush()
    msg=msg+10
    socket.sleep(1)
end
client:close()

Starting the server script on the udoo board. Output:

waiting...      0.0.0.0 23456

Starting the client script on an Acer Windows 10 tablet:

tx:100000
rx:100001
tx:100010
rx:100011
tx:100020
rx:100021
tx:100030
rx:100031
tx:100040
rx:100041
tx:100050
rx:100051
::::

Server says:

conn:   192.168.43.1    23456
rx:100000
tx:100001
rx:100010
tx:100011
rx:100020
tx:100021
rx:100030
tx:100031
rx:100040
tx:100041
rx:100050
tx:100051
rx:100060

Now, when I disconnect the WLAN connection (at the acer tablet),
client immediately stops with message

:::
rx:100111
tx:100120
rx:100121
tx:100130
err:closed

The server stalls for the timeout (10 seconds), and then says:

err:timeout
waiting...      0.0.0.0 23456

So the only thing the server Lua script gets is a timeout.
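
Since a vanished peer only shows up as `timeout` on the server, one possible application-level reaction is to count consecutive timeouts and give up after a limit. The following is a minimal sketch of that idea, not something taken from luasocket or lua-websockets: `new_idle_watchdog` and `max_timeouts` are hypothetical names, and the only thing assumed about luasocket is that `client:receive()` reports the error strings `"timeout"` and `"closed"` seen in the tests above. The example feeds in simulated results instead of a real socket.

```lua
-- Hypothetical helper: decides when a silent peer should be treated as gone.
-- max_timeouts is an illustrative parameter, not a luasocket setting.
local function new_idle_watchdog(max_timeouts)
    local misses = 0
    -- Feed it the (msg, err) pair returned by client:receive().
    -- Returns "ok", "idle" or "dead".
    return function(msg, err)
        if msg then
            misses = 0          -- any received line resets the counter
            return "ok"
        end
        if err == "timeout" then
            misses = misses + 1
            if misses >= max_timeouts then
                return "dead"   -- too many silent periods in a row
            end
            return "idle"
        end
        -- "closed" or any other error: the connection is unusable.
        return "dead"
    end
end

-- Simulated receive results instead of a real socket:
local check = new_idle_watchdog(2)
print(check("100000", nil))   -- ok
print(check(nil, "timeout"))  -- idle
print(check(nil, "timeout"))  -- dead
```

In the server loop above, the result `"dead"` would be the point to `break` and call `client:close()`; with `client:settimeout(10)` and a limit of 2, that happens after roughly 20 seconds of silence.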

Then I change the server script to

    ::::
    client:settimeout()
    ::::

Same test procedure: at disconnect the server stalls forever, with no error message.

Then I change the server code to

    ::::
    client:settimeout()
    client:setoption("keepalive",true)
    ::::

Same test procedure. After about 2 hours (to be precise, after 7875 seconds) the server gets the timeout.

err:timeout

This can be controlled via the standard Linux /proc files:

/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200

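E.g., writing 600 to /proc/sys/net/ipv4/tcp_keepalive_time reduces the idle time before the first keepalive probe to 10 minutes. With the interval and probe count above, a dead peer is then reported after 600 + 9 × 75 = 1275 seconds instead of 7200 + 9 × 75 = 7875 seconds.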
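
The 7875 seconds measured above match the three kernel values listed: the idle time before the first probe, plus one interval for each unanswered probe. As a small sketch of that arithmetic, the snippet below reads the current values from /proc and falls back to the numbers quoted in this thread when a file is not available. The function names `keepalive_detection_seconds` and `read_proc_number` are made up for illustration.

```lua
-- Worst-case time until the kernel reports a dead peer on an idle
-- connection: idle time before the first probe, plus one interval
-- per unanswered probe.
local function keepalive_detection_seconds(time, intvl, probes)
    return time + intvl * probes
end

-- Read one numeric value from a file; returns nil if the file cannot be
-- opened (for example on a non-Linux system).
local function read_proc_number(path)
    local f = io.open(path, "r")
    if not f then return nil end
    local value = f:read("*n")
    f:close()
    return value
end

-- Fallback values are the ones quoted in this thread.
local base = "/proc/sys/net/ipv4/tcp_keepalive_"
local time   = read_proc_number(base .. "time")   or 7200
local intvl  = read_proc_number(base .. "intvl")  or 75
local probes = read_proc_number(base .. "probes") or 9

print(keepalive_detection_seconds(time, intvl, probes))
print(keepalive_detection_seconds(7200, 75, 9))  -- 7875, the delay measured above
```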

After all, this isn't luasocket-specific. Writing a client in C using the socket interface would reveal the same results.

I don't know how this would be done on Windows systems ... and I don't even care 😁

In the end I implemented a workaround (I wouldn't call it a solution, but it's good enough for the moment).

I changed my code in the ws-server to

local ok=ws:send(state)
if type(ok)~="number" or ok~=0 then
    ws:close()
end

With this the connection gets closed (after some minor delay), when the send is incomplete.
It's not perfect, but it keeps memory and cpu usage clean.

Please review the consequences and whether this is a good way to do it.
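
A possible variation of this workaround is to tolerate a small backlog and only close once it exceeds a limit. The sketch below is hedged accordingly: it assumes, based only on the log output earlier in this thread, that `ws:send()` returns the number of bytes still waiting to be written. The names `send_or_close` and `max_pending` are hypothetical, and `fake_ws` is a stub standing in for a real connection so the example runs without the library.

```lua
-- Hypothetical wrapper around the workaround above. It ASSUMES, based only
-- on the log output in this thread, that ws:send() returns the number of
-- bytes still waiting to be written (0 when nothing is pending).
local function send_or_close(ws, data, max_pending)
    local pending = ws:send(data)
    if type(pending) ~= "number" or pending > max_pending then
        ws:close()
        return false
    end
    return true
end

-- Stub standing in for a connection whose peer has vanished:
-- every send just grows the backlog.
local fake_ws = { backlog = 0, closed = false }
function fake_ws:send(data)
    self.backlog = self.backlog + #data
    return self.backlog
end
function fake_ws:close()
    self.closed = true
end

print(send_or_close(fake_ws, string.rep("x", 400), 1000))  -- true
print(send_or_close(fake_ws, string.rep("x", 400), 1000))  -- true
print(send_or_close(fake_ws, string.rep("x", 400), 1000))  -- false
print(fake_ws.closed)                                       -- true
```

With `max_pending` set to 0 the wrapper closes on any non-zero or non-numeric result, like the check above; a larger limit trades some memory for tolerance of short network hiccups.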

@jjvbsag thanks for the detailed results. i think your solution is good. dealing with unstable tcp connections / network interfaces is always tough. if i understand correctly, you are handling this now in your "application logic" (and not by patching lua-websockets), is this correct?

@lipp Yes, I only changed the code in my ws-server script; the websocket library scripts themselves are unchanged.