<feature> (Maybe) Send PING frames to identify inactive connections

Question

<feature> (Maybe) Send PING frames to identify inactive connections

Theldus opened this issue 2 years ago · comments

Description

Devices that are able to sleep or situations where there is an abrupt drop in connection (such as a weak WiFi signal) can cause the socket to not close immediately, even if the device is unable to receive messages from the server.

Situations like this make wsServer keep these devices in the clients list, even if they are virtually disconnected from the server, this causes:

ws_sendframe* try to send messages to them during broadcasts
send() to them returns as if the message had been sent
occupy limited slots of the client list and also other server resources

In addition to being inconvenient, this can easily run out of resources if a malicious user launches a Denial-of-Service attack.

In order to prevent this, wsServer can (optionally) support sending periodic PINGs frames with defined interval: if in <amount of time> there is no PONG response, abort the connection.

Since this belongs to the WebSockets specification, all clients are expected to support PING frames and respond to them, so there are no worries in that regard.

Implementation

Although it sounds simple, there are many ways to implement it.

My idea would be to unify the close and PING timeout in one place: a 'timeout thread', which would handle 'timeout events'.

The function of this thread would be to periodically wake up and check which pending 'events' it needs to handle: if a close timeout, check if the close timeout has elapsed and take the appropriate actions, if PING timeout, check if the timeout has elapsed and also take the appropriate actions.

This would greatly simplify the close process (since the current approach creates N timeout threads, one per client), support PING timeout, and pave the way for supporting a proper wsServer shutdown (issue #31).

Note 1: As this feature makes use of threads and etc, I'm planning on making it optional, enabled at compile time.
Note 2: I'm marking it as 'help wanted' because it's a more or less complicated to implement nicely, any help is welcome.

Dave Glover · Answer 1 · Wed Apr 27 2022 17:31:51 GMT+0800 (China Standard Time)

Hey @Theldus

I've had a go at implementing ping pong.

I forked your repo, and have had a go at implementing.

The idea is you have an incrementing ping id that is pong'd back then you compare. If no pong response then ping will get ahead of pong and you know to kill the session. Similar idea to MQTT Ping / Keep alive. The ping timing is controlled for the calling app - so it makes more platform independent...

New function call ws_ping
https://github.com/gloveboxes/wsServer/blob/043417a48b9adfe8835f011df5ec2e19b6ae348f/src/ws.c#L543

receive the pong msg
https://github.com/gloveboxes/wsServer/blob/ae49205d8207a328381ab8ca07183dc166105ed7/src/ws.c#L1224

init the initail client pong_id
https://github.com/gloveboxes/wsServer/blob/923c4d3445bd8a7d32e16b4e0673e6a66c3ed932/src/ws.c#L720

This is an app calling the api and determining if ws client silently disconnected.

https://github.com/gloveboxes/Altair8800.Emulator.UN-X/blob/bd0e256fac51271022bc3bbbdad22dd1295648e5/AltairHL_emulator/web_socket_server.c#L35

Cheers Dave

Davidson Francis · Answer 2 · Thu Apr 28 2022 05:22:52 GMT+0800 (China Standard Time)

Hi @gloveboxes,
Thanks for giving a try to implement/suggest a solution, so let me see if I understand:

You maintain two 'counters': a global one (for ping broadcast) and a local one (for pinging a specific client). When you send a ping, you send this id together (as a string) and returns the previous one, depending on whether broadcast or not.

When the server receives a PONG, it gets this id and saves it as global (and also into the structure of the client that received it) so that the next PING returns this value.

If the current PING id is far from the latest PONG received (or not), it means that the lastest PINGs were not received and the client is not responding anymore. Is this right?

Very smart idea, I'm impressed. This not only solves the 'timeout' problem (the next PING determines whether the previous PONG was received, i.e. the time between PINGs is the PONG timeout) but also allows me to set a 'threshold' for how many ignored PINGs the user might tolerate.

Very interesting indeed. I will probably follow this approach, I just need to think carefully about whether I need to protect these IDs with locks and what could happen if the user mixes broadcast pings with pings to a client.

It would probably be interesting to let the server itself manage the global and local IDs and leave the comparison to the user, something like:

/**
 * @brief Send a broadcast ping or to a client
 * @param cli Client to be sent.
 * @return Returns the latest received PONG id.
 */
int ws_ping(ws_cli_conn_t *cli);

/* @brief Compares the latest received PONG id, with some threshold.
 * @param cli Client connection.
 * @param pong_id Latest PONG id.
 * @param threshold Distance between current PING and the latest PONG
 * @return Returns 1 if expired, 0 otherwise.
 */
int ws_ping_is_expired(ws_cli_conn_t *cli, int pong_id, int threshold);

/* Usage. */
...
int last_pong = ws_ping(cli);

if (ws_ping_is_expired(cli, last_pong_id, my_threshold))
    ws_close_client(cli);
...

This seems to work fine for pinging a single client, but what about the broadcast? The user would have to keep a list of active clients (just like you do in your project) and only then be able to iterate over them and make a close on each one of them, I would like to avoid that.

How about providing a routine that does this, like:

void ws_close_expired_ping(ws_cli_conn_t *cli, int last_pong, int threshold);

I don't know if this would be an ugly solution to the problem or not, but there is an advantage: ws_close_client() performs the complete close handshake (which creates a timeout thread to wait for the client to close the connection). Performing the close handshake on an unresponsive client seems unnecessary: the connection can be aborted directly by the server.

So the user thread/timer could be like:

/* for single client. */
{
    int last_pong = ws_ping(client);
    ws_close_expired_ping(client, last_pong, 1);
}

/* for broadcast. */
{
    int last_pong = ws_ping(NULL);
    ws_close_expired_ping(NULL, last_pong, 1);
}

(the 'last_pong' could also be saved internally)

Then the user would only worry about sending the periodic pings and invoking the close routines.

What's yours thoughts on this?

Dave Glover · Answer 3 · Thu Apr 28 2022 09:11:22 GMT+0800 (China Standard Time)

Hey @Theldus, I've given this more thought today. I don’t think it makes sense to keep a global_pong_id. It only makes sense if you have one client and using NULL, I think it’s better to have one model. If you had multiple clients, you can’t assume that the global_pong_id is applicable to all of the clients, and it would only take one client to response for you to think that all clients were still connected. I’ve modified my fork, tested, and I like as simpler 😊 https://github.com/gloveboxes/wsServer/blob/0eb89b43ab34775a0de1b7a9dbf453c410bb36a7/src/ws.c#L541 If you like enough, then also happy to do a PR and you can review further. Cheers Dave Hi @gloveboxes Thanks for giving a try to implement/suggest a solution, so let me see if I understand: You maintain two 'counters': a global one (for ping broadcast) and a local one (for pinging a specific client). When you send a ping, you send this id together (as a string) and returns the previous one, depending on whether broadcast or not. When the server receives a PONG, it gets this id and saves it as global (and also into the structure of the client that received it) so that the next PING returns this value. If the current PING id is far from the latest PONG received (or not), it means that the lastest PINGs were not received and the client is not responding anymore. Is this right?

…

________________________________ Very smart idea, I'm impressed. This not only solves the 'timeout' problem (the next PING determines whether the previous PONG was received, i.e. the time between PINGs is the PONG timeout) but also allows me to set a 'threshold' for how many ignored PINGs the user might tolerate. Very interesting indeed. I will probably follow this approach, I just need to think carefully about whether I need to protect these IDs with locks and what could happen if the user mixes broadcast pings with pings to a client. It would probably be interesting to let the server itself manage the global and local IDs and leave the comparison to the user, something like: /** * @brief Send a broadcast ping or to a client * @param cli Client to be sent. * @return Returns the latest received PONG id. */ int ws_ping(ws_cli_conn_t *cli); /* @brief Compares the latest received PONG id, with some threshold. * @param cli Client connection. * @param pong_id Latest PONG id. * @param threshold Distance between current PING and the latest PONG * @return Returns 1 if expired, 0 otherwise. */ int ws_ping_is_expired(ws_cli_conn_t *cli, int pong_id, int threshold); /* Usage. */ ... int last_pong = ws_ping(cli); if (ws_ping_is_expired(cli, last_pong_id, my_threshold)) ws_close_client(cli); ... This seems to work fine for pinging a single client, but what about the broadcast? The user would have to keep a list of active clients (just like you do in your project) and only then be able to iterate over them and make a close on each one of them, I would like to avoid that. How about providing a routine that does this, like: void ws_close_expired_ping(ws_cli_conn_t *cli, int last_pong, int threshold); I don't know if this would be an ugly solution to the problem or not, but there is an advantage: ws_close_client() performs the complete close handshake (which creates a timeout thread to wait for the client to close the connection). Performing the close handshake on an unresponsive client seems unnecessary: the connection can be aborted directly by the server. So the user thread/timer could be like: /* for single client. */ { int last_pong = ws_ping(client); ws_close_expired_ping(client, last_pong, 1); } /* for broadcast. */ { int last_pong = ws_ping(NULL); ws_close_expired_ping(NULL, last_pong, 1); } (the 'last_pong' could also be saved internally) Then the user would only worry about sending the periodic pings and invoking the close routines.

________________________________ What's yours thoughts on this? —

Dave Glover · Answer 4 · Thu Apr 28 2022 09:15:24 GMT+0800 (China Standard Time)

If the current PING id is far from the latest PONG received (or not), it means that the lastest PINGs were not received and the client is not responding anymore. Is this right?

@Theldus, yes correct - that is how it works... Cheers Dave

Dave Glover · Answer 5 · Thu Apr 28 2022 09:25:46 GMT+0800 (China Standard Time)

I don't know if this would be an ugly solution to the problem or not, but there is an advantage: ws_close_client() performs the complete close handshake (which creates a timeout thread to wait for the client to close the connection). Performing the close handshake on an unresponsive client seems unnecessary: the connection can be aborted directly by the server.

So the user thread/timer could be like:
/* for single client. */
{
    int last_pong = ws_ping(client);
    ws_close_expired_ping(client, last_pong, 1);
}

/* for broadcast. */
{
    int last_pong = ws_ping(NULL);
    ws_close_expired_ping(NULL, last_pong, 1);
}
(the 'last_pong' could also be saved internally)

Then the user would only worry about sending the periodic pings and invoking the close routines.

What's yours thoughts on this?

Ok, I think this works and would be cleaner. So you'd maintain a ping and pong for each client. And I think you are saying you can shortcut circuit the server teardown of a disconnected client connection. Again, I like.

Davidson Francis · Answer 6 · Thu Apr 28 2022 11:03:33 GMT+0800 (China Standard Time)

@gloveboxes

If you like enough, then also happy to do a PR and you can review further.

Yes, please go ahead and send the PR with your code.

Ok, I think this works and would be cleaner. So you'd maintain a ping and pong for each client. And I think you are saying you can shortcut circuit the server teardown of a disconnected client connection. Again, I like.

Yes, what I want to do is take your idea and adapt it to what I said: leave it to the server to increment the 'pong id' and implement the close routine for ping (the 'ws_close_expired_ping()'): it iterates (or not) over the clients list and individually checks if the client's last_pong is within the threshold of the current value (the one just sent) of that same client: if yes, ok, if not, abort the connection.

With that in mind, it might be better to keep the previous and current pong id in the client structure, and not return anything from 'ws_ping()'. This makes it easier for the close routine to check each client individually (if the id is 'expired' or not).

One thing I had also thought about: in this scenario, it might be possible to combine ping and close into a single routine, something like:

void ws_ping_and_close(ws_cli_conn_t *cli, int threshold);

(as usual, the cli argument indicates whether to broadcast or not)

This routine would play the role of 'ws_ping()' and 'ws_close_expired_ping()' together. I don't know if this sounds good or bad, but it seems simpler from a user point of view: whenever you want to ping and kill connections, call this function.

Dave Glover · Answer 7 · Thu Apr 28 2022 12:49:47 GMT+0800 (China Standard Time)

One thing I had also thought about: in this scenario, it might be possible to combine ping and close into a single routine, something like:
void ws_ping_and_close(ws_cli_conn_t *cli, int threshold);

@Theldus The same idea crossed my mind - I think it would make sense to combine the functions and hide the ping and pong IDs altogether as they are just implementation details.

Cheers Dave

Dave Glover · Answer 8 · Thu Apr 28 2022 14:56:14 GMT+0800 (China Standard Time)

I don't 100% like the function name

void ws_ping_and_close(ws_cli_conn_t *cli, int threshold);

how about

ws_ping_close_disconnected(ws_cli_conn_t *cli, int threshold);

or maybe just stick with

ws_ping(ws_cli_conn_t *cli, int threshold);

and document the behaviour that ws_ping will close disconnected clients...

Davidson Francis · Answer 9 · Fri Apr 29 2022 10:08:08 GMT+0800 (China Standard Time)

I don't 100% like the function name
[...]
how about
ws_ping_close_disconnected(ws_cli_conn_t *cli, int threshold);
or maybe just stick with
ws_ping(ws_cli_conn_t *cli, int threshold);
and document the behaviour that ws_ping will close disconnected clients...

Yes, I totally agree with you, the name I proposed was bothering me too.
I think I'll keep it as ws_ping() and document the function properly.