python-websockets / websockets

Library for building WebSocket servers and clients in Python

Home Page: https://websockets.readthedocs.io/

Possible race condition between `websockets.sync.client.ClientConnection::send` & keepalive ping

lucamuscat opened this issue

Hi!

I have recently encountered an issue similar to #922, where (possibly?) the automatic keepalive ping causes a websocket message with an invalid opcode to be sent, consequently killing the connection between the client and the server.

For some context, I am using a different websocket server from the one mentioned in #922, written in a completely different language (tokio-tungstenite, Rust). I suspect that this error is caused not by the websocket server being unable to handle pings with arbitrary data, but by a race condition occurring in the websocket client itself.

The following is a screenshot of a Wireshark dump, highlighting an erroneous websocket message generated by the websockets library:

[image: Wireshark capture highlighting the erroneous websocket message]

Unfortunately, I do not have a minimal reproducible example for this bug, as it manifests sporadically inside a suite of integration tests. Looking through the code for ping, the fact that self.send_in_progress is not checked prior to sending the ping is highly suspicious:

    def ping(self, data: Optional[Data] = None) -> threading.Event:
        """
        Send a Ping_.

        .. _Ping: https://www.rfc-editor.org/rfc/rfc6455.html#section-5.5.2

        A ping may serve as a keepalive or as a check that the remote endpoint
        received all messages up to this point

        Args:
            data: Payload of the ping. A :class:`str` will be encoded to UTF-8.
                If ``data`` is :obj:`None`, the payload is four random bytes.

        Returns:
            An event that will be set when the corresponding pong is received.
            You can ignore it if you don't intend to wait.

            ::

                pong_event = ws.ping()
                pong_event.wait()  # only if you want to wait for the pong

        Raises:
            ConnectionClosed: When the connection is closed.
            RuntimeError: If another ping was sent with the same data and
                the corresponding pong wasn't received yet.

        """
        if data is not None:
            data = prepare_ctrl(data)

        with self.send_context():
            # Protect against duplicates if a payload is explicitly set.
            if data in self.pings:
                raise RuntimeError("already waiting for a pong with the same data")

            # Generate a unique random payload otherwise.
            while data is None or data in self.pings:
                data = struct.pack("!I", random.getrandbits(32))

            pong_waiter = threading.Event()
            self.pings[data] = pong_waiter
            self.protocol.send_ping(data)
            return pong_waiter
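Whether the missing check matters depends on what `send_context()` does, which the excerpt above does not show. As a rough sketch only: if both the message path and the ping path wrote to the socket while holding one shared lock, their frames could not interleave. `FakeConnection`, `write_frame`, and `frames_are_contiguous` below are hypothetical names invented for this illustration and are not part of the websockets library.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a connection; NOT the websockets implementation."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.wire = []  # records the order in which "bytes" reach the socket

    def write_frame(self, label, size):
        # Assumption: every writer takes the same lock for the whole frame.
        with self.mutex:
            for _ in range(size):
                self.wire.append(label)


def frames_are_contiguous(wire):
    """Return True if each label appears as one unbroken run."""
    seen = set()
    previous = None
    for label in wire:
        if label != previous:
            if label in seen:
                return False  # this label was interrupted by another frame
            seen.add(label)
            previous = label
    return True


conn = FakeConnection()
threads = [
    threading.Thread(target=conn.write_frame, args=("message", 1000)),
    threading.Thread(target=conn.write_frame, args=("ping", 10)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(frames_are_contiguous(conn.wire))
```

Under that assumption the two frames always come out as two unbroken runs, in whichever order the threads acquired the lock.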

I encountered this bug in both v11.0.2 and v12 of the library on an Ubuntu 20.04 machine.

For a quick workaround, I tried to disable the keepalive ping by setting ping_timeout=None, however, websockets.sync.client.connect does not expose this parameter, unlike its asyncio counterpart. Is it possible to disable keepalive pings for the threading variant of the library?

Regards,
Luca

That's really weird. The opcode is 0110 which is 6. websockets doesn't declare an opcode 6.

    class Opcode(enum.IntEnum):
        """Opcode values for WebSocket frames."""

        CONT, TEXT, BINARY = 0x00, 0x01, 0x02
        CLOSE, PING, PONG = 0x08, 0x09, 0x0A
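To make the opcode reading concrete, here is a minimal sketch of how the first byte of a frame splits into the FIN bit, three reserved bits, and a 4-bit opcode under RFC 6455. The helper `describe_first_byte` and the sample byte `0x86` are made up for illustration; the actual bytes from the capture are not reproduced here.

```python
import enum


class Opcode(enum.IntEnum):
    """Opcode values for WebSocket frames."""

    CONT, TEXT, BINARY = 0x00, 0x01, 0x02
    CLOSE, PING, PONG = 0x08, 0x09, 0x0A


KNOWN_OPCODES = {int(op) for op in Opcode}


def describe_first_byte(byte):
    """Split the first byte of a frame header into its RFC 6455 fields."""
    opcode = byte & 0x0F  # low four bits
    return {
        "fin": bool(byte & 0x80),
        "rsv1": bool(byte & 0x40),
        "rsv2": bool(byte & 0x20),
        "rsv3": bool(byte & 0x10),
        "opcode": opcode,
        "known": opcode in KNOWN_OPCODES,
    }


# Hypothetical first byte: FIN set, no reserved bits, opcode 0110 (6).
fields = describe_first_byte(0x86)
print(fields["opcode"], fields["known"])  # prints: 6 False
```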

Are you sure that this frame is sent by the websockets client to the server, rather than the opposite?

Are you able to enable debug logs and share them when the error happens? That would help me a lot.
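A minimal sketch of turning on debug logs, assuming the library logs through the standard `logging` module under logger names that start with `websockets` (as its documentation describes). The format string and the choice to keep other loggers at INFO are arbitrary.

```python
import logging

# Send log records to stderr; keep unrelated loggers at INFO.
logging.basicConfig(
    format="%(asctime)s %(name)s %(message)s",
    level=logging.INFO,
)

# Child loggers such as "websockets.client" inherit this level.
logging.getLogger("websockets").setLevel(logging.DEBUG)
```

This needs to run before the connection is opened so that the handshake is captured as well.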

> For a quick workaround, I tried to disable the keepalive ping by setting ping_timeout=None, however, websockets.sync.client.connect does not expose this parameter, unlike its asyncio counterpart. Is it possible to disable keepalive pings for the threading variant of the library?

Keepalive pings aren't implemented yet in the threading variant :-) So it's not that!

Thanks for the speedy reply 😊

I ran the test suite again, and recorded the log file as requested. The following is what was generated (with the GET url and host redacted by myself):

= connection is CONNECTING
> GET ... HTTP/1.1
> Host: ...
> Upgrade: websocket
> Connection: Upgrade
> Sec-WebSocket-Key: TjKDbdIYPkfnFeLXFvkNJQ==
> Sec-WebSocket-Version: 13
> Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
> User-Agent: Python/3.11 websockets/11.0.3
< HTTP/1.1 101 Switching Protocols
< Alt-Svc: h3=":443"; ma=2592000
< Connection: Upgrade
< Sec-Websocket-Accept: +WjrN8tmvMfDxaJVpCvPh/V5SdM=
< Server: Caddy
< Upgrade: websocket
< Date: Tue, 16 Jan 2024 13:27:12 GMT
= connection is OPEN
> BINARY 82 a4 74 79 70 65 a8 4e 61 76 69 67 61 74 65 a4 ... 6f 61 64 2e 68 74 6d 6c [99 bytes]
< CLOSE 1005 (no status code [internal]) [0 bytes]
> CLOSE 1005 (no status code [internal]) [0 bytes]
= connection is CLOSING
< EOF
> EOF
= connection is CLOSED

This time, the websocket server complained that it received an invalid opcode of 3. The following is another screenshot from Wireshark. Notice that the client sends the invalid message after the server sends a close message:

[image: Wireshark capture of the exchange between client and server]

From the client's perspective, the logs show:

  1. A binary frame with a payload of 99 bytes sent from the client to the server
  2. A close frame without payload sent from the server to the client (that might be the server closing the connection because it disliked what the client sent)
  3. A close frame without payload sent from the client to the server (which is the proper way to complete the closing handshake)

This looks proper.

Your Wireshark screenshot must be read bottom-to-top (based on timestamps). It shows:

  1. A binary frame sent from the client (172.26.0.8) to the server (172.26.0.10). Its length is 86, which is less than 99, and compression isn't enabled (the server doesn't support it), so we aren't looking at the same frame as in the debug logs.
  2. A binary frame sent from the server to the client (length 275). This frame doesn't appear in the debug logs.
  3. A close frame sent from the server to the client.
  4. What looks essentially like random data sent from the client to the server. (The unknown opcode is a red herring. Some reserved bits are set too and that's illegal. Overall it looks like random data.) I have no idea how you could do that with websockets, especially after the socket is closed, which the debug logs are clearly showing.
  5. A second close from the server to the client, which is illegal.
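The point about compression can be checked from the handshake in the debug logs: the client offered permessage-deflate, but the 101 response carries no Sec-WebSocket-Extensions header, so no extension was negotiated. A small sketch of that check follows; `negotiated_extensions` is a hypothetical helper written for this illustration, and the header lines are copied from the log above.

```python
def negotiated_extensions(response_header_lines):
    """Return extension names found in Sec-WebSocket-Extensions response headers."""
    extensions = []
    for line in response_header_lines:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sec-websocket-extensions":
            # Extensions are comma-separated; parameters follow a semicolon.
            for item in value.split(","):
                ext_name = item.split(";")[0].strip()
                if ext_name:
                    extensions.append(ext_name)
    return extensions


response = [
    'Alt-Svc: h3=":443"; ma=2592000',
    "Connection: Upgrade",
    "Sec-Websocket-Accept: +WjrN8tmvMfDxaJVpCvPh/V5SdM=",
    "Server: Caddy",
    "Upgrade: websocket",
    "Date: Tue, 16 Jan 2024 13:27:12 GMT",
]
print(negotiated_extensions(response))  # prints: []
```

With nothing negotiated, a 99-byte payload goes on the wire uncompressed, so a frame of length 86 cannot be the same message.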

I'm unconvinced by your claim that this is generated by the websockets library. I understand that you cannot share the full code; perhaps you can share at least how you invoke connect and how you build all arguments passed to connect? Also, can you confirm that you aren't mocking anything on which websockets depends?

I apologize for the delay in getting back to you; I appreciate your patience and the help that you have provided!

I guess it's back to the drawing board for me. However, I think that you are correct about the bug having nothing to do with the websocket client itself.

I will close this issue, and re-open it in case I find anything else that might be of interest to this repo 😊