foxglove / ws-protocol

Foxglove Studio WebSocket protocol specification and libraries

python: Race condition with client.subscriptions_by_channel

AndrewNolte opened this issue

Description
It seems like there needs to be a lock around some of these functions. I think a channel was added or removed while looping through the channels that needed to be broadcast.

[ERROR] 2022-02-14 16:53:57.092 [:7]: Set changed size during iteration
[ERROR] 2022-02-14 16:53:57.096 [:7]: Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_gerdmrts/runfiles/x/pmx/rover/app/foxglove_bridge/foxglove_bridge.py", line 59, in broadcast
    packet.SerializeToString(),
  File "/tmp/Bazel.runfiles_gerdmrts/runfiles/x/pmx/rover/app/foxglove_bridge/foxglove_websocket/server.py", line 170, in send_message
    for sub_id in subs:
RuntimeError: Set changed size during iteration

https://github.com/foxglove/ws-protocol/blob/main/python/src/foxglove_websocket/server.py#:~:text=)-,async%20def%20send_message(self%2C%20chan_id%3A%20ChannelId%2C%20timestamp%3A%20int%2C%20payload%3A%20bytes,),-async%20def%20_send_json

  • Version:
    Latest; the line numbers are off because of our formatter.

  • Platform:
    Ubuntu 20.04, Python 3.7 (back-ported types)

Steps To Reproduce
It can probably be reproduced by having one thread broadcast data on a channel while another repeatedly adds and deletes that channel. This is the first time I've seen this race condition in a couple of weeks of using Foxglove.

Expected Behavior
No runtime error.

Actual Behavior
A rare race condition that raises the RuntimeError shown above.

Hi, can you please share some of your threading code? The server is currently not written to be thread-safe, so you will need some kind of synchronization. One approach I'd recommend is the asyncio event loop's call_soon_threadsafe method. Example usage of call_soon_threadsafe with a queue: https://gist.github.com/jtbandes/c00f01a6d156a223cfd0f409a52f87db
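For readers who cannot open the gist, here is a rough sketch of the pattern being described. It is not the gist's code: the names producer and consume are made up for illustration, and the point where a real bridge would call the server's send_message is only marked with a comment.

```python
import asyncio
import threading


def producer(loop, queue, items):
    # Runs in a worker thread. asyncio.Queue is not thread-safe, so every
    # put is handed to the event loop's thread via call_soon_threadsafe.
    for item in items:
        loop.call_soon_threadsafe(queue.put_nowait, item)
    loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel: no more items


async def consume(items):
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    worker = threading.Thread(target=producer, args=(loop, queue, items))
    worker.start()
    received = []
    while True:
        item = await queue.get()
        if item is None:
            break
        # Assumption: a real bridge would `await server.send_message(...)` here,
        # on the same thread that runs the server.
        received.append(item)
    worker.join()
    return received
```

Running asyncio.run(consume([b"a", b"b"])) collects the payloads in the order the worker thread produced them, because callbacks scheduled with call_soon_threadsafe run in FIFO order on the loop.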

I do agree there's a possible bug, since the server has some awaits inside these for loops. But if you have sample code or steps to reproduce, that would be helpful!
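To illustrate why an await inside such a loop matters even without threads, here is a small standalone sketch. It does not use the server's code; broadcast, unsubscribe, and demo are hypothetical names, and asyncio.sleep(0) stands in for an awaited send. Iterating over a copy of the set is one possible mitigation, not necessarily the fix the library adopted.

```python
import asyncio


async def broadcast(subs, snapshot):
    # Iterate either the live set or a copy taken before the first await.
    targets = list(subs) if snapshot else subs
    sent = []
    for sub_id in targets:
        await asyncio.sleep(0)  # stand-in for an awaited send; yields to other tasks
        sent.append(sub_id)
    return sent


async def unsubscribe(subs, sub_id):
    # Runs while broadcast is suspended at its await.
    subs.discard(sub_id)


async def demo(snapshot):
    subs = {1, 2, 3}
    try:
        await asyncio.gather(broadcast(subs, snapshot), unsubscribe(subs, 2))
        return "ok"
    except RuntimeError as err:
        return str(err)
```

With snapshot=False the second task shrinks the set while the first is suspended, and the next loop step raises the same RuntimeError as in the report. With snapshot=True the loop walks a private list and finishes normally.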

Ok, thanks for the clarification! I just added locks on the calling side. The way the code is set up, I have callbacks for when topics are added/deleted on our end to add/remove channels on foxglove. Then a loop broadcasts cached messages at a fixed interval.
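The setup described here might look roughly like the sketch below. This is a guess at the shape of the calling code, not the poster's actual implementation: TopicCache and its methods are hypothetical, and the lock only protects the caller's own cache. It does not by itself make calls into the server thread-safe.

```python
import threading


class TopicCache:
    """Hypothetical caller-side cache; not part of foxglove_websocket."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}  # topic name -> most recent payload

    def on_topic_added(self, topic):
        with self._lock:
            self._latest.setdefault(topic, b"")

    def on_topic_removed(self, topic):
        with self._lock:
            self._latest.pop(topic, None)

    def update(self, topic, payload):
        # Ignore payloads for topics that have already been removed.
        with self._lock:
            if topic in self._latest:
                self._latest[topic] = payload

    def snapshot(self):
        # Copy under the lock so the broadcast loop iterates a stable dict.
        with self._lock:
            return dict(self._latest)
```

The broadcast loop would call snapshot() once per interval and iterate over the returned copy, so a callback that adds or removes a topic mid-broadcast cannot change the dict being iterated.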

> I have callbacks for when topics are added/deleted on our end to add/remove channels on foxglove

Yep, assuming these callbacks are happening in a separate thread from your async with FoxgloveServer, you will need to do something like call_soon_threadsafe to add/remove the channels from the server thread. Maybe we can add some automatic validation that the methods are being called from the correct thread to help avoid these issues.
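One way such validation could work is sketched below. This is purely illustrative: ThreadGuard and call_from_worker are hypothetical names, the method name passed to check is a placeholder string, and the server may implement the idea differently or not at all.

```python
import threading


class ThreadGuard:
    """Hypothetical check that remembers which thread created it."""

    def __init__(self):
        self._owner = threading.get_ident()  # thread that created the guard

    def check(self, method_name):
        # Raise if the caller is not on the owning thread.
        if threading.get_ident() != self._owner:
            raise RuntimeError(method_name + " called from the wrong thread")


def call_from_worker(guard, method_name):
    """Call guard.check from a new thread and return any error messages."""
    errors = []

    def run():
        try:
            guard.check(method_name)
        except RuntimeError as err:
            errors.append(str(err))

    worker = threading.Thread(target=run)
    worker.start()
    worker.join()
    return errors
```

A server built this way would create the guard when it starts and call check at the top of each public method, so a misplaced call fails immediately with a clear message instead of surfacing later as a rare iteration error.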

I'm having similar issues. Does either of you have a concrete implementation of this in code anywhere?

Just created an example threaded server: #42

Let me know if this example is helpful!

This makes perfect sense. Very comprehensive example, thank you!

Using the example and a small client script, I was also able to reproduce the "Set changed size during iteration" issue, so I'll put up a fix for that soon.