jonhoo / faktory-rs

Rust bindings for Faktory clients and workers

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Fix race between job completion and heartbeats

jonhoo opened this issue · comments

Currently, there exists a race between the thread that performs heartbeats and the worker thread that completes jobs. Specifically, when the worker waits for the OK following an ACK or a FAIL, it might receive the reply intended for the heartbeat, and vice-versa.

This is normally not an issue, since both heartbeats and workers generally receive OK responses. However, it can misbehave when a non-OK response is given (e.g., an error or a QUIET). There is also a write race here, where technically an outgoing message could be large enough that it ends up being flushed across multiple system calls, at which point a BEAT and an ACK/FAIL could be mixed.

Fixing this will require a bit of engineering. The easiest is probably to spin up a third RPC thread that is responsible for sending messages and waiting for replies in a single atomic step.

Actually, it seems like I lied. That's not at all an easy way. In particular, it would mean that Client only contains an mpsc::Sender on which you send a FaktoryCommand and another mpsc::Sender on which the response (or error) will be sent. Seems well and good, until you get around to connect and reconnect, which now become all sorts of thorny.

Another (perhaps simpler) way to implement this would be to give Client an Arc<Mutex<Read + Write>> directly, which it would then lock, do write, do read, and unlock, to guarantee atomicity. It's not particularly pretty, but should work.

Actually this is a non-issue, because all the methods require &mut self, so they already provide this exclusion. Consumer places the entire Client in a lock, which is what you'd want.