zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1

Home Page: https://www.zeromq.org

Assert in `zmq::object_t::process_command` (src/object.cpp:170)

Joacchim opened this issue

Hello,

First of all, thanks for the great work on this library. I know it's not new, and quite popular, but you guys deserve the thanks nonetheless.
This issue is, depending on your answers, either a call for help (understanding what we are doing wrong) or a bug report.

My team uses ZMQ through its Python 3 binding to handle communication between multiple processes (Python's multiprocessing Process). It has worked fine most of the time since we set it up in 2023, but we recently hit an odd assertion in what I believe is the ZMQ code: the file path seems to match, and one of the remaining processes logged ZMQ errors right after the process that hit the assertion went down.

Connection model: multiple 1-to-1 Dealer/Router connections between multiple processes.
Usage of threads: Yes

  • main thread to serve the API via ZMQ (thus, as a Router)
  • one thread to update internal state based on the communications with a dedicated set of peers, via ZMQ too (as a Dealer this time).
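
To make the model above concrete, here is a minimal single-process sketch, not our actual code. The endpoint name, the payloads and the function names are hypothetical, and `inproc://` stands in for the real inter-process transport. It assumes the usual ZeroMQ rule that a context may be shared between threads while each socket is used only by the thread that created it.

```python
import threading
import zmq

ENDPOINT = "inproc://api-demo"  # hypothetical endpoint, for illustration only


def dealer_worker(ctx, replies):
    # The DEALER socket is created, used and closed inside this thread only.
    dealer = ctx.socket(zmq.DEALER)
    dealer.connect(ENDPOINT)
    dealer.send(b"state-update")
    if dealer.poll(timeout=2000):  # wait up to 2 s for the reply
        replies.append(dealer.recv())
    dealer.close()


def run_demo():
    ctx = zmq.Context()
    # The ROUTER socket lives in the main thread, like the API side above.
    router = ctx.socket(zmq.ROUTER)
    router.bind(ENDPOINT)

    replies = []
    worker = threading.Thread(target=dealer_worker, args=(ctx, replies))
    worker.start()

    if router.poll(timeout=2000):
        # ROUTER prepends the sender identity to every incoming message.
        identity, payload = router.recv_multipart()
        router.send_multipart([identity, b"ack:" + payload])

    worker.join()
    router.close()
    ctx.term()
    return replies
```

Calling `run_demo()` performs one request/reply round trip between the two threads and returns the replies collected by the Dealer side.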

Alas, we did not have core dumps enabled on that server, and we have not seen the issue reproduce since. It happened fairly recently, though, so it may well occur again.
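
For the next occurrence, something like the following sketch could be called at process start-up. It relies on a failed libzmq assertion ending in `abort()`, which raises SIGABRT, one of the signals handled by Python's standard `faulthandler` module. The function name `enable_crash_diagnostics` is hypothetical, and the core limit can only be raised up to the hard limit the service manager grants (for a systemd unit, the `LimitCORE=` setting).

```python
import faulthandler
import resource
import sys


def enable_crash_diagnostics(log_file=sys.stderr):
    # Dump the Python traceback of every thread when a fatal signal
    # (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL) is received.
    faulthandler.enable(file=log_file, all_threads=True)

    # Raise the soft core-dump size limit to the hard limit, so the kernel
    # is allowed to write a core file if the hard limit permits one.
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

The returned pair is the (soft, hard) limit after the change; a hard limit of 0 means no core file will be written regardless.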

From what I understand of the code, the assert could be related to the `command_t.type` field being set to `command_t::done`. As I am clearly not an experienced ZMQ user, I lack the context to understand what could have happened.

Environment

  • libzmq version (commit hash if unreleased): Debian package version: 4.3.4-1
  • Language binding: Python (Debian package python3-zmq 20.0.0-1)
  • OS: Debian 11 (bullseye)

Minimal test code / Steps to reproduce the issue

Sorry, we've only encountered the issue once, and I don't have enough information about its cause to put together a reproduction case.

What's the actual result? (include assertion message & call stack if applicable)

Message caught by our systemd unit's journal:

Assertion failed: false (src/object.cpp:170)

What's the expected result?

Probably a ZMQ error that we could handle in the Python code somehow?
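
To illustrate the kind of handling meant here, this is a hedged sketch of how an error reported by libzmq through a return code reaches Python as `zmq.ZMQError`. The helper names `try_send` and `demo` are hypothetical. A failed assertion is different: it aborts the whole process, so no `except` clause ever runs.

```python
import zmq


def try_send(sock, payload):
    """Return None on success, or the errno of the ZMQ error raised."""
    try:
        sock.send(payload, flags=zmq.DONTWAIT)
        return None
    except zmq.ZMQError as exc:
        # zmq.Again (errno EAGAIN) is a subclass of zmq.ZMQError.
        return exc.errno


def demo():
    ctx = zmq.Context()
    dealer = ctx.socket(zmq.DEALER)  # deliberately left without any peer
    try:
        # With no peer, a non-blocking send cannot proceed and fails with EAGAIN.
        return try_send(dealer, b"ping")
    finally:
        dealer.close(linger=0)
        ctx.term()
```

Here `demo()` returns `zmq.EAGAIN`, which the caller can act on (retry, log, drop the message) instead of losing the process.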