pebbe / zmq4

A Go interface to ZeroMQ version 4

Lots of "interrupted system call" errors on RecvBytes() and SendBytes() with Go 1.14 on Linux

mfl-q opened this issue · comments

commented

Hi,

I am seeing lots of "interrupted system call" errors in a typical broker implementation when compiling with the current Go version 1.14 and running on linux/amd64. According to the release notes this is sort of expected. My question is whether we can expect this to be handled in the library in the future, or whether we need to implement a loop around each and every zmq call, retrying it in case of EINTR.

The code looks something like this (results and pub are *zmq.Socket):

for {
    // receive a result
    msg, err := results.RecvMessageBytes(0)
    total++
    if err != nil {
        log.Printf("Error receiving a result: %s", err)
        errorcount++
        continue
    }
    m := msg[0]
    // publish it (it should already have the topic prepended)
    _, err = pub.SendBytes(m, 0)
    if err != nil {
        log.Printf("Error publishing message: %s", err)
        errorcount++
        // fall through, so we can still push it to another socket
    }
...

I am running in a docker container in case that matters. Compiling with go 1.13 does not show the issue.

Thank you!

--Matthias

It's not just those interrupted system calls.
There is also another problem with Go 1.14: the code freezes.

I could (probably) fix the EINTR thing in the package. But I won't do that at the moment,
because I have no idea how to fix the freezing problem, and the package would still be
unusable with Go1.14.

If anyone knows how to fix this, please let me know.

I asked on golang-nuts about this. I got a lot of responses about the interrupted system calls,
but so far no comment on the freezing issue.
See: https://groups.google.com/forum/#!topic/golang-nuts/oqXOnR1GJ4g

EDIT: later I did get help about the freezing issue.

And I should learn to read... apologies!

Update: the freezing and the interrupted system calls might be two separate issues.

The freeze happens in Reactor.Run(), so I have to look at the implementation of that. I don't
know if I can fix it. Otherwise, you could still use zmq4 if you don't use the reactor.

I will look at the interrupted system calls. I am not sure what the best solution would be.
Always retry? Retry only a number of times? Retry in all circumstances?

A temporary workaround is to run your program with the environment variable GODEBUG=asyncpreemptoff=1 set.

commented

Thank you very much for the quick response and the interesting discussion on golang-nuts.
For now I have reverted back to using go 1.13.

Regarding the error handling in the package, I guess this is entirely up to you. The zeromq guide says:

http://zguide.zeromq.org/page:all#Handling-Errors-and-ETERM

Real code should do error handling on every single ZeroMQ call. If you're using a language binding other than C, the binding may handle errors for you.

So I guess from that perspective my code is technically broken right now, because it cannot deal with EINTR properly. On the other hand, EINTR now seems to happen far more often in practice than any other error, so it might be worthwhile to handle it in the package to make it easier to use.

What I am not sure about at all is how well the ZeroMQ library behaves with all those interrupted syscalls. Is there any performance penalty? Or does it have any other implications?

I have released version 1.1.0 with a solution for these problems. See the top of the package documentation.

Everything seems to be working fine now, even the Reactor, but I'm not sure the Reactor is
actually safe to use.

I get one test error on the draft version. I don't know why.

Found the error in the draft version. All tests for draft now succeed.

Another update. Automatic handling of EINTR is now set to true, so you don't need to enable it.

commented

I have updated to the new release and it works flawlessly so far with the current go version.
Thank you very much for taking care of this!
I am not sure if I should close the issue, or if you still need to keep track of something? From my point of view it can be closed.

My code runs about 10x slower with 1.14 than with 1.13 (my code runs as fast as ZMQ transmissions allow it to run). I think it is related to this issue and the new signal handling?
Is anyone else experiencing the same?

Try running your program with the environment variable GODEBUG=asyncpreemptoff=1
Is it still slower?

Sorry for the late reply, @pebbe. Setting GODEBUG=asyncpreemptoff=1 does not seem to help. (I reverted to Go 1.13 instead, but I'd love a fix.)