pebbe / zmq4

A Go interface to ZeroMQ version 4

Incorrect errno

Miosss opened this issue · comments

The problem

I issue a non-blocking read on a DEALER socket connected to a ROUTER socket.
data, err := client.RecvMessage(zmq.DONTWAIT)

The ROUTER takes at least 1 second to complete the task (due to a sleep()), and I do the read immediately.

I expected to get an EAGAIN error, but instead I got err == nil and len(data) == 0, which looks like a valid empty read.

Situation

Debugging the library, it seems to me that the error originates in this call (RecvBytes, zmq4.go:1077):
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))

Here, size == -1 but err == nil.
Therefore errget(err), called with nil, returns nil instead of the real error.

Maybe errget should do something when it is called with a nil argument?

I believe the root cause of this particular problem is not using zmq_errno.
The documentation of that function says it should be used to get errno reliably, for example when the application links to a different C runtime than libzmq does.

This is probably my case, because this happens on Windows: I have libzmq.dll built with MSVC and then generated a stub libzmq.a using gcc's dlltool. So the setup is exotic (but hey, welcome to compiling C libs on Windows + Go + Cgo).
What's more, for C. calls Go returns the plain errno, which is essentially wrong in this case.

When I tried e := C.zmq_errno() just after the failed read, I got the correct EAGAIN (11) error.

Solutions?

I could probably check C.zmq_errno() after each call, but I am not sure whether that is sufficient. Will the error be cleared after successful calls?
EDIT:
No, the error is not cleared. And since the returned err is nil, there is no way to know whether the C.zmq_errno() result is valid in this situation (plus all the possible threading issues).

One solution may be to drop all err from _, err := C. ... and call C.zmq_errno() instead? But that would require changes in many places.

Maybe modifying errget would be sufficient? For example, if the argument err is nil, then check C.zmq_errno()?

Forget about the previous comment. It's all wrong. An interrupted signal call gives an EINTR, not an EAGAIN. I undid the changes.

So what is the problem exactly? Provide code that demonstrates.

@pebbe
I am not sure it is easily reproducible. I believe the main reason behind this is exactly what zmq_errno() is for. I found it through a Stack Overflow question.

Look at the definition of this function:
int zmq_errno (void) { return errno; }

It returns just the errno - but from the context of the library itself.

In my case I have libzmq.dll built with MSVC, but I use gcc from MSYS2 for cgo. Therefore errno may not propagate properly, which is the situation described here.
Your error handling relies on what cgo gives you here:
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))

zmq_msg_recv only returns the size of the message; the err is supplied by Go:

Any C function (even void functions) may be called in a multiple assignment context to retrieve both the return value (if any) and the C errno variable as an error (use _ to skip the result value if the function returns void). For example:

(quoted from the cgo documentation)

So this err is basically the same as just reading errno (which you in fact do in errget).

The problem is that the errno in the DLL may be a different errno from the one in the app.
libzmq sets an errno that lives only in the DLL, while in my app errno is always 0.

I understand that this problem is why zmq_errno came to be.

The problem itself

I run
msg, err := client.RecvMessage(zmq.DONTWAIT)

I am sure that there is no message in the queue, so I should receive msg = nil and err = EAGAIN.
This doesn't happen. I get msg = []byte("") (an empty message) and err = nil.

By debugging your code I can see that:
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))
in this example returns size = -1 and err = nil.
size = -1 clearly indicates that there is an error, but Go gives you err = nil. In the following if statement you check size to see whether there is an error (and there is), and to get the actual error you look at err, which is nil.

So size says that there is an error, and err says there is none.
To me, the cause is what I wrote at the beginning: libzmq sets a different errno than the one cgo returns. You should probably check zmq_errno instead.

And look at this quote from zmq.h:

/* This function retrieves the errno as it is known to 0MQ library. The goal */
/* of this function is to make the code 100% portable, including where 0MQ */
/* compiled with certain CRT library (on Windows) is linked to an */
/* application that uses different CRT library. */
ZMQ_EXPORT int zmq_errno (void);

I think I may have a fix. Can you try the latest version, please?

The same situation happens when binding to the same TCP port a second time: it fails silently, without an error. Therefore the process thinks that it can accept messages, while the underlying socket is dead.

This occurs in Bind (zmq4.go:963):
i, err = C.zmq4_bind(soc.soc, s)

i = -1 but err = nil, so the same as before.

I have downloaded your latest version and it seems fixed: this case (binding) now correctly returns an error (though I am not sure why it is error 100 -> Cannot create another system semaphore, but this must be a libzmq thing).
I have not yet tested the EAGAIN case, but I believe it is the same as for Bind.
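
A possible explanation for the 100, offered as a guess and not confirmed anywhere in this thread: in the MSVC CRT's errno.h, the value 100 is EADDRINUSE, which fits a second Bind on the same port. Go on Windows formats a syscall.Errno as a Windows system error code, and system error 100 carries the "Cannot create another system semaphore" text. The sketch below uses a hypothetical helper, describeMSVCErrno, with a deliberately tiny table, to name such values by their CRT meaning.

```go
package main

import "fmt"

// msvcErrno holds a few errno values as defined in the MSVC CRT's errno.h.
// This is an illustrative subset, not a complete table.
var msvcErrno = map[int]string{
	11:  "EAGAIN",
	100: "EADDRINUSE",
}

// describeMSVCErrno is a hypothetical helper that names an errno value
// reported by a libzmq built with MSVC.
func describeMSVCErrno(n int) string {
	if name, ok := msvcErrno[n]; ok {
		return name
	}
	return fmt.Sprintf("errno %d (not in this subset)", n)
}

func main() {
	fmt.Println(describeMSVCErrno(100))
	fmt.Println(describeMSVCErrno(11))
}
```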