ACCESS VIOLATION - Writing to nullptr
ratzrattillo opened this issue · comments
While working with libzmq, which currently embeds wepoll 1.5.8, i had several crashes due to access violations.
These seem to stem from a special behaviour caused by GetQueuedCompletionStatusEx().
This function sometimes returns success (TRUE), containing a positive completion_count (e.g. 1), but still contain an iocp_events[0].lpOverlapped structure with a NULL value. The function is called in line 1223:
https://github.com/piscisaureus/wepoll/blob/dist/wepoll.c#L1223
Later on, wepoll tries to write to this memory location in line 1842:
https://github.com/piscisaureus/wepoll/blob/dist/wepoll.c#L1842
Attached you can find some screenshots of my debug session:
A fix would be highly appreciated :)
Is this behavior documented in GetCompletionPacketEx
? Unless external packets are sent via the IOCP handle this shouldn't be an issue, if my memory of the Windows I/O API serves me right.
Otherwise, #20 may fix this.
The following Remark i found in: https://learn.microsoft.com/en-us/windows/win32/fileio/getqueuedcompletionstatusex-func#remarks
This function returns TRUE when at least one pending I/O is completed, but it is possible that one or more I/O operations failed. Note that it is up to the user of this function to check the list of returned entries in the lpCompletionPortEntries parameter to determine which of them correspond to any possible failed I/O operations by looking at the status contained in the lpOverlapped member in each OVERLAPPED_ENTRY.
It is not clearly stating that lpOverlapped can be NULL, when more than 0 completions have been dequed, however it is the behavior i experienced.
Is there a plan to merge #20 ?
It's hard to believe a NULL lpOverlapped would just randomly show up on the completion port.
Are you using the async_io crate by any chance? Because they use a forked version of wepoll that deliberately people posts NULL packets to the completion port.
libzmq
is written in C++ and doesn't use async-io
. In addition, there are no crates that depend on async-io
that have reported this bug as far as I know.
Are there any places in libzmq
where an IOCP packet is manually posted to the wepoll
port?
libzmq
is written in C++ and doesn't useasync-io
.
The OP is reporting they are using Rust, so they could be using async-io.
In addition, there are no crates that depend on
async-io
that have reported this bug as far as I know.
Crates that depend on async-io depend on https://github.com/aclysma/wepoll-ffi which is a patched wepoll that supports this, so I wouldn't expect them to run into this issue in isolation. It only becomes a problem when they use async-io and then link with a "stock" wepoll.
Other than that I can't think of any other way this could ever happen. The latest version of wepoll has been out for more than 2.5 years, and it is used enough in production that if this were a wepoll bug, it would have come up a lot earlier. So there must be something particular to the OP's setup that causes this.
Thanks for the catch @piscisaureus! I've opened smol-rs/polling#85 and aclysma/wepoll-ffi#1 to address this
As of version v2.6.0, polling
now no longer uses wepoll
. This issue can probably be closed.