piscisaureus / wepoll

wepoll: fast epoll for windows⁧ 🎭

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ACCESS VIOLATION - Writing to nullptr

ratzrattillo opened this issue · comments

While working with libzmq, which currently embeds wepoll 1.5.8, i had several crashes due to access violations.
These seem to stem from a special behaviour caused by GetQueuedCompletionStatusEx().

This function sometimes returns success (TRUE), containing a positive completion_count (e.g. 1), but still contain an iocp_events[0].lpOverlapped structure with a NULL value. The function is called in line 1223:
https://github.com/piscisaureus/wepoll/blob/dist/wepoll.c#L1223

Later on, wepoll tries to write to this memory location in line 1842:
https://github.com/piscisaureus/wepoll/blob/dist/wepoll.c#L1842

Attached you can find some screenshots of my debug session:

screenshot1

screenshot2

A fix would be highly appreciated :)

Is this behavior documented in GetCompletionPacketEx? Unless external packets are sent via the IOCP handle this shouldn't be an issue, if my memory of the Windows I/O API serves me right.

Otherwise, #20 may fix this.

The following Remark i found in: https://learn.microsoft.com/en-us/windows/win32/fileio/getqueuedcompletionstatusex-func#remarks

This function returns TRUE when at least one pending I/O is completed, but it is possible that one or more I/O operations failed. Note that it is up to the user of this function to check the list of returned entries in the lpCompletionPortEntries parameter to determine which of them correspond to any possible failed I/O operations by looking at the status contained in the lpOverlapped member in each OVERLAPPED_ENTRY.

It is not clearly stating that lpOverlapped can be NULL, when more than 0 completions have been dequed, however it is the behavior i experienced.

Is there a plan to merge #20 ?

It's hard to believe a NULL lpOverlapped would just randomly show up on the completion port.

Are you using the async_io crate by any chance? Because they use a forked version of wepoll that deliberately people posts NULL packets to the completion port.

libzmq is written in C++ and doesn't use async-io. In addition, there are no crates that depend on async-io that have reported this bug as far as I know.

Are there any places in libzmq where an IOCP packet is manually posted to the wepoll port?

libzmq is written in C++ and doesn't use async-io.

The OP is reporting they are using Rust, so they could be using async-io.

In addition, there are no crates that depend on async-io that have reported this bug as far as I know.

Crates that depend on async-io depend on https://github.com/aclysma/wepoll-ffi which is a patched wepoll that supports this, so I wouldn't expect them to run into this issue in isolation. It only becomes a problem when they use async-io and then link with a "stock" wepoll.

Other than that I can't think of any other way this could ever happen. The latest version of wepoll has been out for more than 2.5 years, and it is used enough in production that if this were a wepoll bug, it would have come up a lot earlier. So there must be something particular to the OP's setup that causes this.

In fact, looking at the stack trace, I'm pretty convinced this is the issue now:
image

Thanks for the catch @piscisaureus! I've opened smol-rs/polling#85 and aclysma/wepoll-ffi#1 to address this

As of version v2.6.0, polling now no longer uses wepoll. This issue can probably be closed.