somdoron / AsyncIO

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

AccessViolationException in AsyncIO

msykora opened this issue · comments

Hello,

I get this intermittently when I'm trying to load test my communication component based on NetMQ 3.3.3.4, sending messages from thousands of router sockets on a single machine, connected to a single router socket on the "server":
Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at System.Runtime.InteropServices.Marshal.ReadInt32(IntPtr ptr, Int32 ofs)
at AsyncIO.Windows.Overlapped.get_Success()
at AsyncIO.Windows.CompletionPort.HandleCompletionStatus(CompletionStatus& completionStatus, IntPtr overlappedAddress, IntPtr completionKey, Int32 bytesTransferred)
at AsyncIO.Windows.CompletionPort.GetMultipleQueuedCompletionStatus(Int32 timeout, CompletionStatus[] completionStatuses, Int32& removed)
at NetMQ.Core.Utils.Proactor.Loop()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

I can't really include my code, but it seems like there is a possible bug in AsyncIO library?

Martin

I noticed this while tracking down access violations in my project. Interesting to see someone else with a similar problem.

I believe this can happen if the socket is aborted while an overlapped IO is still in progress. The overlapped buffer gets marked as disposed, then freed and can be garbage collected before the Success field is checked. I made the following modifications, I haven't made a patch yet, because I am not convinced it is the source of my errors.

My Windows.Overlapped.cs:

        public bool Success { get; private set; }
        //{
        //    get { return Marshal.ReadIntPtr(m_address).Equals(IntPtr.Zero); }
        //}
         public static Overlapped CompleteOperation(IntPtr overlappedAddress)
        {
            IntPtr managedOverlapped = Marshal.ReadIntPtr(overlappedAddress, MangerOverlappedOffset);

            GCHandle handle = GCHandle.FromIntPtr(managedOverlapped);

            Overlapped overlapped = handle.Target as Overlapped;
            overlapped.InProgress = false;

            if (overlapped.Disposed)
            {
                overlapped.Free();
                overlapped.Success = false;
            }
            else
            {
                overlapped.Success = Marshal.ReadIntPtr(overlapped.m_address).Equals(IntPtr.Zero);
            }
            return overlapped;          
        }        

Note that CompletionPort.cs needs to also check Disposed in this case, as it can see a Success = false, and try to get a Socket error, and retrieve a SocketError.Succcess code, which is possible if the socket was aborted but the operation was completed successfully.

I did something like this:

 if (!overlapped.Disposed)
                        {
                            bool operationSucceed = UnsafeMethods.WSAGetOverlappedResult(overlapped.AsyncSocket.Handle,
                                overlappedAddress,
                                out bytesTransferred, false, out socketFlags);
                            if (!operationSucceed)
                            {
                                socketError = (SocketError)Marshal.GetLastWin32Error();
                            }
                        }
                        else
                        {
                            socketError = SocketError.OperationAborted;
                        }

Would it be possible to path anyway if it's a clear error scenario? Thing is, this causes an exception from a background thread that takes down my entire application... I can't really catch it, so it's quite a big issue, only alternative I am seeing now is not running poller as async and instead permanently block a thread for it so I can run it in try/catch, which I don't like much... Or try NetMQ 4 which is no small deal and might not help anyway...

I forked and created a pull request. I had trouble setting up the environment for the whole project on my box ((I have been using a private fork for my work because of this).

If you'd like, you could use my fork of AsyncIO to test things out and see if it fixes your problem. This issue came up for me (I believe) when I had a heartbeat close non-responsive sockets while operations were still outstanding.

Sorry, I couldn't test it as I'd hoped as I'm putting out several fires at once, I only know it happens very intermittently... And when I tried to replicate the scenario simply - connecting a socket, sending a frame with sendmore flag and then closing the server socket while a message has not finished receiving, I got a different exception, from NetMQ (also from a background thread so also quite bad):
System.Net.Sockets.SocketException occurred
HResult=0x80004005
Message=An existing connection was forcibly closed by the remote host
Source=System
StackTrace:
at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
at NetMQ.Core.Mailbox.TryRecv(Int32 timeout, Command& command)
at NetMQ.Core.SocketBase.ProcessCommands(Int32 timeout, Boolean throttle)
at NetMQ.Core.SocketBase.GetSocketOption(ZmqSocketOption option)
at NetMQ.Core.Utils.Selector.Select(SelectItem[] items, Int32 itemsCount, Int64 timeout)
at NetMQ.NetMQPoller.Run()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

Is it possible to path this and release new version if I am not able to test the fix from my side?

I believe you may be running into a separate issue I had, where on UWP, functions were returning failure from the socket API, but with an error code of SocketError.Success.
Which framework are you targeting?
I have a branch on my fork that addresses it. I thought it was only an issue with the .Net Native socket implementation when targeting UWP. I posted a comment about it under NetMQ Issue 105

Also, are you running NetMQ-3 or 4? I noticed you posted an issue under NetMQ-3. I am less familiar with 3, sorry.

Hi, I am using NetMQ 4.0.0.1 and AsyncIO 0.1.26 and I am getting the AccessViolationException exception:

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at System.Runtime.InteropServices.Marshal.ReadInt32(IntPtr ptr, Int32 ofs)
at AsyncIO.Windows.Overlapped.get_Success()
at AsyncIO.Windows.CompletionPort.HandleCompletionStatus(CompletionStatus& completionStatus, IntPtr overlappedAddress, IntPtr completionKey, Int32 bytesTransferred)
at AsyncIO.Windows.CompletionPort.GetMultipleQueuedCompletionStatus(Int32 timeout, CompletionStatus[] completionStatuses, Int32& removed)
at NetMQ.Core.Utils.Proactor.Loop()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

My application would close the NETMQ socket on idle and then reopen if new data has to be transmitted and I see this exception causing the application to crash once in a while.

@southie / @somdoron , can you please confirm if the fix is part of the 0.1.26 version of AsyncIO. Github shows the fix being commited on Apr 29 and the 0.1.26 version was actually released on Jul 18

Regards,
Rajesh

i think it was included but I think i still see a problem.

I can try and provide a fix, would you be able to test it?

sorry, it was not included, it was april this year, version released last year.
I will push a new version next week. For now you can build from source and use it.

That would be great. I have trouble building the source with Visual Studio 2017 (some targetframework not getting recognized and processor architecture mismatch errors). I will try to resolve the build issues and try the fix. If I am not successful, I will take the fix after you make the release next week.

Thanks a lot @somdoron !

@pnrajesh new AsyncIO version released.
Going to release a NetMQ prerelease after the PR is meged

Hi @somdoron
I got the new AsyncIO version from NuGet, but the existing NetMQ version 4.0.0.1 throws the below exception when the application is run. Not sure how to resolve it. Would you be releasing a new NetMQ version that would work with the new AsyncIO 0.1.40?

Exception : Could not load file or assembly 'AsyncIO, Version=0.1.25.0'

I know you're focusing mostly on NetMQ4, but there be any chance of releasing a NetmQ 3 version that can actually upgrade AsyncIO? I cannot get any of these fixes as current NetMQ 3 has somewhere baked in dependency on AsyncIO 0.1.20.0, I tried everything, redirect doesn't work, editing the package dependency doesn't help... I would appreciate much if NetMQ could use the latest AsyncIO.

OK I can do that :)

siki

I would need to make a pull request for NetMQ 3 though, right?

Hhmm can't update to AsyncIO since part of NetMQ targets framework 3.5. Should I just leave that part on the old AsyncIO?

BTW not all unit tests are passing :-( how can I determine updating to AsyncIO 1.40 won't break anything when I can't trust the tests?

Is the 3.5 NetMQ still relevant? If it is in the same nuget package and has the same nuspec, I can't set the dependency to AsyncIO 1.40...
BTW when I tried to use AsyncIO 1.26 for NetMQ 3.5 it just wouldn't build and always be looking for 1.25 instead... Not sure what causes it, I hope this won't ruin attempts to use higher AsyncIO versions with NetMQ the same way as now, I suspect packaging both NetMQ versions together i sgoing to cause problems.

Would you be OK with releasing a new NetMQ package 3.3.3.5 with only the 4.0 target and newest AsyncIO?

Tried to push a branch to remote but I don't have rights :)

So how can I make that pull request?