somdoron / AsyncIO

Concurrent Send Operations Result in SocketError 996

wmjordan opened this issue

commented

In the attached code snippet, the Send method is called by various threads.

Some calls print "996 Send 10" on the console, meaning that the overlapped I/O operation is still incomplete (996 is WSA_IO_INCOMPLETE); see https://msdn.microsoft.com/en-us/library/windows/desktop/ms740668(v=vs.85).aspx
It seems that the bytes are still received by the server.

Is it safe to treat 996 as a successful event, or should I change the calling procedure to wait for the event, as the following SO answer suggests?
https://stackoverflow.com/questions/35208259/overlapped-socket-io-wsagetoverlappedresult-fails-with-996-errorcode

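using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;
using AsyncIO;

static void Main(string[] args) {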
	CompletionPort completionPort = CompletionPort.Create();

	AutoResetEvent listenerEvent = new AutoResetEvent(false);
	AutoResetEvent clientEvent = new AutoResetEvent(false);
	AutoResetEvent serverEvent = new AutoResetEvent(false);

	AsyncSocket listener = AsyncSocket.Create(AddressFamily.InterNetwork,
		SocketType.Stream, ProtocolType.Tcp);
	completionPort.AssociateSocket(listener, listenerEvent);

	AsyncSocket server = AsyncSocket.Create(AddressFamily.InterNetwork,
		SocketType.Stream, ProtocolType.Tcp);
	completionPort.AssociateSocket(server, serverEvent);

	AsyncSocket client = AsyncSocket.Create(AddressFamily.InterNetwork,
		SocketType.Stream, ProtocolType.Tcp);
	completionPort.AssociateSocket(client, clientEvent);

	int received = 0, sent = 0;
	Task.Factory.StartNew(() =>
	{
		CompletionStatus[] completionStatuses = new CompletionStatus[1000];
		while (true) {
			int removed;
			var result = completionPort.GetMultipleQueuedCompletionStatus(-1, completionStatuses, out removed);

			if (result) {
				for (int i = 0; i < removed; i++) {
					var completionStatus = completionStatuses[i];
					Console.WriteLine("{0} {1} {2}", completionStatus.SocketError,
						completionStatus.OperationType, completionStatus.BytesTransferred);
					//if ((int)completionStatus.SocketError == 996) {
					//	System.Diagnostics.Debugger.Break();
					//}
					switch (completionStatus.OperationType) {
						case OperationType.Send:
							Interlocked.Add(ref sent, completionStatus.BytesTransferred);
							break;
						case OperationType.Receive:
							Interlocked.Add(ref received, completionStatus.BytesTransferred);
							break;
					}
					if (completionStatus.State != null) {
						AutoResetEvent resetEvent = (AutoResetEvent)completionStatus.State;
						resetEvent.Set();
					}
				}
			}
		}
	});

	listener.Bind(IPAddress.Any, 5555);
	listener.Listen(1);

	client.Connect("localhost", 5555);

	listener.Accept(server);


	listenerEvent.WaitOne();
	clientEvent.WaitOne();

	object o = new object();
	Parallel.For(0, 10, _ => {
		for (int i = 0; i < 100; i++) {
			lock (o) {
				client.Send(new byte[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }); 
			}
		}
		Thread.Sleep(10);
	});

	server.Receive(new byte[100000]);
	clientEvent.WaitOne();
	serverEvent.WaitOne();

	server.Dispose();
	client.Dispose();

	Console.WriteLine($"sent {sent} bytes, received {received} bytes");
}

Will you get another completion status after the 996? If so, I think it is safe to ignore it; however, I suspect you won't get another completion status. If that is the case, we need to find a way to handle it.
I have never encountered that error before. Which Windows version are you using?

commented

No further completion status was received after the 996; otherwise, sent would have been larger than 10000 (10 threads × 100 sends × 10 bytes).
I am using Windows 10.14393.1593, 64bit.

Sorry, I found the issue.
AsyncIO currently doesn't support multiple concurrent send operations (or receive operations), only one operation at a time, mainly because that is how NetMQ uses it. We could try to change that, but the advantage of the current design is that the same overlapped structure is reused for all send operations. To support multiple concurrent send operations, we would need some kind of object pooling.

You can have one receive and one send outstanding at the same time, but not multiple of either.

commented

Yep, I guessed that AsyncIO doesn't support multiple concurrent send operations. That is why I wrapped the Send call in a lock: the Send method cannot be called by another thread until it returns.

That still doesn't solve the problem. Because Send is asynchronous, the next Send call reuses the overlapped structure used by the previous call, which causes the 996: you are effectively trying to get the result of an operation that has not finished yet. So you should "lock" the socket until you get the completion report.
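A minimal sketch of "locking the socket until the completion report arrives", assuming a hypothetical SingleSlotSender in place of AsyncSocket. It owns one reusable slot (standing in for the shared overlapped structure), delivers completions on its own thread like the completion-port loop in the repro, and counts every reuse of a busy slot as a stand-in for the 996. What matters is where the SemaphoreSlim is released: in the completion handler, not when Send returns, which is when the lock in the repro is released.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a socket with a single overlapped slot.
// Starting a send while the previous one is still in flight is counted
// as an "incomplete" error (the slot is reused before it completed).
public sealed class SingleSlotSender : IDisposable
{
    private readonly BlockingCollection<byte[]> _inFlight = new BlockingCollection<byte[]>();
    private int _busy;                   // 1 while the slot is in use
    public int IncompleteErrors;         // times the slot was reused too early
    public long BytesCompleted;
    public event Action<int> Completed;  // raised on the "completion port" thread

    public SingleSlotSender()
    {
        var completionThread = new Thread(() =>
        {
            foreach (byte[] data in _inFlight.GetConsumingEnumerable())
            {
                Interlocked.Add(ref BytesCompleted, data.Length);
                Volatile.Write(ref _busy, 0);   // the slot is free again
                Completed?.Invoke(data.Length);
            }
        });
        completionThread.IsBackground = true;
        completionThread.Start();
    }

    public void Send(byte[] data)
    {
        if (Interlocked.Exchange(ref _busy, 1) != 0)
            Interlocked.Increment(ref IncompleteErrors);
        _inFlight.Add(data);                    // completes later, on another thread
    }

    public void Dispose() => _inFlight.CompleteAdding();
}

public static class GatedSendDemo
{
    public static (int Errors, long Bytes) Run()
    {
        using var socket = new SingleSlotSender();
        var gate = new SemaphoreSlim(1, 1);
        // Release the gate when the completion arrives, not when Send returns.
        socket.Completed += _ => gate.Release();

        Parallel.For(0, 10, _ =>
        {
            for (int i = 0; i < 100; i++)
            {
                gate.Wait();                    // wait for the previous send to complete
                socket.Send(new byte[10]);
            }
        });
        gate.Wait();                            // wait for the last completion
        return (socket.IncompleteErrors, Interlocked.Read(ref socket.BytesCompleted));
    }
}
```

If the gate.Wait() inside the loop is removed, IncompleteErrors will typically be non-zero, which mirrors the original report.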

commented

Understood. So supporting concurrent Send operations is a somewhat challenging task.
A relatively easier solution may be to spawn a dedicated thread that performs the actual Send operations, and to coordinate it with an AutoResetEvent and a ConcurrentQueue.

In NetMQ that thread is also the completion port thread.

Why do you need the AutoResetEvent? AutoResetEvent can be expensive. I'm not sure how TaskCompletionSource is implemented internally, but it might be more efficient.
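A sketch of the dedicated-sender idea, assuming the completion-port thread can call back into it. DedicatedSender and its OnSendCompleted method are hypothetical names. A BlockingCollection backed by a ConcurrentQueue takes the place of the ConcurrentQueue + AutoResetEvent pair, and each send waits on a TaskCompletionSource that the completion handler completes, as suggested above. Nothing here claims which primitive is faster.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: callers only enqueue; one dedicated thread starts the
// sends one at a time and waits for each completion before starting the next.
public sealed class DedicatedSender : IDisposable
{
    private readonly BlockingCollection<byte[]> _queue =
        new BlockingCollection<byte[]>(new ConcurrentQueue<byte[]>());
    private readonly Action<byte[]> _startSend;   // e.g. data => socket.Send(data)
    private readonly Thread _thread;
    private volatile TaskCompletionSource<int> _pending;

    public DedicatedSender(Action<byte[]> startSend)
    {
        _startSend = startSend;
        _thread = new Thread(Loop) { IsBackground = true };
        _thread.Start();
    }

    public void Send(byte[] data) => _queue.Add(data);

    // Call from the completion-port thread when the Send completion arrives.
    public void OnSendCompleted(int bytesTransferred) => _pending.TrySetResult(bytesTransferred);

    private void Loop()
    {
        foreach (byte[] data in _queue.GetConsumingEnumerable())
        {
            _pending = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
            _startSend(data);
            _pending.Task.Wait();   // only this thread blocks until the completion arrives
        }
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        _thread.Join();             // returns after the last queued send has completed
    }
}

public static class DedicatedSenderDemo
{
    public static (int MaxInFlight, long Bytes) Run()
    {
        // Stand-in for the completion port: completions arrive on another thread.
        var completions = new BlockingCollection<byte[]>();
        int inFlight = 0, maxInFlight = 0;
        long bytes = 0;

        var sender = new DedicatedSender(data =>
        {
            // Only the dedicated thread calls this, so a plain Math.Max is enough.
            maxInFlight = Math.Max(maxInFlight, Interlocked.Increment(ref inFlight));
            completions.Add(data);
        });
        var completionThread = new Thread(() =>
        {
            foreach (byte[] data in completions.GetConsumingEnumerable())
            {
                Interlocked.Decrement(ref inFlight);
                Interlocked.Add(ref bytes, data.Length);
                sender.OnSendCompleted(data.Length);
            }
        });
        completionThread.Start();

        Parallel.For(0, 10, _ =>
        {
            for (int i = 0; i < 100; i++) sender.Send(new byte[10]);
        });

        sender.Dispose();               // waits for every queued send to complete
        completions.CompleteAdding();
        completionThread.Join();
        return (maxInFlight, bytes);
    }
}
```

Callers never block on the socket here. The cost is one extra thread per sender, which is why NetMQ folds this role into the completion-port thread itself.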

commented

Thanks, I will check out the source code of NetMQ and learn from it.
Is it possible to pool a number of Overlapped objects to meet the concurrency requirement? Whenever an Overlapped is needed, it would be taken from the pool, and when the operation completes, it would be returned to the pool to serve subsequent calls.

We could have a ring buffer of overlapped structures inside the socket, so you would need to decide up front how many concurrent operations you want. If you exceed that number, you probably won't get an exception, only weird behavior.
The default can be one, so the current behavior won't change.
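The ring-buffer idea can be sketched as follows. OverlappedRing is hypothetical and tracks only a busy flag per slot rather than real OVERLAPPED memory, and its overrun check is best-effort. A capacity of 1 corresponds to today's single shared overlapped structure. The "weird behavior" case appears as a silent overrun counter rather than an exception.

```csharp
using System;
using System.Threading;

// Hypothetical per-socket ring of N pre-allocated "overlapped" slots.
// Each new operation takes the next slot in round-robin order. With more than
// N operations outstanding, a slot that is still busy gets reused: no
// exception, but the earlier operation's state is overwritten.
public sealed class OverlappedRing
{
    private readonly bool[] _busy;
    private int _next = -1;
    public int Overruns;   // reuses of a slot whose operation had not completed

    public OverlappedRing(int capacity) => _busy = new bool[capacity];

    // Take the next slot for a new I/O operation and return its index.
    public int Start()
    {
        int slot = (int)((uint)Interlocked.Increment(ref _next) % (uint)_busy.Length);
        if (Volatile.Read(ref _busy[slot]))
            Interlocked.Increment(ref Overruns);   // best-effort detection
        Volatile.Write(ref _busy[slot], true);
        return slot;
    }

    // Call from the completion handler for the operation that used this slot.
    public void Complete(int slot) => Volatile.Write(ref _busy[slot], false);
}
```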

I can help you make the pull request if you want.

commented

I think it is reasonable to estimate the required concurrency before initializing the application. If the concurrency could scale up heuristically, that would be even better.
The documentation of SocketAsyncEventArgs also encourages pre-allocating large memory blocks and pooling SocketAsyncEventArgs objects during the initialization phase.

Take a look here:
https://github.com/somdoron/AsyncIO/blob/master/Source/AsyncIO/Windows/Socket.cs#L14

So instead of having one overlapped structure, we should have a concurrent queue of them (which will reduce performance); the queue could actually be shared by more than one socket. When the queue is empty, we create a new overlapped structure, and when an I/O operation completes, we return its overlapped structure to the queue.

We can initially initialize the object pool with as many structures as we think we need.
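A sketch of the shared pool described above. OverlappedSlot is a hypothetical placeholder for AsyncIO's native overlapped wrapper; a real wrapper would also have to keep its native memory pinned and reset between uses. The pattern is to rent a slot before starting an operation, return it from the completion handler, create a new one when the pool is empty, and optionally pre-allocate at startup.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical placeholder for the native OVERLAPPED wrapper.
public sealed class OverlappedSlot
{
    public byte[] Buffer;   // data for the operation in flight
}

// A pool that can be shared by several sockets.
public sealed class OverlappedPool
{
    private readonly ConcurrentQueue<OverlappedSlot> _free = new ConcurrentQueue<OverlappedSlot>();
    private int _created;

    public OverlappedPool(int preallocate)
    {
        for (int i = 0; i < preallocate; i++) _free.Enqueue(Create());
    }

    // Total number of slots ever created (pre-allocated plus grown on demand).
    public int Created => Volatile.Read(ref _created);

    // Call before starting an I/O operation.
    public OverlappedSlot Rent()
    {
        // When the pool is empty, grow it instead of failing.
        return _free.TryDequeue(out OverlappedSlot slot) ? slot : Create();
    }

    // Call from the completion handler once the operation has completed.
    public void Return(OverlappedSlot slot)
    {
        slot.Buffer = null;   // do not keep the caller's data alive
        _free.Enqueue(slot);
    }

    private OverlappedSlot Create()
    {
        Interlocked.Increment(ref _created);
        return new OverlappedSlot();
    }
}
```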

commented

Thank you for pointing that out! I will study your code and see whether I can do something about it.

commented

Putting a queue in AsyncSocket might work, but it seems inefficient, since not everybody needs concurrent send support, and .NET 3.5 does not have ConcurrentQueue at all.
In the end, I put a flag and a queue in my own client class to serialize send operations, which turned out to be quite simple.

  1. Before calling the Send method, check the flag by evaluating Interlocked.CompareExchange(ref _isSending, 1, 0) == 1. If true (a send is already in flight), put the data into the queue and return; otherwise, call the Send method.
  2. In the completion handler of a successful send, call _pendingBytes.TryDequeue(out data). If it returns true, call the Send method to send the queued data asynchronously; otherwise, set the flag _isSending back to 0.

The above routine appears to have eliminated the 996 error code, and everything worked fine in my initial tests.

This pattern is tricky on the completion side because it is not atomic: if you check the queue and find it empty, a send request can be queued between the moment you check the queue and the moment you clear the flag, and you will miss it.
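One way to close that race is to re-check the queue after clearing the flag and retry if something arrived in between. FlagSendSerializer below is a hypothetical helper built around the two-step routine described earlier, with two deliberate changes. Every item is enqueued first, which keeps FIFO order. The flag is also cleared with Interlocked.Exchange, a full fence, so the re-check cannot be reordered before the clear. startSend stands in for the socket's Send call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: at most one send in flight; the rest wait in a queue.
public sealed class FlagSendSerializer
{
    private readonly ConcurrentQueue<byte[]> _pending = new ConcurrentQueue<byte[]>();
    private readonly Action<byte[]> _startSend;   // e.g. data => socket.Send(data)
    private int _isSending;                       // 1 while a send is in flight

    public FlagSendSerializer(Action<byte[]> startSend) => _startSend = startSend;

    public void Send(byte[] data)
    {
        _pending.Enqueue(data);
        TrySendNext();
    }

    // Call from the completion handler once the in-flight send has completed.
    public void OnSendCompleted()
    {
        Interlocked.Exchange(ref _isSending, 0);
        TrySendNext();
    }

    private void TrySendNext()
    {
        // Re-check after every release: an item queued while the flag was held
        // (whose producer therefore gave up) is picked up here.
        while (!_pending.IsEmpty)
        {
            // Only the thread that flips the flag from 0 to 1 may start a send.
            if (Interlocked.CompareExchange(ref _isSending, 1, 0) != 0)
                return;   // the current flag holder will see our item
            if (_pending.TryDequeue(out byte[] data))
            {
                _startSend(data);
                return;   // OnSendCompleted continues with the next item
            }
            // Another holder drained the queue first: release and re-check.
            Interlocked.Exchange(ref _isSending, 0);
        }
    }
}
```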

Take a look at how it is done in NetMQ:

https://github.com/zeromq/netmq/blob/master/src/NetMQ/Core/YPipe.cs#L113

https://github.com/zeromq/netmq/blob/master/src/NetMQ/Core/YPipe.cs#L143

You can share your code if you want and I will try to help.

commented

Thank you very much for pointing out the potential problem! You are right!

Here is the relevant code, with error handling stripped out for clarity.
SendPendingData is called after a Send operation completes successfully.

public void Send(byte[] data) {
	if (Interlocked.Exchange(ref _isOperating, 1) != 0) {
		_pendingData.Enqueue(data);
		return;
	}
	_clientSocket.Send(data);
}

void SendPendingData() {
	byte[] data;
	if (_pendingData.TryDequeue(out data)) {
		_clientSocket.Send(data);
	}
	else {
		// data could be enqueued at this moment and they will remain in the queue
		_isOperating = 0;
	}
}

As you pointed out, data may be enqueued just before _isOperating is set to 0.

commented

I changed SendPendingData to the following code; it should be safer now.

void SendPendingData() {
	do {
		byte[] data;
		if (_pendingData.TryDequeue(out data)) {
			_clientSocket.Send(data, 0, data.Length, SocketFlags.None);
			// send one package at a time
			break;
		}
		else {
			// another thread may enqueue data between TryDequeue and setting the op flag below
			Interlocked.Exchange(ref _isOperating, _pendingData.Count);
			// any of the following situation can happen:
			// 0. no data is queued = OK, quit
			// 1. data was queued before the op flag was set, so the op flag is not zero: loop and send it
		}
	} while (_isOperating != 0);
}

long _counter = -1;

public void Send(byte[] data) {
	if (Interlocked.Increment(ref _counter) == 0) {
		_clientSocket.Send(data);
		return;
	}

	_pendingBytes.Add(data);
}

void SendPendingData() {
	if (Interlocked.Decrement(ref _counter) != -1) {
		byte[] data = _pendingBytes.Take(); // BlockingCollection: blocks until the item has been added
		_clientSocket.Send(data);
	}
}

Not sure about this, what do you think?

You might want to ask this at stackoverflow or here https://cs.stackexchange.com/

Making lock-free code correct is complicated...

commented

I just posted my solution before your solution :)
I am studying yours now.

commented

Your solution should solve the concurrent sending problem, and it is neater than mine.

By the way, while studying your source code, I found that m_outOverlapped in Socket.cs is used not only by the Send method but also by the Connect method.
If we allow users to queue outgoing items while the connection is being established, we should add some barrier around the call to the Connect method too.

@wmjordan you are right. We might be better off using a disposable overlapped structure for Connect, since we are not worried about the performance penalty there...

commented

I adopted your solution above to solve the concurrent send problem, and the application has been running for about 16 hours without any problem. I am going to put more pressure on the application and see whether it stays stable.
Another bonus of your solution is that, after changing the initial value of _counter to 0, it immediately gives the length of the sending and pending queue (at most one item is being sent; the others are pending).
I replaced the BlockingCollection with a ConcurrentQueue in your solution, since the former uses the latter internally by default, and the latter also supports concurrency.

commented

Today I fed some production data streams to the server application and still ran into the 996 socket error. Immediately after the 996 error, a NullReferenceException was thrown in the CompletionPort thread.

Here's the stack trace from the client side:

1.	AsyncIO.Windows.Overlapped.CompleteOperation(IntPtr overlappedAddress)
2.	AsyncIO.Windows.CompletionPort.HandleCompletionStatus(CompletionStatus& completionStatus, IntPtr overlappedAddress, IntPtr completionKey, Int32 bytesTransferred)
3.	AsyncIO.Windows.CompletionPort.GetMultipleQueuedCompletionStatus(Int32 timeout, CompletionStatus[] completionStatuses, Int32& removed)

I will check out the source code on the client side and post my findings.

Not sure about this issue, but you need to use BlockingCollection and not ConcurrentQueue: from the moment you increment the counter, the reader can already try to dequeue, and it will get null because you haven't enqueued yet. Actually, that can also be solved by enqueuing before incrementing the counter:

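long _counter = -1;

public void Send(byte[] data) {
	_pendingBytes.Enqueue(data); // enqueue first, so the item is in the queue before it is counted

	if (Interlocked.Increment(ref _counter) == 0) {
		SendNext(); // we went from idle (-1) to busy (0): start the first send
	}
}

// Called by the completion handler after a send completes.
void SendPendingData() {
	if (Interlocked.Decrement(ref _counter) != -1) {
		SendNext(); // more items were counted while the previous send was in flight
	}
}

void SendNext() {
	byte[] data;
	_pendingBytes.TryDequeue(out data); // cannot fail: every counted item was enqueued before being counted
	_clientSocket.Send(data);
}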

By the way, multiple receive operations cannot be concurrent either.
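The enqueue-before-increment counter pattern can be packaged as a self-contained class. CounterSendSerializer and its startSend callback are hypothetical names. _counter holds the number of accepted but not yet completed items minus one, so -1 means idle, and Outstanding exposes the queue length mentioned earlier in the thread. Because each item is enqueued before it is counted, the TryDequeue in SendNext should never come up empty. The first send is started directly rather than through the decrementing completion path.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper implementing the enqueue-before-increment counter pattern.
public sealed class CounterSendSerializer
{
    private readonly ConcurrentQueue<byte[]> _pending = new ConcurrentQueue<byte[]>();
    private readonly Action<byte[]> _startSend;   // e.g. data => socket.Send(data)
    private long _counter = -1;                   // (accepted, not yet completed) - 1

    public CounterSendSerializer(Action<byte[]> startSend) => _startSend = startSend;

    // Items accepted but not yet completed (in flight plus queued).
    public long Outstanding => Interlocked.Read(ref _counter) + 1;

    public void Send(byte[] data)
    {
        _pending.Enqueue(data);                       // publish the item first...
        if (Interlocked.Increment(ref _counter) == 0) // ...then count it; -1 -> 0 means we start sending
            SendNext();
    }

    // Call from the completion handler after a send has completed.
    public void OnSendCompleted()
    {
        if (Interlocked.Decrement(ref _counter) != -1)
            SendNext();                               // items were counted while we were busy
    }

    private void SendNext()
    {
        // Every counted item was enqueued before it was counted, so this should not fail.
        if (!_pending.TryDequeue(out byte[] data))
            throw new InvalidOperationException("counter and queue out of sync");
        _startSend(data);
    }
}
```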

commented

Thank you for your reply.
TryDequeue should be safe: if the data has already been dequeued and there is nothing left, it simply returns false, and we can handle that case conditionally in your first solution.

int _counter = 0;
public void Send(byte[] data) {
	if (Interlocked.Increment(ref _counter) == 1) {
                _clientSocket.Send(data);	
		return;
	}
	_pendingData.Enqueue(data); // ConcurrentQueue has Enqueue, not Add
}
void SendPendingData() {
	if (Interlocked.Decrement(ref _counter) != 0) {
		byte[] data;
		if (_pendingData.TryDequeue(out data)) {
			_clientSocket.Send(data, 0, data.Length, SocketFlags.None);
		}
	}
}

I checked the log file of my application, and I think I am close to the cause of the NullReferenceException thrown from CompleteOperation.

  1. The client connected to the server and listened for any data sent from the server.
  2. The client was sending data to the server.
  3. The server somehow closed the connection, so the receive operation was terminated.
  4. Once the client noticed that, it disposed the connection and tried to reconnect.
  5. The send operation still in flight then reached the CompletionPort thread, and since its resources had already been disposed, a NullReferenceException was thrown.

The following lines were taken from the log files. TcpClient is my client class.
I am not very familiar with AsyncSocket. Could my assumption above explain the exception?

16:28:01	[Error]	System.Net.Sockets.SocketException: The remote host shut down an existing connection.
Error Code: 10054 (0x2746)
Socket Error: ConnectionReset
1.	AsyncIO.Windows.Socket.Receive(Byte[] buffer, Int32 offset, Int32 count, SocketFlags flags)
2.	TcpClient.HandleCompletion(CompletionStatus status)
16:28:02	[Error]	Send 996
16:28:02	[FatalError]	System.NullReferenceException: 
1.	AsyncIO.Windows.Overlapped.CompleteOperation(IntPtr overlappedAddress)
2.	AsyncIO.Windows.CompletionPort.HandleCompletionStatus(CompletionStatus& completionStatus, IntPtr overlappedAddress, IntPtr completionKey, Int32 bytesTransferred)
3.	AsyncIO.Windows.CompletionPort.GetMultipleQueuedCompletionStatus(Int32 timeout, CompletionStatus[] completionStatuses, Int32& removed)
4.	TcpClient.<.ctor>b__23_0(Object cp) // the CompletionPort thread
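If the sequence above is what happened, one common way to rule out this class of bug is to reference-count outstanding operations and defer the real cleanup until the last completion has been handled. This is a general technique, not a description of AsyncIO's internals. RefCountedConnection is hypothetical, and ResourcesReleased marks the point where a real wrapper would close the socket and free its overlapped memory.

```csharp
using System;
using System.Threading;

// Hypothetical connection wrapper: Dispose only drops the owner's reference;
// resources are released when the last outstanding operation has completed,
// so a late completion never touches freed state.
public sealed class RefCountedConnection : IDisposable
{
    private int _refs = 1;              // 1 = the owner's reference
    private int _disposeRequested;
    public bool ResourcesReleased { get; private set; }

    // Call before starting an I/O operation; returns false once disposal has begun.
    public bool TryBeginOperation()
    {
        while (true)
        {
            int refs = Volatile.Read(ref _refs);
            if (refs == 0 || Volatile.Read(ref _disposeRequested) != 0) return false;
            if (Interlocked.CompareExchange(ref _refs, refs + 1, refs) == refs) return true;
        }
    }

    // Call from the completion handler, whatever the SocketError was.
    public void EndOperation() => Release();

    public void Dispose()
    {
        if (Interlocked.Exchange(ref _disposeRequested, 1) == 0)
            Release();                  // drop the owner's reference exactly once
    }

    private void Release()
    {
        if (Interlocked.Decrement(ref _refs) == 0)
            ResourcesReleased = true;   // a real wrapper would close the socket here
    }
}
```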
commented

There may be a potential problem in concurrent scenarios, because AsyncIO shares the same overlapped structure (m_outOverlapped) between the Connect and Send operations of a socket.
Please look at the following code in Socket.cs.

int bytesSend;

m_outOverlapped.StartOperation(OperationType.Connect);

if (m_connectEx(Handle, m_remoteAddress.Buffer, m_remoteAddress.Size, IntPtr.Zero, 0,
    out bytesSend, m_outOverlapped.Address))
{                
    CompletionPort.PostCompletionStatus(m_outOverlapped.Address);
}
else
{
    SocketError socketError = (SocketError)Marshal.GetLastWin32Error();

    if (socketError != SocketError.IOPending)
    {
        throw new SocketException((int)socketError);
    }                
}

If Connect and Send operations take place concurrently, the Connect method may end up in the following branch:

    if (socketError != SocketError.IOPending)
    {
        throw new SocketException((int)socketError);
    }

The concurrent Send operation causes the Connect operation to receive an IOPending error; that error is silently ignored, and no CompletionStatus is ever received for the Connect.

I suggest that we remove the if (socketError != SocketError.IOPending) condition and throw a SocketException no matter what error occurs.

We can just use another overlapped structure for Connect instead of the one shared with Send.

However, if you are using TCP, I don't see a scenario where you would send and connect at the same time.

commented

Since there is no such scenario, I think the IOPending error should be thrown as an exception as well, rather than acting as if nothing has happened.

commented

About the concurrent Send and Connect IOPending issue: I used another flag, _isConnecting, to keep sending and connecting operations from overlapping. The problem appears to be gone.

You are right about the use of BlockingCollection: when we call TryDequeue, the data might not have been enqueued by the other thread yet. In the end, I used a spin loop instead of BlockingCollection, since the latter seemed heavier to me.

void SendPendingData() {
	if (Interlocked.Decrement(ref _counter) != 0) {
		byte[] data;
		// The producer increments _counter before it enqueues,
		// so the item may not be visible yet: spin until it is.
		while (!_pendingData.TryDequeue(out data)) {
			Thread.SpinWait(1000);
		}
		_clientSocket.Send(data, 0, data.Length, SocketFlags.None);
	}
}
commented

Since switching to a ConcurrentQueue for the pending data together with the Interlocked counter pattern above, I have no longer run into the 996 SocketError. This issue can be closed.