Azure / azure-storage-net

Microsoft Azure Storage Libraries for .NET

CloudBlockBlob.DownloadText() occasionally hangs and never returns (v8.1.1)

TimLovellSmith opened this issue.

SDK version: <package id="WindowsAzure.Storage" version="8.1.1" targetFramework="net45" />
Platform version: .NET Framework 4.5.2.

The problem: we have a loop that calls Exists() and DownloadText() on a blob roughly once per second. At some point our app starts having temporary reliability problems calling blob storage, which show up as TimeoutExceptions in some threads. Those threads later resume talking to blob storage, but in one thread DownloadText() on the blob just hangs and never returns. The VM eventually went down for OS patching weeks later, and AFAICS the call had still not returned by then.
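A loop like this cannot recover a thread that is stuck inside a synchronous call, but it can at least notice that a call has not returned. The sketch below assumes a hypothetical helper, HangWatchdog.CompletesWithin, that runs the synchronous call on a thread-pool thread and waits for a deadline. It is not part of the Storage SDK and is only one possible way to detect the hang.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Storage SDK.
public static class HangWatchdog
{
    // Runs a synchronous call on a thread-pool thread and waits at most `deadline`.
    // Returns true if the call finished in time. If it returns false, the call is
    // still running: a thread blocked in Socket.Receive is NOT cancelled here, so
    // this only detects the hang; it does not free the blocked thread.
    // Exceptions thrown by the call surface as an AggregateException from Wait.
    public static bool CompletesWithin(Action call, TimeSpan deadline)
    {
        Task task = Task.Run(call);
        return task.Wait(deadline);
    }
}
```

Usage, assuming blob is the CloudBlockBlob polled in the loop: if HangWatchdog.CompletesWithin(() => blob.DownloadText(), TimeSpan.FromMinutes(5)) returns false, the loop can log the hang (for example, capture stacks like the one below) instead of waiting on it indefinitely. The blocked thread is leaked rather than recovered.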

Some debug code captured this snapshot of the never-returning call stack:

DomainBoundILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)
System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef) 
System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32) 
System.Net.FixedSizeReader.ReadPacket(Byte[], Int32, Int32) 
System.Net.Security.SslState.StartReadFrame(Byte[], Int32, System.Net.AsyncProtocolRequest)
System.Net.Security.SslState.StartReceiveBlob(Byte[], System.Net.AsyncProtocolRequest) 
System.Net.Security.SslState.CheckCompletionBeforeNextReceive(System.Net.Security.ProtocolToken, System.Net.AsyncProtocolRequest) 
System.Net.Security.SslState.ForceAuthentication(Boolean, Byte[], System.Net.AsyncProtocolRequest)
System.Net.Security.SslState.ProcessAuthentication(System.Net.LazyAsyncResult) 
System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
System.Net.TlsStream.ProcessAuthentication(System.Net.LazyAsyncResult)
System.Net.TlsStream.Write(Byte[], Int32, Int32)
System.Net.PooledStream.Write(Byte[], Int32, Int32)
System.Net.ConnectStream.WriteHeaders(Boolean)
System.Net.HttpWebRequest.EndSubmitRequest()
System.Net.Connection.CompleteConnection(Boolean, System.Net.HttpWebRequest) 
System.Net.Connection.CompleteStartConnection(Boolean, System.Net.HttpWebRequest) 
System.Net.Connection.CompleteStartRequest(Boolean, System.Net.HttpWebRequest, System.Net.TriState) 
System.Net.Connection.SubmitRequest(System.Net.HttpWebRequest, Boolean) 
System.Net.ServicePoint.SubmitRequest(System.Net.HttpWebRequest, System.String) 
System.Net.HttpWebRequest.SubmitRequest(System.Net.ServicePoint) 
System.Net.HttpWebRequest.GetResponse()
Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[[System.Boolean, mscorlib]](Microsoft.WindowsAzure.Storage.Core.Executor.RESTCommand`1<Boolean>, Microsoft.WindowsAzure.Storage.RetryPolicies.IRetryPolicy, Microsoft.WindowsAzure.Storage.OperationContext) 

I am pretty sure we are using the default retry policy for this request.
The exact overload and parameters of DownloadText are:

DownloadText(encoding: null, accessCondition: null, options: null, operationContext: null);

Broken expectation:
I expected this call to fail with an exception after about 30 seconds, based on the description of the default retry policy.

Looks similar: #790?

Judging from the stacks, this appears to have the same underlying cause as #738, which is reported to occur in 9.3.0-9.3.2. AFAICT that bug was never actually fixed in this SDK; it was resolved with "don't use the table storage SDK, use the new CosmosDB SDK".

My working theory is that TlsStream and NetworkStream have infinite read and write timeouts by default, because the underlying socket does too. HttpWebRequest keeps HTTP 1.1 connections alive by default so that they can be reused. When such a kept-alive socket is never closed by the remote side but also never receives any data, the documented behavior of this recv call is that it can genuinely block forever.
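The socket behavior behind this theory can be reproduced with plain .NET sockets, with no Storage SDK involved. In this minimal sketch a loopback peer accepts the connection but never sends data and never closes it. With the default Socket.ReceiveTimeout of 0 (infinite; NetworkStream.ReadTimeout likewise defaults to Timeout.Infinite), the client's Receive would block forever; with a finite timeout it fails with SocketError.TimedOut instead. The class and method names are hypothetical, and the sketch says nothing about how HttpWebRequest configures its own sockets.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical demo class: shows Receive against a peer that stays silent.
public static class ReceiveTimeoutDemo
{
    // Returns the SocketError observed when reading from a connection whose peer
    // never sends and never closes. Passing 0 would mean "infinite" and the
    // Receive call below would then block forever, so only pass a positive value.
    public static SocketError ReceiveFromSilentPeer(int receiveTimeoutMs)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: pick a free port
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new TcpClient())
            {
                client.Connect(IPAddress.Loopback, port);
                // Keep the accepted socket open (but silent) for the whole read.
                using (Socket silentPeer = listener.AcceptSocket())
                {
                    client.Client.ReceiveTimeout = receiveTimeoutMs;
                    var buffer = new byte[16];
                    try
                    {
                        client.Client.Receive(buffer);
                        return SocketError.Success;
                    }
                    catch (SocketException ex)
                    {
                        return ex.SocketErrorCode; // expected: TimedOut
                    }
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```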

Also, AFAICS, by default 8.1.1 does not include a server-side timeout in the request parameters it sends to storage.
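For comparison with the options: null call above, this sketch passes explicit per-request bounds through BlobRequestOptions in WindowsAzure.Storage 8.x: ServerTimeout (a server-side timeout sent with the request), MaximumExecutionTime (a client-side budget across all retries) and an explicit ExponentialRetry. The wrapper name and the timeout values are illustrative assumptions. Whether MaximumExecutionTime actually interrupts a thread already blocked inside the TLS handshake is exactly what this issue calls into question. In the stack above the request headers have not yet been written, so a server-side timeout alone would likely not help there.

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.RetryPolicies;

// Hypothetical wrapper; values are illustrative, not recommendations.
public static class BoundedBlobRead
{
    public static string DownloadTextWithBounds(CloudBlockBlob blob)
    {
        var options = new BlobRequestOptions
        {
            // Server-side timeout, sent to the service with each request.
            ServerTimeout = TimeSpan.FromSeconds(30),
            // Client-side limit for the whole operation, across all retries.
            MaximumExecutionTime = TimeSpan.FromSeconds(90),
            // Exponential back-off with a 2-second delta, up to 3 retry attempts.
            RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 3)
        };

        // Same overload as in the report, but with explicit options.
        return blob.DownloadText(encoding: null, accessCondition: null, options: options, operationContext: null);
    }
}
```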