Gremlinq / ExRam.Gremlinq

A .NET object-graph-mapper for Apache TinkerPop™ Gremlin enabled databases.

Too many connections on Azure Functions after updating the function to v4 on .NET 6

iAmBipinPaul opened this issue · comments

Hi, we are facing an issue where, after updating to .NET 6 and Azure Functions v4, the app opens too many connections, so the Azure Function becomes unavailable with this error:

Microsoft.Azure.WebJobs.Script.WebHost: Host thresholds exceeded: [Connections]. For more information, see https://aka.ms/functions-thresholds

We were running v8 on .NET Core 3.1 (Azure Functions v3) and it worked as expected. We then updated to .NET 6 and Azure Functions v4, still on v8, and hit the too-many-connections issue, so we updated to v9, and we still face the issue.

The version we are currently using:

<PackageReference Include="ExRam.Gremlinq.Providers.CosmosDb" Version="9.1.0" />

Thank you !

It says 1200 total outgoing connections in the docs. It's pretty certain that the underlying Gremlin.NET won't spawn that many connections. In case you are sure it is Gremlin.NET, it would be great if you could do some diagnostics about the current tcp connections on your machine.
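
One way to gather such diagnostics from inside a .NET process, as a minimal sketch: the standard `IPGlobalProperties.GetActiveTcpConnections()` API returns a snapshot of the machine's TCP connections, which can then be grouped by remote endpoint. The class and method names below (`TcpDiagnostics`, `CountEstablishedByRemote`, `Snapshot`, `PrintTop`) are hypothetical and not part of Gremlin.NET or Gremlinq. The snapshot covers the whole machine rather than a single process, and the call may not be permitted in every hosting environment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;

// Hypothetical helper for illustration; not part of Gremlin.NET or ExRam.Gremlinq.
public static class TcpDiagnostics
{
    // Counts established connections per remote endpoint ("ip:port").
    public static Dictionary<string, int> CountEstablishedByRemote(
        IEnumerable<(IPEndPoint Remote, TcpState State)> connections)
    {
        return connections
            .Where(c => c.State == TcpState.Established)
            .GroupBy(c => c.Remote.ToString())
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Takes a snapshot of all TCP connections currently known to the machine.
    public static IEnumerable<(IPEndPoint Remote, TcpState State)> Snapshot()
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Select(c => (Remote: c.RemoteEndPoint, State: c.State));
    }

    // Prints the remote endpoints with the most established connections.
    public static void PrintTop(int top = 10)
    {
        var ranked = CountEstablishedByRemote(Snapshot())
            .OrderByDescending(entry => entry.Value)
            .Take(top);

        foreach (var pair in ranked)
        {
            Console.WriteLine($"{pair.Value,5}  {pair.Key}");
        }
    }
}
```

Calling `TcpDiagnostics.PrintTop()` right after a single request would show whether one remote address on port 443 accounts for most of the connections.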

Hi @danielcweber, this is what we found from our investigation on the Azure Functions side.

2022-02-25T18:02:28.888 [Error] A host error has occurred during startup operation '59be07c6-8a58-4df1-b3fa-b2847b6a121d'.
System.InvalidOperationException : Host thresholds exceeded: [Connections]. For more information, see https://aka.ms/functions-thresholds.
   at Microsoft.Azure.WebJobs.Script.WebHost.WebJobsScriptHostService.IsHostHealthy(Boolean throwWhenUnhealthy) at //src/WebJobs.Script.WebHost/WebJobsScriptHostService.cs : 664
   at Microsoft.Azure.WebJobs.Script.WebHost.WebJobsScriptHostService.OnHostInitializing(Object sender,EventArgs e) at //src/WebJobs.Script.WebHost/WebJobsScriptHostService.cs : 556
   at async Microsoft.Azure.WebJobs.Script.ScriptHost.InitializeAsync(CancellationToken cancellationToken) at //src/WebJobs.Script/Host/ScriptHost.cs : 285
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Azure.WebJobs.Script.ScriptHost.StartAsyncCore(CancellationToken cancellationToken) at //src/WebJobs.Script/Host/ScriptHost.cs : 265
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Azure.WebJobs.Script.WebHost.WebJobsScriptHostService.UnsynchronizedStartHostAsync(ScriptHostStartupOperation activeOperation,Int32 attemptCount,JobHostStartupMode startupMode) at /_/src/WebJobs.Script.WebHost/WebJobsScriptHostService.cs : 309

I will try to run this on my local machine and will attach more details on this here.

Thank you !

Hi @danielcweber, I ran this on my local machine. After executing one request, it established 250 connections to a single IP address on port 443, and I checked that the IP address belongs to Microsoft.

This is how we connect to the graph database. The client is registered as a singleton, and our connection pool values are:

{
  "PoolSize":250,
  "MaxInProcessPerConnection":150,
  "ReconnectionAttempts":3,
  "ReconnectionBaseDelayInMilliseconds":1000
}

var connectionPoolSettings = new Action<ConnectionPoolSettings>(
    c =>
    {
        c.PoolSize = connectionPoolSettingsConfigValues.PoolSize;
        c.MaxInProcessPerConnection = connectionPoolSettingsConfigValues.MaxInProcessPerConnection;
        c.ReconnectionAttempts = connectionPoolSettingsConfigValues.ReconnectionAttempts;
        c.ReconnectionBaseDelay =
            TimeSpan.FromMilliseconds(
                connectionPoolSettingsConfigValues.ReconnectionBaseDelayInMilliseconds);
    });

_g = g
    .ConfigureEnvironment(env => env
        .UseModel(GraphModel
            .FromBaseTypes<IVertex, IEdge>(lookup => lookup
                .IncludeAssembliesOfBaseTypes())
            .ConfigureProperties(model => model
                .ConfigureElement<Vertex>(conf => conf
                    .IgnoreOnUpdate(x => x.BucketNo)))))
    .UseCosmosDb(configurator => configurator
        .At(new Uri(graphDbConfiguration.Uri))
        .OnDatabase(graphDbConfiguration.Database)
        .OnGraph(graphDbConfiguration.GraphName)
        .AuthenticateBy(graphDbConfiguration.AuthKey)
        .ConfigureWebSocket(_ => _
            .ConfigureGremlinClient(client => client
                .ObserveResultStatusAttributes((requestMessage, statusAttributes) =>
                {
                    if (debugMode)
                    {
                        LogGremlinQuery(requestMessage, statusAttributes);
                    }
                }))
            .ConfigureConnectionPool(connectionPoolSettings)));

We downgraded the app to .NET Core 3.1 and it works fine.

Not sure whether this will help.

Thank you !

I'm not sure I got the issue right... you configured the pool to have 250 connections, which are spawned, and it reaches a connection threshold of Azure Functions? Why not set the pool size down to, let's say, 4 or 8?

Hi @danielcweber, we faced an issue like this in our Azure Function:

All 4 connections have reached their MaxInProcessPerConnection limit of 32. Consider increasing either the PoolSize or the MaxInProcessPerConnection limit

as mentioned in this issue:

JonasSyrstad/Stardust.Paradox#15 (comment)

We did some research and settled on this:

{
  "PoolSize":250,
  "MaxInProcessPerConnection":150,
  "ReconnectionAttempts":3,
  "ReconnectionBaseDelayInMilliseconds":1000
}

Our ExRam Gremlinq client is a singleton; this is how it should be, right?

I see that in the issue above they mention moving from scoped to transient?

250 connections in the pool is a lot, and most definitely the reason you hit the Azure Functions threshold. Plus, you configured a maximum queue size of 150 requests per connection... does your application really expect to have 37500 requests in parallel?

I really don't know what to tackle here. The configuration doesn't seem too reasonable, unless you really expect that number of concurrent requests...I don't know what you are working on. Also, I don't think CosmosDb (assuming that's being used) will let you get away with 250 concurrent connections.

Why not configure something sensible, say a pool of 8 with a queue size of 32, and see what happens?
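
The 37500 figure can be reproduced with a short sketch, assuming the upper bound on requests in flight is simply the pool size multiplied by the per-connection limit, which is how the numbers in this thread combine. `PoolConfig` and `MaxInFlight` are hypothetical names for illustration and not part of Gremlin.NET or Gremlinq; the 4 x 32 case is taken from the error message quoted earlier in the thread.

```csharp
using System;

// Hypothetical type for illustration; not part of Gremlin.NET or ExRam.Gremlinq.
public sealed class PoolConfig
{
    public int PoolSize { get; }
    public int MaxInProcessPerConnection { get; }

    public PoolConfig(int poolSize, int maxInProcessPerConnection)
    {
        PoolSize = poolSize;
        MaxInProcessPerConnection = maxInProcessPerConnection;
    }

    // Upper bound on concurrent requests: every connection filled to its limit.
    public int MaxInFlight => PoolSize * MaxInProcessPerConnection;

    public override string ToString() =>
        $"{PoolSize} connections x {MaxInProcessPerConnection} per connection = {MaxInFlight} in flight";
}

public static class PoolMath
{
    // Prints the capacity of the configurations discussed in the thread.
    public static void PrintThreadConfigs()
    {
        Console.WriteLine(new PoolConfig(4, 32));    // 128, from the quoted error message
        Console.WriteLine(new PoolConfig(250, 150)); // 37500, the reporter's settings
        Console.WriteLine(new PoolConfig(8, 32));    // 256, the suggested starting point
    }
}
```

The suggested pool of 8 with a queue size of 32 still doubles the capacity of the 4 x 32 setup that produced the original error, while opening 8 sockets instead of 250.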

Hi @danielcweber, thank you! I will try different configurations and keep you updated.
But for some reason it works on .NET Core 3.1 with v8 and not on .NET 6 with v9.
I will also try running v8 on .NET 6 and add more details here.

Thank you !

The .NET 6 constraint probably determines which Azure Functions environment is used, and the environments may have different limits on outgoing connections. That's, however, just a guess.

Hi @danielcweber, I tried v8 on .NET 6 (Azure Functions v4) with this configuration:

{
  "PoolSize":250,
  "MaxInProcessPerConnection":150,
  "ReconnectionAttempts":3,
  "ReconnectionBaseDelayInMilliseconds":1000
}

We faced the same issue, i.e. too many connections.

After that I updated the configuration to this:

{
  "PoolSize":30,
  "MaxInProcessPerConnection":10,
  "ReconnectionAttempts":3,
  "ReconnectionBaseDelayInMilliseconds":500
}

It worked perfectly fine, so I guess something changed on the Azure Functions side, as you suspected.

[Image: ExramOnV4]

In the graph, the first spike is from running with the first config, and the second is from running with the second configuration. I will contact Azure support for more details and then update here as well.

Thank you !

cc @cloudbloqavi

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.