DataDog / dd-trace-dotnet

.NET Client Library for Datadog APM

Home Page: https://docs.datadoghq.com/tracing/

AspNetCore middleware is throwing unhandled exception on shutdown

TechyChap opened this issue · comments

Describe the bug
Several of our instrumented ASP.NET Core applications throw an exception when K8s shuts them down. The log shows that this happened in AspNetCore/BlockingMiddleware.cs

System.Threading.Tasks.TaskCanceledException: A task was canceled.
File "/project/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AspNetCore/BlockingMiddleware.cs", line 140, in async Task BlockingMiddleware.Invoke(HttpContext context)
File "/project/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AspNetCore/BlockingMiddleware.cs", line 140, in async Task BlockingMiddleware.Invoke(HttpContext context) x 2
...
(2 additional frame(s) were not displayed)

Connection id "0HMSONKB9DNEM", Request id "0HMSONKB9DNEM:00000001": An unhandled exception was thrown by the application.

To Reproduce
Run something that would call the BlockingMiddleware.cs
Shut the application down.

Expected behavior
The application shuts down without throwing an exception.

Runtime environment (please complete the following information):

  • Instrumentation mode: automatic
  • Tracer version: 2.37.0.0
  • OS: Debian GNU/Linux 11 (bullseye)
  • CLR: .NET 6.0.21

commented

Hi,
Thank you for reporting this issue, we are looking into it and we'll get back to you shortly.

commented

Hi Andy,

We are trying to reproduce the issue locally. Would you have the complete stack trace of the issue?

Ideally, if you could forward us the log files or even a dump file, that would be great. You can do so by creating a corresponding support ticket following this documentation and attaching the files to it.

Thank you

Not sure how I'd get a dump, as this only occurs when the pod is shut down, so there would be no pod left to connect to.

But I've got some more log examples:

System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService.CheckHealthAsync(Func`2 predicate, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckMiddleware.InvokeAsync(HttpContext httpContext)
at Datadog.Trace.ClrProfiler.AutoInstrumentation.AspNetCore.BlockingMiddleware.Invoke(HttpContext context) in /project/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AspNetCore/BlockingMiddleware.cs:line 140
at Microsoft.AspNetCore.Routing.EndpointMiddleware.g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
at Swashbuckle.AspNetCore.SwaggerUI.SwaggerUIMiddleware.Invoke(HttpContext httpContext)
at Swashbuckle.AspNetCore.Swagger.SwaggerMiddleware.Invoke(HttpContext httpContext, ISwaggerProvider swaggerProvider)
at Datadog.Trace.ClrProfiler.AutoInstrumentation.AspNetCore.BlockingMiddleware.Invoke(HttpContext context) in /project/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AspNetCore/BlockingMiddleware.cs:line 140
at Datadog.Trace.ClrProfiler.AutoInstrumentation.AspNetCore.BlockingMiddleware.Invoke(HttpContext context) in /project/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AspNetCore/BlockingMiddleware.cs:line 140
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)

and

System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService.CheckHealthAsync(Func`2 predicate, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckMiddleware.InvokeAsync(HttpContext httpContext)
at Datadog.Trace.ClrProfiler.AutoInstrumentation.AspNetCore.BlockingMiddleware.Invoke(HttpContext context) in /project/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AspNetCore/BlockingMiddleware.cs:line 140
at Microsoft.AspNetCore.Routing.EndpointMiddleware.g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
at Swashbuckle.AspNetCore.SwaggerUI.SwaggerUIMiddleware.Invoke(HttpContext httpContext)
at Swashbuckle.AspNetCore.Swagger.SwaggerMiddleware.Invoke(HttpContext httpContext, ISwaggerProvider swaggerProvider)
at Datadog.Trace.ClrProfiler.AutoInstrumentation.AspNetCore.BlockingMiddleware.Invoke(HttpContext context) in /project/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AspNetCore/BlockingMiddleware.cs:line 140
at Datadog.Trace.ClrProfiler.AutoInstrumentation.AspNetCore.BlockingMiddleware.Invoke(HttpContext context) in /project/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AspNetCore/BlockingMiddleware.cs:line 140
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)

I'm wondering whether Datadog.Trace is just part of the stack that's collapsing and perhaps not the root cause. Would you agree?

commented

Thank you for the full stack trace. Yes, I would say so, given the top of the stack trace.

I can reproduce it if I register a HealthCheckMiddleware in the middleware pipeline and throw an exception inside it.
The BlockingMiddleware will necessarily appear in the stack trace because it is registered as both the very first and the very last protecting middleware, but that doesn't necessarily mean it is responsible for the exception. Each ASP.NET Core middleware registered in the pipeline is responsible for calling the next one, so they all appear in the stack trace at some point, just like the main EndpointMiddleware of ASP.NET Core.
It is more likely that the last middleware in the stack is at some point not shutting down gracefully, as could be the HealthCheckMiddleware if it's pinging services that are currently shutting down, but this is just a guess :)
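
To illustrate why every wrapping middleware shows up in the trace, here is a minimal sketch in plain C#. The three methods are hypothetical stand-ins for nested middlewares (they are not real ASP.NET Core or Datadog types); only the innermost one fails, yet all of them are expected to appear in the captured stack trace because each one awaits the next.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-ins for nested middlewares; not real framework or tracer types.
public static class PipelineDemo
{
    // Innermost component: the only one that actually fails.
    public static async Task HealthCheckAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("health check failed");
    }

    // Each outer layer does nothing but await the next one, like a pass-through middleware.
    public static async Task EndpointAsync() => await HealthCheckAsync();

    public static async Task OuterWrapperAsync() => await EndpointAsync();

    // Runs the "pipeline" and returns the stack trace of the exception that escaped it.
    public static async Task<string> CaptureStackTraceAsync()
    {
        try
        {
            await OuterWrapperAsync();
            return string.Empty;
        }
        catch (InvalidOperationException ex)
        {
            return ex.StackTrace ?? string.Empty;
        }
    }
}
```

Printing the result of `PipelineDemo.CaptureStackTraceAsync()` should list the outer wrappers as well as the failing method, even though the wrappers did nothing wrong.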

Just to expand a little on what @anna-git said:

"as could be the HealthCheckMiddleware if it's pinging services that are currently shutting down"

Digging in a little further, when the HealthCheckMiddleware executes, it passes the request's cancellation token to the DefaultHealthCheckService. This cancellation token fires when the request is aborted by the client:

var result = await _healthCheckService.CheckHealthAsync(_healthCheckOptions.Predicate, httpContext.RequestAborted);

The DefaultHealthCheckService checks the status of the token at various points, and throws a TaskCanceledException if the request is cancelled.
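
As a rough illustration of that mechanism outside ASP.NET Core, the sketch below uses a hypothetical `SlowCheckAsync` method as a stand-in for a health check that observes a token. It is not the actual DefaultHealthCheckService code; it only assumes that the check awaits something cancellable, here `Task.Delay`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Hypothetical stand-in for a slow health check that observes the token.
    public static async Task<string> SlowCheckAsync(CancellationToken token)
    {
        // Task.Delay surfaces cancellation as a TaskCanceledException.
        await Task.Delay(TimeSpan.FromSeconds(30), token);
        return "Healthy";
    }

    // Starts the check, cancels the token (simulating an aborted request),
    // and returns the exception that surfaces, or null if none did.
    public static async Task<Exception?> RunAsync()
    {
        using var cts = new CancellationTokenSource();
        Task<string> pending = SlowCheckAsync(cts.Token);
        cts.Cancel();

        try
        {
            await pending;
            return null;
        }
        catch (OperationCanceledException ex)
        {
            // TaskCanceledException derives from OperationCanceledException,
            // so this single catch covers both.
            return ex;
        }
    }
}
```

If nothing in the pipeline catches the exception, it propagates up through every middleware to the server, which logs it as unhandled.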

These errors are likely benign, and may be due to the health checks taking too long (and so the client is aborting). I have a somewhat related post describing how to handle this sort of cancellation in ASP.NET Core, and there are some useful comments on there about other approaches you might find interesting.
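
One possible way to handle this, sketched under assumptions rather than taken from the post mentioned above, is a small middleware that swallows cancellation exceptions only when the request itself was aborted. The class name `ClientAbortMiddleware` and the use of the non-standard 499 status code are illustrative choices, not part of ASP.NET Core or Datadog.Trace.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

// Hypothetical middleware; the name and the 499 status code are assumptions for illustration.
public sealed class ClientAbortMiddleware
{
    private readonly RequestDelegate _next;

    public ClientAbortMiddleware(RequestDelegate next) => _next = next;

    public async Task Invoke(HttpContext context)
    {
        try
        {
            await _next(context);
        }
        // TaskCanceledException derives from OperationCanceledException, so both are covered.
        // The filter lets cancellations unrelated to the request abort propagate as usual.
        catch (OperationCanceledException) when (context.RequestAborted.IsCancellationRequested)
        {
            if (!context.Response.HasStarted)
            {
                // 499 ("client closed request") is a non-standard convention borrowed from nginx.
                context.Response.StatusCode = 499;
            }
        }
    }
}
```

It would be registered with `app.UseMiddleware<ClientAbortMiddleware>()` ahead of the health check endpoint, so that it wraps the HealthCheckMiddleware. Whether to swallow, log, or rethrow these exceptions is a judgment call for each application.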

Either way, there's nothing to worry about from the tracing point of view 🙂

commented

Is it OK with you if we go ahead and close this issue, @TechyChap?
Thanks