open-telemetry / opentelemetry-dotnet-instrumentation

OpenTelemetry .NET Automatic Instrumentation

Home Page: https://opentelemetry.io

Intermittent Telemetry Logging and Absence of OpenTelemetry Instances in .NET 6 Applications Post-Deployment

SSPAGNAMN76 opened this issue

Bug Report

Symptom

Describe the bug
After deployment, telemetry logging in our .NET 6 projects that use OpenTelemetry works only for a limited time. It then stops without any error or warning, and a memory dump shows that the OpenTelemetry instances are gone until the connection pool is refreshed.

Expected behavior
Continuous, uninterrupted telemetry logging, with the OpenTelemetry instances persisting without a pool refresh.

Runtime environment (please complete the following information):

  • OpenTelemetry Automatic Instrumentation version: 1.6.0
  • OS: Windows
  • .NET version: .NET 6.0 (C#)

Additional context
We suspect that the garbage collector is prematurely collecting the MeterProvider instance, which interrupts telemetry logging. Several deployed .NET 6 applications with slightly different OpenTelemetry initialization code are affected.

Reproduce

Steps to reproduce the behavior:

  1. Deploy a .NET 6 application utilizing OpenTelemetry for telemetry logging.
  2. After a short time, observe that telemetry logging stops without any error message.
  3. Take a memory dump and note the absence of OpenTelemetry instances.
  4. Refresh the connection pool and note that the OpenTelemetry instances reappear and telemetry logging resumes.

Additional Steps Taken:

Enabled logging with the command:

setx OTEL_LOG_LEVEL "debug" /M

However, no logs are written to %ProgramData%\OpenTelemetry .NET AutoInstrumentation\logs.

Added a static field, Provider, to hold the MeterProvider instance, and called GC.KeepAlive to prevent early collection by the GC:

private static MeterProvider Provider;

// Configuration logic...
Provider = Sdk.CreateMeterProviderBuilder()
    // Additional configuration logic...
    .Build();

I then added a GC.KeepAlive call:

// Further configuration logic...
GC.KeepAlive(Provider);

This replaces the previous logic:

services.AddSingleton(meterProvider);

This temporary workaround seems to work at first, but it still needs validation in production, and I am looking for a better or alternative approach that avoids potential memory issues in long-running applications.
Any insights or alternatives that might offer a more stable solution would be highly appreciated. Given the missing logs, any additional steps or places to look for troubleshooting would also help.
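
One alternative to holding the provider in a static field is to let the host's service container build and own it. The following is a minimal sketch, not a confirmed fix for this report. It assumes the OpenTelemetry.Extensions.Hosting and OpenTelemetry.Exporter.OpenTelemetryProtocol packages are referenced. The names HostedMetricsSketch and AddHostedMetrics are hypothetical, and the endpoint, protocol, and temporality are copied from the configuration class in this report.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;

// Hypothetical helper: not part of the reporter's code or of OpenTelemetry itself.
public static class HostedMetricsSketch
{
    public static IServiceCollection AddHostedMetrics(this IServiceCollection services, string serviceName)
    {
        // AddOpenTelemetry() comes from OpenTelemetry.Extensions.Hosting. The
        // MeterProvider is created by the container when the host starts and is
        // disposed with it, so no static field is needed to keep it referenced.
        services.AddOpenTelemetry()
            .ConfigureResource(resource => resource.AddService(serviceName))
            .WithMetrics(metrics => metrics
                .AddMeter("*")
                .AddOtlpExporter((eo, mo) =>
                {
                    // Values taken from the configuration class in this report.
                    eo.Endpoint = new Uri("http://localhost:5110");
                    eo.Protocol = OtlpExportProtocol.Grpc;
                    mo.PeriodicExportingMetricReaderOptions = new PeriodicExportingMetricReaderOptions
                    {
                        ExportIntervalMilliseconds = 60000
                    };
                    mo.TemporalityPreference = MetricReaderTemporalityPreference.Delta;
                }));

        return services;
    }
}
```

In Program.cs this would be called as builder.Services.AddHostedMetrics("my-service"), where the service name is a placeholder. Separately, a static field is already a GC root, so calling GC.KeepAlive on an object held in a static field should not change its lifetime.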

Configuration class code

Here is the configuration class as it was before the changes described above:

using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Reflection;
using System.Threading;

namespace MyProject.Metrics;

public static class MetricsConfigurator
{
    private static readonly AsyncLocal<MetricStreamConfiguration> CurrentCallConfiguration = new();

    public static MeterProvider Configure(IConfiguration configuration, Assembly mainAssembly)
    {
        var appName = mainAssembly.GetName();
        var resource = ResourceBuilder.CreateDefault()
            .AddService(serviceName: appName.Name, serviceVersion: appName.Version!.ToString())
            .AddAttributes(new KeyValuePair<string, object>[]
            {
                new("server.name", Environment.MachineName),
                new("process.id", Environment.ProcessId)
            });

        var exportInterval = TimeSpan.FromMilliseconds(configuration.GetValue("OpenTelemetry:ExportIntervalMilliseconds", 60000));

        return Sdk.CreateMeterProviderBuilder()
            .AddMeter("*")
            .SetResourceBuilder(resource)
            .AddView(instrument =>
            {
                var config = CurrentCallConfiguration.Value;
                CurrentCallConfiguration.Value = null;
                return config;
            })
            .AddOtlpExporter((eo, mo) =>
            {
                eo.Endpoint = new Uri("http://localhost:5110");
                eo.Protocol = OtlpExportProtocol.Grpc;
                mo.PeriodicExportingMetricReaderOptions = new PeriodicExportingMetricReaderOptions
                {
                    ExportIntervalMilliseconds = (int)exportInterval.TotalMilliseconds
                };
                mo.TemporalityPreference = MetricReaderTemporalityPreference.Delta;
            }).Build();
    }

    public static void AddOpenTelemetry(this IServiceCollection services, IConfiguration configuration)
    {
        var meterProvider = Configure(configuration, Assembly.GetCallingAssembly());
        services.AddSingleton(meterProvider);
    }

    public static Meter UsingConfiguration(this Meter meter, MetricStreamConfiguration configuration)
    {
        CurrentCallConfiguration.Value = configuration;
        return meter;
    }
}

The class is referenced in Program.cs, which uses the minimal hosting / minimal APIs approach:

// ......
builder.Services.AddOpenTelemetry(builder.Configuration);
// ......

The issue with missing ILogger has been resolved in version 1.0.2. Could you please test with this new version?
If you find that traces and metrics are missing, please upload the log files for further analysis.

Additionally, if you plan to use auto-instrumentation, there's no need to set up the OpenTelemetry SDK and its configuration in your code.

To collect logs from auto-instrumentation, you need to set the OTEL_DOTNET_AUTO_LOG_DIRECTORY environment variable. For more details, refer to this troubleshooting guide.

Thank you for your answer. Sorry, but I don't understand what version 1.0.2 refers to. I am using version 1.6 of OpenTelemetry. Could you explain?

I think this issue was reported in the wrong repository: you are using the OTel .NET SDK directly rather than automatic instrumentation.

OTEL_LOG_LEVEL is part of Automatic Instrumentation.

If you want to enable logs for the OTel SDK, please check https://github.com/open-telemetry/opentelemetry-dotnet/blob/3e885c77f201daebf5a4c00109425296d0064b06/src/OpenTelemetry/README.md#self-diagnostics

Also, this issue should be tracked in https://github.com/open-telemetry/opentelemetry-dotnet
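
For reference, the SDK self-diagnostics linked above are driven by a file named OTEL_DIAGNOSTICS.json placed in the process's current working directory. A minimal example follows. The values are illustrative: LogDirectory is where the log file is written, FileSize is in KiB, and LogLevel is the minimum level to record.

```json
{
    "LogDirectory": ".",
    "FileSize": 32768,
    "LogLevel": "Warning"
}
```

With this file in place, the SDK writes its internal events to a log file in the configured directory. Removing the file turns the logging off again.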

Hi guys.
After enabling logs, I found the following error:

2023-10-05T19:27:09.5341587Z:Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{http://localhost:5110/}{Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. IOException: The request was aborted. IOException: The response ended prematurely while waiting for the next frame from the server.", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request.")
 ---> System.Net.Http.HttpRequestException: An error occurred while sending the request.
 ---> System.IO.IOException: The request was aborted.
 ---> System.IO.IOException: The response ended prematurely while waiting for the next frame from the server.
   at System.Net.Http.Http2Connection.<ReadFrameAsync>g__ThrowMissingFrame|57_1()
   at System.Net.Http.Http2Connection.ReadFrameAsync(Boolean initialFrame)
   at System.Net.Http.Http2Connection.ProcessIncomingFramesAsync()
   --- End of inner exception stack trace ---
   at System.Net.Http.Http2Connection.ThrowRequestAborted(Exception innerException)
   at System.Net.Http.Http2Connection.Http2Stream.CheckResponseBodyState()
   at System.Net.Http.Http2Connection.Http2Stream.TryEnsureHeaders()
   at System.Net.Http.Http2Connection.Http2Stream.ReadResponseHeadersAsync(CancellationToken cancellationToken)
   at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)
   --- End of inner exception stack trace ---
   at Grpc.Net.Client.Internal.HttpClientCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
   at Grpc.Core.Interceptors.InterceptingCallInvoker.<BlockingUnaryCall>b__3_0[TRequest,TResponse](TRequest req, ClientInterceptorContext`2 ctx)
   at Grpc.Core.ClientBase.ClientBaseConfiguration.ClientBaseConfigurationInterceptor.BlockingUnaryCall[TRequest,TResponse](TRequest request, ClientInterceptorContext`2 context, BlockingUnaryCallContinuation`2 continuation)
   at Grpc.Core.Interceptors.InterceptingCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
   at OpenTelemetry.Proto.Collector.Metrics.V1.MetricsService.MetricsServiceClient.Export(ExportMetricsServiceRequest request, CallOptions options)
   at OpenTelemetry.Proto.Collector.Metrics.V1.MetricsService.MetricsServiceClient.Export(ExportMetricsServiceRequest request, Metadata headers, Nullable`1 deadline, CancellationToken cancellationToken)
   at OpenTelemetry.Exporter.OpenTelemetryProtocol.Implementation.ExportClient.OtlpGrpcMetricsExportClient.SendExportRequest(ExportMetricsServiceRequest request, CancellationToken cancellationToken)}

How can I solve it?
Thanks a lot.

This repo is for OpenTelemetry .NET auto-instrumentation. You're using the SDK from https://github.com/open-telemetry/opentelemetry-dotnet directly. Please post your issue in that repo. Thanks!