DataDog / dd-trace-dotnet

.NET Client Library for Datadog APM

Home Page: https://docs.datadoghq.com/tracing/

Bundled Tracer fails to load with .NET 8 and COMPlus_EnableDiagnostics=0

marcovr opened this issue

Describe the bug
While upgrading our applications to .NET 8, we noticed missing traces and APM metrics. We then found that the bundled tracer fails to load into (attach to) the .NET process when running on .NET 8 with the environment variable COMPlus_EnableDiagnostics=0 set.

To Reproduce
Steps to reproduce the behavior:

  1. Create a new dotnet webapp using .NET 8 with a reference to Datadog.Trace.Bundle
  2. Set env variables for Tracer as well as COMPlus_EnableDiagnostics=0
  3. Run application
  4. Tracer is not loaded - Traces are missing!

Have a look at my minimal reproducible repo.
As you can see in this run, the bundled tracer loads successfully with .NET 7 regardless of whether COMPlus_EnableDiagnostics=0 is set. With .NET 8, however, it fails to load when COMPlus_EnableDiagnostics=0 is set:

[WARNING]: The native loader library is not loaded into the process

Note: the lines "[FAILURE]: Error connecting to Agent" can be ignored - That is expected because I'm not running an agent :-)

Expected behavior
The tracer works with COMPlus_EnableDiagnostics=0

Visualization

Bundled Tracer works?          .NET 7   .NET 8
COMPlus_EnableDiagnostics=1    ✅       ✅
COMPlus_EnableDiagnostics=0    ✅       ❌

Runtime environment (please complete the following information):

  • Instrumentation mode: automatic with Datadog.Trace.Bundle
  • Tracer version: 2.44.0
  • OS: Debian GNU/Linux 12 (bookworm)
  • CLR: .NET 8.0.0

Additional context
I only tested this in a containerized environment, but I assume that other environments are also affected.

I just realized that the linked logs are apparently private, so here is what I get:

✅ .NET 7 COMPlus_EnableDiagnostics=0

Running checks on process 1
Process name: dotnet

---- STARTING TRACER SETUP CHECKS -----
Target process is running with .NET Core
1. Checking Modules Needed so the Tracer Loads:
 [SUCCESS]: The tracer version 2.44.0.0 is loaded into the process.
2. Checking DD_DOTNET_TRACER_HOME and related configuration value:
 [SUCCESS]: DD_DOTNET_TRACER_HOME is set to '/app/datadog' and the directory was
found correctly.
3. Checking CORECLR_PROFILER_PATH and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER_PATH is set to the correct
value of /app/datadog/linux-x64/Datadog.Trace.ClrProfiler.Native.so.
4. Checking CORECLR_PROFILER and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER is set to the correct 
value of {846F5F1C-F9AE-4B07-969E-05C26BC060D8}.
5. Checking CORECLR_ENABLE_PROFILING and related configuration value:
 [SUCCESS]: The environment variable CORECLR_ENABLE_PROFILING is set to the 
correct value of 1.

---- CONFIGURATION CHECKS -----
1. Checking if tracing is disabled using DD_TRACE_ENABLED.
 [INFO]: DD_TRACE_ENABLED is not set, the default value is true.
2. Checking if profiling is enabled using DD_PROFILING_ENABLED.
 [INFO]: DD_PROFILING_ENABLED is not set, the continuous profiler is disabled.

---- DATADOG AGENT CHECKS -----
Detected agent url: http://127.0.0.1:8126/. Note: this url may be incorrect if 
you configured the application through a configuration file.
Connecting to Agent at endpoint http://127.0.0.1:8126/ using HTTP
 [FAILURE]: Error connecting to Agent at http://127.0.0.1:8126/: Connection 
refused (127.0.0.1:8126)

✅ .NET 7 COMPlus_EnableDiagnostics unset

Running checks on process 1
Process name: dotnet

---- STARTING TRACER SETUP CHECKS -----
Target process is running with .NET Core
1. Checking Modules Needed so the Tracer Loads:
 [SUCCESS]: The tracer version 2.44.0.0 is loaded into the process.
2. Checking DD_DOTNET_TRACER_HOME and related configuration value:
 [SUCCESS]: DD_DOTNET_TRACER_HOME is set to '/app/datadog' and the directory was
found correctly.
3. Checking CORECLR_PROFILER_PATH and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER_PATH is set to the correct
value of /app/datadog/linux-x64/Datadog.Trace.ClrProfiler.Native.so.
4. Checking CORECLR_PROFILER and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER is set to the correct 
value of {846F5F1C-F9AE-4B07-969E-05C26BC060D8}.
5. Checking CORECLR_ENABLE_PROFILING and related configuration value:
 [SUCCESS]: The environment variable CORECLR_ENABLE_PROFILING is set to the 
correct value of 1.

---- CONFIGURATION CHECKS -----
1. Checking if tracing is disabled using DD_TRACE_ENABLED.
 [INFO]: DD_TRACE_ENABLED is not set, the default value is true.
2. Checking if profiling is enabled using DD_PROFILING_ENABLED.
 [INFO]: DD_PROFILING_ENABLED is not set, the continuous profiler is disabled.

---- DATADOG AGENT CHECKS -----
Detected agent url: http://127.0.0.1:8126/. Note: this url may be incorrect if 
you configured the application through a configuration file.
Connecting to Agent at endpoint http://127.0.0.1:8126/ using HTTP
 [FAILURE]: Error connecting to Agent at http://127.0.0.1:8126/: Connection 
refused (127.0.0.1:8126)

❌ .NET 8 COMPlus_EnableDiagnostics=0

Running checks on process 1
Process name: dotnet

---- STARTING TRACER SETUP CHECKS -----
Target process is running with .NET Core
1. Checking Modules Needed so the Tracer Loads:
 [WARNING]: The native loader library is not loaded into the process
 [WARNING]: The native tracer library is not loaded into the process
 [SUCCESS]: The tracer version 2.44.0.0 is loaded into the process.
2. Checking DD_DOTNET_TRACER_HOME and related configuration value:
 [SUCCESS]: DD_DOTNET_TRACER_HOME is set to '/app/datadog' and the directory was
found correctly.
3. Checking CORECLR_PROFILER_PATH and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER_PATH is set to the correct
value of /app/datadog/linux-x64/Datadog.Trace.ClrProfiler.Native.so.
4. Checking CORECLR_PROFILER and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER is set to the correct 
value of {846F5F1C-F9AE-4B07-969E-05C26BC060D8}.
5. Checking CORECLR_ENABLE_PROFILING and related configuration value:
 [SUCCESS]: The environment variable CORECLR_ENABLE_PROFILING is set to the 
correct value of 1.
6. Checking if process tracing configuration matches Installer or Bundler:
Installer related documentation: 
https://docs.datadoghq.com/tracing/trace_collection/dd_libraries/dotnet-core?tab
=linux#install-the-tracer
 [FAILURE]: Error trying to check the Linux installer directory: Could not find 
a part of the path '/opt/datadog'.

Note the lines:

[WARNING]: The native loader library is not loaded into the process
[WARNING]: The native tracer library is not loaded into the process

✅ .NET 8 COMPlus_EnableDiagnostics unset

Running checks on process 1
Process name: dotnet

---- STARTING TRACER SETUP CHECKS -----
Target process is running with .NET Core
1. Checking Modules Needed so the Tracer Loads:
 [SUCCESS]: The tracer version 2.44.0.0 is loaded into the process.
2. Checking DD_DOTNET_TRACER_HOME and related configuration value:
 [SUCCESS]: DD_DOTNET_TRACER_HOME is set to '/app/datadog' and the directory was
found correctly.
3. Checking CORECLR_PROFILER_PATH and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER_PATH is set to the correct
value of /app/datadog/linux-x64/Datadog.Trace.ClrProfiler.Native.so.
4. Checking CORECLR_PROFILER and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER is set to the correct 
value of {846F5F1C-F9AE-4B07-969E-05C26BC060D8}.
5. Checking CORECLR_ENABLE_PROFILING and related configuration value:
 [SUCCESS]: The environment variable CORECLR_ENABLE_PROFILING is set to the 
correct value of 1.

---- CONFIGURATION CHECKS -----
1. Checking if tracing is disabled using DD_TRACE_ENABLED.
 [INFO]: DD_TRACE_ENABLED is not set, the default value is true.
2. Checking if profiling is enabled using DD_PROFILING_ENABLED.
 [INFO]: DD_PROFILING_ENABLED is not set, the continuous profiler is disabled.

---- DATADOG AGENT CHECKS -----
Detected agent url: http://127.0.0.1:8126/. Note: this url may be incorrect if 
you configured the application through a configuration file.
Connecting to Agent at endpoint http://127.0.0.1:8126/ using HTTP
 [FAILURE]: Error connecting to Agent at http://127.0.0.1:8126/: Connection 
refused (127.0.0.1:8126)

Hi @marcovr, thanks for flagging this. It appears that this was a behavior change in .NET 8 which disables the profiling APIs we rely on when you set COMPlus_EnableDiagnostics=0.

Unfortunately, as it's in the runtime, there's nothing we can do about it. However, they suggest the following workaround:

To emulate previous behavior, I suggest setting the following to ensure the behavior is as intended:

DOTNET_EnableDiagnostics=1
DOTNET_EnableDiagnostics_IPC=0
DOTNET_EnableDiagnostics_Debugger=0
DOTNET_EnableDiagnostics_Profiler=1

Could you give that a try and make sure that fixes your issue? Thanks!
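
For illustration, the behavior described above can be sketched as a small check over a process's environment. This is only a sketch under assumptions, not how the runtime or any Datadog tool implements it: the names `profiler_allowed` and `_flag` are hypothetical, the DOTNET_ prefix is assumed to take precedence over COMPlus_, and only the literal value "0" is treated as "disabled".

```python
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: bool = True) -> bool:
    """Read a runtime switch; DOTNET_ is assumed to win over COMPlus_."""
    for prefix in ("DOTNET_", "COMPlus_"):
        value = env.get(prefix + name)
        if value is not None:
            # Assumption: only the literal "0" disables the switch.
            return value.strip() != "0"
    return default


def profiler_allowed(env: Mapping[str, str], dotnet_major: int) -> bool:
    """Return False when the diagnostics switches would block the profiler."""
    if dotnet_major < 8:
        # Per this thread, the tracer still loaded on .NET 7 with
        # EnableDiagnostics=0, so nothing is flagged for older runtimes.
        return True
    # On .NET 8, both the master switch and the profiler switch must be on.
    return _flag(env, "EnableDiagnostics") and _flag(env, "EnableDiagnostics_Profiler")


if __name__ == "__main__":
    print(profiler_allowed({"COMPlus_EnableDiagnostics": "0"}, 8))
```

Passing `os.environ` as `env` would check the current process; the suggested four-variable workaround passes this check on .NET 8.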

Ohh I see. Not your fault then 😉
I can confirm that your suggestion works as expected.

But maybe it could be worth adding a note in the Readme / setup documentation about this change?
I spent quite a while trying to figure out what was causing the issue

Yep, makes sense - will look at getting that added somewhere - thanks! 🙂

Having DOTNET_EnableDiagnostics=1 set, though, prevents the use of a read-only root filesystem for dotnet containers. Is there any workaround for having dotnet, a read-only root filesystem, AND Datadog tracing all at the same time?

@nwesoccer that was the reason why we had originally set it to 0 as well 😄

But if you set all of the following environment variables, it does indeed work with a read-only filesystem because no IPC files are written:

DOTNET_EnableDiagnostics=1
DOTNET_EnableDiagnostics_IPC=0
DOTNET_EnableDiagnostics_Debugger=0
DOTNET_EnableDiagnostics_Profiler=1

See the corresponding docs

@marcovr I'm sorry, my test scenario was dotnet 7, as we have projects on both 7 and 8. I suppose that means for dotnet 7 we'll need DOTNET_EnableDiagnostics=0 (since the above list doesn't work with dotnet 7 and a read-only filesystem), while for dotnet 8 the above list does work with a read-only filesystem?

Yes, exactly. We solved this by building customized base images where depending on the .NET version, a different set of variables is set.
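
A minimal sketch of that version-dependent selection, assuming the variable sets discussed in this thread. The helper names `diagnostics_env` and `as_dockerfile_env` are hypothetical and are not taken from our actual base images:

```python
from typing import Dict


def diagnostics_env(dotnet_major: int) -> Dict[str, str]:
    """Pick the diagnostics variables for a read-only filesystem plus tracing."""
    if dotnet_major >= 8:
        # .NET 8: keep the profiler on, turn off IPC and the debugger.
        return {
            "DOTNET_EnableDiagnostics": "1",
            "DOTNET_EnableDiagnostics_IPC": "0",
            "DOTNET_EnableDiagnostics_Debugger": "0",
            "DOTNET_EnableDiagnostics_Profiler": "1",
        }
    # .NET 7 and earlier (per this thread): the single switch is enough.
    return {"DOTNET_EnableDiagnostics": "0"}


def as_dockerfile_env(env: Dict[str, str]) -> str:
    """Render the variables as Dockerfile ENV instructions, one per line."""
    return "\n".join(f"ENV {key}={value}" for key, value in env.items())


if __name__ == "__main__":
    print(as_dockerfile_env(diagnostics_env(8)))
```

The rendered lines could then be pasted into (or generated for) the base image that matches each .NET version.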

@marcovr Makes sense, Thanks!!

Just FYI, we've added detection of this scenario to the dd-dotnet diagnostic tool

Out of interest though, @marcovr/@nwesoccer, why are you setting COMPlus_EnableDiagnostics=0/DOTNET_EnableDiagnostics=0? 🤔

Cool, thanks 🙂

The reason we had set COMPlus_EnableDiagnostics=0 was to run our application in containers with a read-only file system.

When running any .NET application on a read-only file system without this variable set, the runtime fails to start and produces the following output:

Failed to create CoreCLR, HRESULT: 0x8007000E

I suppose this happens because the runtime fails to create the debug pipes.

Interestingly though, this appears to have been fixed with .NET 8.