DataDog / dd-trace-dotnet

.NET Client Library for Datadog APM

Home Page: https://docs.datadoghq.com/tracing/

Continuing issues with version-mismatched MSI & NuGet packages and mixed custom/automatic instrumentation

jbparker opened this issue

Describe the bug
Combining custom instrumentation with automatic instrumentation on versions 2.x and later still has issues when the installed tracer MSI and the NuGet package are on different versions.

Specifically, a parent trace/span created with custom instrumentation will sometimes (but not always) report associated child spans that fall outside the time period of the parent trace/span.

To Reproduce
Steps to reproduce the behavior:

  1. Create a custom active span using Datadog.Trace.Tracer.Instance.StartActive from the NuGet package, in code whose runtime environment is automatically instrumented via the MSI (see versions below)
  2. Ensure that the automatically instrumented code runs entirely inside the parent trace/span and doesn't spawn other threads, use async, or otherwise have a lifetime that outlives the parent trace/span
  3. Run many iterations. In some, the parent trace/span contains all of its child spans within the expected timeframes; in others, it does not
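
The steps above can be sketched as a small loop. This is only an illustration under assumptions, not the reporter's actual code: the helper class MixedInstrumentationRepro, the span name "custom.parent", and the CallDatabase method in the trailing comment are all hypothetical. The scope factory is passed in as a delegate so that the sketch compiles without the NuGet package; the comment at the end shows how Tracer.Instance.StartActive would be plugged in.

```csharp
using System;

// Hypothetical helper for illustration; not part of Datadog.Trace.
public static class MixedInstrumentationRepro
{
    // startScope:       opens the custom parent span and returns a handle
    //                   that closes the span when disposed (step 1).
    // instrumentedWork: synchronous code covered by automatic
    //                   instrumentation; no threads, no async (step 2).
    // iterations:       how many parent spans to produce (step 3).
    // Returns the number of iterations that ran to completion.
    public static int Run(
        Func<string, IDisposable> startScope,
        Action instrumentedWork,
        int iterations)
    {
        if (startScope == null) throw new ArgumentNullException(nameof(startScope));
        if (instrumentedWork == null) throw new ArgumentNullException(nameof(instrumentedWork));

        int completed = 0;
        for (int i = 0; i < iterations; i++)
        {
            // The parent span stays open for exactly the duration of the
            // work, so every child span should fall inside it.
            using (startScope("custom.parent"))
            {
                instrumentedWork();
            }
            completed++;
        }
        return completed;
    }
}

// Assumed wiring with the NuGet package (requires `using Datadog.Trace;`):
//   MixedInstrumentationRepro.Run(
//       name => Tracer.Instance.StartActive(name),
//       () => CallDatabase(),   // hypothetical auto-instrumented call
//       1000);
```

With matching MSI and NuGet versions, every iteration would be expected to produce a parent span that encloses its children; with mismatched versions, the report above says only some iterations do.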

Expected behavior
Every parent trace/span should completely contain its associated child spans.
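
To make "completely contain" concrete, here is a minimal sketch of the check, assuming span start times and durations have been exported somehow. SpanTiming and SpanContainment are hypothetical types introduced for illustration; they are not Datadog.Trace types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified view of a span's timing; not a Datadog.Trace type.
public sealed class SpanTiming
{
    public SpanTiming(string name, DateTimeOffset start, TimeSpan duration)
    {
        Name = name;
        Start = start;
        Duration = duration;
    }

    public string Name { get; }
    public DateTimeOffset Start { get; }
    public TimeSpan Duration { get; }
    public DateTimeOffset End => Start + Duration;
}

public static class SpanContainment
{
    // A child is contained when it starts no earlier than the parent
    // and ends no later than the parent.
    public static bool Contains(SpanTiming parent, SpanTiming child)
        => child.Start >= parent.Start && child.End <= parent.End;

    // Returns the children that start before or end after the parent.
    public static List<SpanTiming> FindEscapingChildren(
        SpanTiming parent, IEnumerable<SpanTiming> children)
    {
        var escaping = new List<SpanTiming>();
        foreach (SpanTiming child in children)
        {
            if (!Contains(parent, child))
            {
                escaping.Add(child);
            }
        }
        return escaping;
    }
}
```

An empty result from FindEscapingChildren corresponds to the expected behavior; any entries correspond to the mismatch described in this issue.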

Screenshots
Here we have a custom (manually instrumented) parent trace/span whose automatically instrumented child spans are not contained within the timeframe of the parent trace/span:

[Screenshot: mismatch]

Downgrading the NuGet package to match the MSI fixes the issue:

[Screenshot: matched]

Runtime environment:

  • Instrumentation mode: automatic with the MSI installer, along with manual (custom instrumentation) with the NuGet package
  • Tracer version: 2.21.0 MSI, 2.33.0 NuGet package
  • OS: Windows Server 2019
  • CLR: .NET Framework 4.8

Additional context
According to #674 (comment), there should no longer be issues when running different MSI and NuGet versions.

This also seems similar to the issue described in #1219 before the type mismatch was fixed in version 2.x.

In practice, are there still concerns with evolving the versions of these separately?

Is it the first time you're observing this? Have you run with conflicting versions prior to 2.32.0? I suspect one of the changes we introduced in that version could cause this.

This is the first time we've tried any custom instrumentation (installing the NuGet package). We installed automatic instrumentation earlier this year or late last year (hence the 2.21.0 MSI), so I'm not sure whether this is expected.

That said, I just tried this with the existing MSI (2.21.0) and a NuGet package that was a bit closer (2.26.0), and it exhibits a similar issue: the child spans seem bound within the same time period, but they are "stacked":

[Screenshot: stacked]

Instead of sequential as they should be:

[Screenshot: sequential]

Can we expect that some functionality will degrade if these aren't kept completely in sync?

Thanks @jbparker, we'll look into it!

Can we expect that some functionality will degrade if these aren't kept completely in sync?

Unfortunately, yes. What you're seeing isn't expected, but we do expect other effects from not keeping these in sync, such as a performance impact. This is because .NET treats different versions of the same assembly as completely different assemblies, so we have to do a lot of extra work to "talk" between them. That work is not required when the version numbers are in sync.
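
As a rough illustration of the point about assembly versions, the sketch below compares two assembly identities and lists the versions of an assembly loaded in the current process. AssemblyVersionCheck is a hypothetical helper, and using it for this issue assumes that both the MSI-installed tracer and the NuGet package load under the simple name Datadog.Trace; it is a diagnostic idea, not something the tracer provides.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical diagnostic helper; not part of Datadog.Trace.
public static class AssemblyVersionCheck
{
    // Two identities match here only if both the simple name and the
    // version agree. The same name with a different version is treated
    // as a different assembly.
    public static bool SameNameAndVersion(AssemblyName a, AssemblyName b)
    {
        return string.Equals(a.Name, b.Name, StringComparison.OrdinalIgnoreCase)
            && Equals(a.Version, b.Version);
    }

    // Lists the distinct versions of every loaded assembly whose simple
    // name matches. More than one entry means two versions are loaded
    // side by side in this process.
    public static List<Version> LoadedVersions(string simpleName)
    {
        var versions = new List<Version>();
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            AssemblyName name = assembly.GetName();
            bool nameMatches = string.Equals(
                name.Name, simpleName, StringComparison.OrdinalIgnoreCase);
            if (nameMatches && !versions.Contains(name.Version))
            {
                versions.Add(name.Version);
            }
        }
        return versions;
    }
}
```

Calling AssemblyVersionCheck.LoadedVersions("Datadog.Trace") at startup and logging the result would be one way to notice a mismatch such as the 2.21.0 and 2.33.0 pair from this report.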

That makes complete sense. Thanks so much for the details on that!

So, we can expect reasonably reliable interop (versus the previous state, where the two might have been largely incompatible), but we should still strive to keep them in sync.

We'll keep that in mind as we roll out updates in the future.