microsoft / CLRInstrumentationEngine

The CLR Instrumentation Engine is a cooperation profiler that allows running multiple profiling extensions in the same process.

certain IIS spawned processes crash whenever COR_ENABLE_PROFILING=1 leaks to descendants

sanikolov opened this issue

This report was collected from a customer site.
The above variable and a few others are set in 2 places:

  • W3SVC\Environment
  • WAS\Environment

The former entry was created by New Relic. The latter was created by us, where we plug New Relic (NR) into the raw profiler hook.
In this particular crash report we've disabled New Relic by putting an "x" in front of the 2 raw profiler hook variables.
Now I recognize that child processes like csc.exe and VBCSCompiler.exe are not meant to be "profiled".
They happen to be profiled because COR_ENABLE_PROFILING=1 leaks from the parent process w3wp.exe to
descendants. Nevertheless the expectation is that we should not be seeing crashes in these descendants.
Crashing processes pile up and cause CPU and memory spikes which render the box unusable over time.
A link to a minidump and related files will be sent separately to the clrieowners alias because the collected data
may be sensitive.

Hi @sanikolov, appreciate you providing us the details! We can consider adding a flag that disables propagating the ICorProfiler env variables to child processes. Just to confirm, do you have any scenarios where you do want a selected list of child processes to be profiled?

There are 2 places where env variable propagation can be turned off: just prior to child process creation, and in the parent process. In the 1st case only child processes are affected; the parent continues to have the env var set.
If a child process is created in a native w3wp module then the flag won't work (this is what I believe happens in this particular report).
In this case the env var must be cleared in the parent process.
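To make the two options above concrete, here is a minimal sketch of the inheritance semantics, written in Python rather than in IIS or CLRIE code. It is an illustration only: the `PROFILER_VARS` list is an assumption (the thread names only `COR_ENABLE_PROFILING` and "a few others"), and the helper names `child_env_without_profiler`, `run_child`, and `clear_in_parent` are hypothetical.

```python
import os
import subprocess
import sys

# Assumed set of profiler variables; only COR_ENABLE_PROFILING is named in this thread.
PROFILER_VARS = ("COR_ENABLE_PROFILING", "COR_PROFILER", "COR_PROFILER_PATH")


def child_env_without_profiler(env=None):
    """Return a copy of the environment with the profiler variables removed."""
    source = os.environ if env is None else env
    return {k: v for k, v in source.items() if k.upper() not in PROFILER_VARS}


def run_child(args):
    """Option 1: strip the variables just before creating the child.

    Only this child is affected; the parent keeps its variables. This works
    only for children launched through this code path.
    """
    return subprocess.run(
        args, env=child_env_without_profiler(), capture_output=True, text=True
    )


def clear_in_parent():
    """Option 2: clear the variables in the parent process itself.

    Every child created afterwards inherits an environment without them,
    regardless of which component in the process spawns it.
    """
    for name in PROFILER_VARS:
        os.environ.pop(name, None)
```

For example, `run_child([sys.executable, "-c", "import os; print(os.environ.get('COR_ENABLE_PROFILING'))"])` launches a child that does not see the variable while the parent still has it. Option 2 corresponds to the native-module case, where the code creating the child is not under your control.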

I would be cautious about globally disabling child process profiling because there may be instrumentation methods that expect to instrument child processes.

@WilliamXieMSFT have you had a chance to look at the dumps? Is the crash occurring in our code, or in 3rd party code?

@delmyers the dump is of vbcscompiler.exe and looks like only CLRIE is in the modules list. My thought is to keep the current behavior of propagation and the flag will be an opt-in disable of propagation to unblock Slavcho.

@WilliamXieMSFT, I would still like to see a root cause of this because CLRIE has been running in production for a long time where vbcscompiler would have been getting called from IIS. The problem might be in our raw profiler hook code, in which case we should probably fix the cause rather than the symptom.

I'm reluctant to turn off child process profiling because that kind of breaks the contract of allowing multiple instrumentation methods to run at the same time. One controller would be deciding for all clients how they will behave.

@sanikolov, would you be able to try with a newer version of the msi? The dump file indicates v1.0.36 is used, but it doesn't match our symbols.

Ok, I will try to collect a better dump using your MSI in the next few days. The version doesn't match because I was building from source and packaging the CLRIE binaries into our MSI, until I became aware that you started publishing MSIs not too long ago.

@sanikolov, alternatively, you could upload the pdbs produced from your build that are associated with the dump you provided, and tell us the commit/branch that your build is based on.

Version 1.0.42 of CLRIE appears to have fixed the crashes described above.
It is unknown which of the fixes that went into CLRIE between 1.0.36 and 1.0.42 resolved this issue.

Thanks for the confirmation @sanikolov