microsoft / CLRInstrumentationEngine

The CLR Instrumentation Engine is a cooperation profiler that allows running multiple profiling extensions in the same process.

CLRIE appears to be incompatible with 1 or more MS Exchange services (services crash)

sanikolov opened this issue

OS: windows server 2016 or 2019
Microsoft Exchange Server version: 2016 or 2019, e.g. Cumulative Update 9
Process that I profiled: MSExchangeHMHost.exe, but other MSExchange*.exe services are crashing too
.NET Framework version = 4.0.30319 but I don't think this matters

No other profilers are loaded, as the message below shows: the CLRIE profiler is the only thing loaded, and it is not configured to instrument anything. The mere presence of the CLRIE profiler is enough to cause a crash.

No instrumentation method configs found to load in process 18028l
(466c.2090): Break instruction exception - code 80000003 (first chance)
*** WARNING: Unable to verify checksum for C:\Program Files\Microsoft CLR Instrumentation Engine\1.0.36\Instrumentation64\MicrosoftInstrumentationEngine_x64.dll
MicrosoftInstrumentationEngine_x64!GetInstrumentationEngineLogger+0x458e8:
00007fff`eba1e298 cc              int     3

Here is an interesting observation: if you set MicrosoftInstrumentationEngine_DebugWait to 1 and manage to attach windbg.exe to the service quickly, then once you issue (g)o in windbg the service appears to run as expected, with no crashes.
Please take a look. I'd imagine that on azure users may want to gather metrics for MS Exchange.

cc @WilliamXieMSFT I believe this should be addressed by #386 and should be fixed with the latest release.

Nice, let me try the latest release and confirm or refute your guess.

Unfortunately, version 1.0.39 does not solve the MS Exchange issues.
DebugWait=0 crashes; DebugWait=1 lets me verify that all variables and paths are as expected, and does not crash.

00007fff`cfcc0000 00007fff`cfe20000   MicrosoftInstrumentationEngine_x64 C (export symbols)       C:\Program Files\Microsoft CLR Instrumentation Engine\1.0.39\Instrumentation64\MicrosoftInstrumentationEngine_x64.dll
00007fff`d76b0000 00007fff`d777c000   Microsoft_Office_Datacenter_Monitoring_ActiveMonitoring_Recovery_ni   (deferred)             
00007fff`de3c0000 00007fff`def75000   Microsoft_Exchange_Common_ComponentConfig_Transport_ni   (deferred)             
00007fff`def80000 00007fff`df719000   Microsoft_Exchange_Common_Directory_DirectoryVariantConfig_ni   (deferred)             
00007fff`df720000 00007fff`dfab9000   Microsoft_Exchange_Rpc_ni   (deferred)             
00007fff`dffb0000 00007fff`e0010000   Microsoft_Practices_ObjectBuilder2_ni   (deferred)             
00007fff`e0080000 00007fff`e097c000   Microsoft_Exchange_Data_ni   (deferred)             
00007fff`efbf0000 00007fff`efddd000   Microsoft_CSharp_ni   (deferred)             
00007fff`f7200000 00007fff`f7265000   ManagedAvailabilityCrimsonMsg_ni   (deferred)             
00007fff`f8200000 00007fff`f824a000   InstrumentationEngine_ProfilerProxy_x64   (deferred)
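
A module listing like the one above can be checked mechanically to confirm which engine build is actually loaded. The following is a minimal Python sketch, not part of CLRIE or windbg: the helper names `parse_lm` and `engine_version` are hypothetical, and the regular expression assumes the column layout shown in this listing (start address, end address, module name, optional flag, status in parentheses, optional image path).

```python
import re
from typing import Dict, List, Optional

# Assumed layout: start, end, module name, optional "C" flag, (status), optional path.
LM_LINE = re.compile(
    r"^(?P<start>[0-9a-f`]+)\s+(?P<end>[0-9a-f`]+)\s+(?P<name>\S+)\s+"
    r"(?:C\s+)?\((?P<status>[^)]*)\)\s*(?P<path>.*)$"
)
# A version-like folder name (e.g. 1.0.39) between two backslashes in the path.
VERSION_IN_PATH = re.compile(r"\\(\d+\.\d+\.\d+)\\")


def parse_lm(text: str) -> List[Dict[str, str]]:
    """Return one dict per module line that matches the assumed layout."""
    modules = []
    for line in text.splitlines():
        match = LM_LINE.match(line.strip())
        if match:
            modules.append(match.groupdict())
    return modules


def engine_version(modules: List[Dict[str, str]]) -> Optional[str]:
    """Extract the engine version from its install path, if one is present."""
    for mod in modules:
        if mod["name"].startswith("MicrosoftInstrumentationEngine"):
            found = VERSION_IN_PATH.search(mod["path"])
            if found:
                return found.group(1)
    return None


SAMPLE = r"""
00007fff`cfcc0000 00007fff`cfe20000   MicrosoftInstrumentationEngine_x64 C (export symbols)       C:\Program Files\Microsoft CLR Instrumentation Engine\1.0.39\Instrumentation64\MicrosoftInstrumentationEngine_x64.dll
00007fff`f8200000 00007fff`f824a000   InstrumentationEngine_ProfilerProxy_x64   (deferred)
"""

if __name__ == "__main__":
    print(engine_version(parse_lm(SAMPLE)))
```

Modules listed as deferred carry no path in this output, so only the engine DLL itself yields a version here.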

Again, none of our DLLs are being loaded; this is a test to see how the instrumentation engine fares on its own.

Hi @sanikolov, would you be able to provide us (clrieowners@microsoft.com) a dump of the crash and also any log files (Errors|Dumps)? This sounds like a tricky bug if debugging causes it to go away.

Indeed it is tricky to debug.
I had time to get deeper into this bug today and tried a couple things, no luck.
First I used DebugDiag to try to capture a full dump. My rule failed to collect a single dump. See the images below.
Second, I enabled Application Verifier and gflags so that windbg would start and attach to the service automatically. That never happened, no matter which checkboxes I clicked. See the screenshots below.

These are the variables under Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeHM, subkey Environment:

COR_ENABLE_PROFILING=1
COR_PROFILER={21419204-CA6F-464E-BC1D-E4506B9D333F}
COR_PROFILER_PATH_64=C:\Program Files\Microsoft CLR Instrumentation Engine\Proxy\v1\InstrumentationEngine.ProfilerProxy_x64.dll
MicrosoftInstrumentationEngine_DisableCodeSignatureValidation=1
MicrosoftInstrumentationEngine_LogLevel=None
MicrosoftInstrumentationEngine_DebugWait=0
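
For anyone reproducing this setup, the block above could be applied with a short script instead of by hand. This is a minimal Python sketch under stated assumptions: it assumes the per-service environment is stored as a REG_MULTI_SZ value named Environment under the service key (the post above calls it a subkey), and the function names `to_multi_sz` and `write_service_environment` are hypothetical. The variable names and values are copied from the post.

```python
from typing import Dict, List

# Service key from the post, relative to HKEY_LOCAL_MACHINE.
SERVICE_KEY = r"SYSTEM\CurrentControlSet\Services\MSExchangeHM"

# Variables exactly as listed in the post.
CLRIE_ENV: Dict[str, str] = {
    "COR_ENABLE_PROFILING": "1",
    "COR_PROFILER": "{21419204-CA6F-464E-BC1D-E4506B9D333F}",
    "COR_PROFILER_PATH_64": (
        r"C:\Program Files\Microsoft CLR Instrumentation Engine"
        r"\Proxy\v1\InstrumentationEngine.ProfilerProxy_x64.dll"
    ),
    "MicrosoftInstrumentationEngine_DisableCodeSignatureValidation": "1",
    "MicrosoftInstrumentationEngine_LogLevel": "None",
    "MicrosoftInstrumentationEngine_DebugWait": "0",
}


def to_multi_sz(env: Dict[str, str]) -> List[str]:
    """Render the variables as NAME=value strings, one per entry."""
    return [f"{name}={value}" for name, value in env.items()]


def write_service_environment(service_key: str, env: Dict[str, str]) -> None:
    """Write the variables to the service's Environment value (Windows only).

    Assumption: Environment is a REG_MULTI_SZ value under the service key.
    Requires administrative rights; the service must be restarted afterwards.
    """
    import winreg  # imported lazily so the module loads on non-Windows hosts

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, service_key, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(
            key, "Environment", 0, winreg.REG_MULTI_SZ, to_multi_sz(env)
        )


if __name__ == "__main__":
    # Dry run: print what would be written without touching the registry.
    for entry in to_multi_sz(CLRIE_ENV):
        print(entry)
```

Keeping the rendering step separate from the registry write makes it easy to review the exact strings first, for example when toggling DebugWait between 0 and 1.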

[4 images: screenshots of the DebugDiag rule and the Application Verifier and gflags settings]

It is really weird that none of the above actions resulted in any progress.

By the way, Process Explorer shows werfault.exe popping up at the time of the crash, as is customary.
So we're dealing with a crash, not a silent exit, in my opinion.

Hi @sanikolov, would it be possible to run procdump to generate a dump file? Also, would you mind setting MicrosoftInstrumentationEngine_LogLevel=Errors|Dumps and MicrosoftInstrumentationEngine_FileLogPath to some folder on disk ([path]\ with backslash) so we can persist the logs for all processes that get profiled?
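
The two logging settings requested above are easy to get subtly wrong (combining levels with `|`, ending the folder with a backslash). A minimal Python sketch of how they could be assembled follows; `build_log_env` is a hypothetical helper, the folder in the usage line is a made-up example, and the only level names used are the ones mentioned in this thread.

```python
from typing import Dict, Sequence


def build_log_env(levels: Sequence[str], folder: str) -> Dict[str, str]:
    """Build the two logging variables as described in the comment above.

    Levels are joined with "|" (e.g. Errors|Dumps), and the folder is given
    a trailing backslash if it does not already have one.
    """
    if not levels:
        raise ValueError("at least one log level is required")
    path = folder if folder.endswith("\\") else folder + "\\"
    return {
        "MicrosoftInstrumentationEngine_LogLevel": "|".join(levels),
        "MicrosoftInstrumentationEngine_FileLogPath": path,
    }


if __name__ == "__main__":
    # Hypothetical log folder, used only for illustration.
    for name, value in build_log_env(["Errors", "Dumps"], r"C:\tmp\clrie-logs").items():
        print(f"{name}={value}")
```

Note that this produces a single combined value, Errors|Dumps, rather than setting Errors and Dumps in separate runs.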

I tried your suggestions. Nothing came of it.
First I registered ProcDump as a system-wide catch-all for crashing processes.

C:\tmp\Procdump>procdump64.exe -i c:\tmp\crashes -ma

ProcDump v10.0 - Sysinternals process dump utility
Copyright (C) 2009-2020 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Set to:
  HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug
    (REG_SZ) Auto     = 1
    (REG_SZ) Debugger = "C:\tmp\Procdump\procdump64.exe" -accepteula -ma -j "c:\tmp\crashes" %ld %ld %p

ProcDump is now set as the Just-in-time (AeDebug) debugger.

Nothing was generated in folder C:\tmp\crashes\ during several service restarts, each of which crashed.
Subsequently I changed MicrosoftInstrumentationEngine_LogLevel to Errors and later to Dumps, while setting MicrosoftInstrumentationEngine_FileLogPath to a valid location (which I have done tens of times in the past few months).
I restarted the service a couple more times. No logs were generated.
This bug may require special skills or a special debug environment to resolve, such as a JTAG or something similarly sophisticated.
At this point I am gonna have to leave the bug in your capable hands.