microsoft / CLRInstrumentationEngine

The CLR Instrumentation Engine is a cooperative profiler that allows running multiple profiling extensions in the same process.



Azure App Services CIE Preinstalled SiteExtension 1.0.39 Has DLLs From 1.0.29

m0nkey653 opened this issue

We noticed that with the latest CIE version (1.0.39) in Azure, a bad instruction was emitted; it looks like this was corrected in #194. After looking deeper into the preinstalled site extensions in Kudu, it appears that the DLLs in the CIE 1.0.39 folder are the same versions/builds as the DLLs in the CIE 1.0.29 folder (which appear to have been built in 2019).

D:\Program Files (x86)\SiteExtensions\InstrumentationEngine\1.0.39\MicrosoftInstrumentationEngine_x64.dll
File Version: 15.1.0.12777
Product version: 15.1.0.2019111401

D:\Program Files (x86)\SiteExtensions\InstrumentationEngine\1.0.29\MicrosoftInstrumentationEngine_x64.dll
File Version: 15.1.0.12777
Product version: 15.1.0.2019111401
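The two version records above can be compared mechanically. The sketch below is only an illustration: `DllVersion`, `build_date` and `same_build` are hypothetical names, and it assumes (not confirmed anywhere in this thread) that the last product-version component is a `YYYYMMDDNN` stamp (build date plus a two-digit counter), which would match the "built in 2019" observation.

```python
from datetime import date
from typing import NamedTuple


class DllVersion(NamedTuple):
    file_version: str
    product_version: str


def build_date(product_version: str) -> date:
    """Decode the last product-version component.

    ASSUMPTION: the component is YYYYMMDDNN (date + two-digit build counter).
    """
    stamp = product_version.split(".")[-1]
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"unexpected version stamp: {stamp!r}")
    return date(int(stamp[0:4]), int(stamp[4:6]), int(stamp[6:8]))


def same_build(a: DllVersion, b: DllVersion) -> bool:
    # Identical file AND product versions strongly suggest identical binaries.
    return a == b


# Values copied from the two folders listed above.
v1_0_39 = DllVersion("15.1.0.12777", "15.1.0.2019111401")
v1_0_29 = DllVersion("15.1.0.12777", "15.1.0.2019111401")

print(same_build(v1_0_39, v1_0_29))          # True
print(build_date(v1_0_39.product_version))   # 2019-11-14
```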

The full CIE site extension folder from Azure App Services (D:\Program Files (x86)\SiteExtensions\InstrumentationEngine) is attached below:
InstrumentationEngine.zip


If there's another team I should report this to, please let me know. Thanks!

Hi @m0nkey653, yes, unfortunately this is due to a bug in version 1.0.38 that caused App Service to mitigate it by repackaging a previous version as 1.0.39 in a "rolling forward" upgrade, which conflicts with the actual 1.0.39 release.

We plan to ship a new version to App Service soon that will be greater than 1.0.39.

Is there seriously no update to this issue? Nothing? Not even a workaround? Our application crashed over 40 times in the last week, and all the stack traces point to InstrumentationEngine DLL. How are we supposed to operate a proper Azure App service when it crashes all the time due to the logging system? We need that logging information!

Please can you at least offer a timeline for a fix or solution?

Hi @mellamokb, I understand your frustration and I apologize for not providing updates in a timely manner. We have provided the App Service team a new build (v1.0.43) around December last year, however due to delays and changes in the deployment process, we are still about a month away from getting it available across all regions.

With that being said, CLRIE is just a thin ICorProfiler wrapper, and the more interesting behaviors are provided by the profilers running on top of it. Can you provide more details about your scenario, such as which stack traces you see and which logging system you are using? You can either create GitHub issues or email us directly at clrieowners@microsoft.com if the information is sensitive, and we'll be able to assist you more immediately.

@mellamokb, appreciate you sending out an email, but it looks like our responses are getting blocked so I'll reply on this thread.

For #1 – we have a TestInstrumentationEngine.zip produced in our builds that overrides the preinstalled extension and can be patched in Azure App Service for testing.

From the CI build off of the v1.0.43 commit: Pipelines - Run 2022030401 (azure.com)
I've also shared it directly to your email via this link: OneDrive

Note that these are unsigned so use with caution. If you require signed assemblies, we can certainly provide you with them.

To patch, simply drag and unzip the package via Kudu's Debug Console into the D:\Home\SiteExtensions\ folder (create the folder if it does not exist), then restart the app. The COR_PROFILER_PATH environment variables should then point to InstrumentationEngine.dll under this path.
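The unzip step amounts to "create the folder if missing, extract, then check that an engine DLL landed there." A minimal sketch of that, assuming a scriptable environment; `install_private_extension` is a hypothetical helper, and because the internal layout of TestInstrumentationEngine.zip isn't described here, it simply searches the extracted tree rather than assuming a path.

```python
import zipfile
from pathlib import Path


def install_private_extension(zip_path: Path, site_extensions: Path) -> list[Path]:
    """Unzip an extension package into a SiteExtensions folder.

    Creates the folder if it does not exist (as the instructions above require),
    extracts the archive into it, and returns every *InstrumentationEngine*.dll
    found underneath, so you can confirm the engine binaries are present.
    """
    site_extensions.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(site_extensions)
    # The package layout is not documented here, so search instead of guessing.
    return sorted(site_extensions.rglob("*InstrumentationEngine*.dll"))
```

For example, `install_private_extension(Path("TestInstrumentationEngine.zip"), Path(r"D:\home\SiteExtensions"))` would mirror the manual Kudu step; the app still has to be restarted afterwards.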


For #2, we ideally should be supporting everything in .NET so if/when you do encounter issues or patterns, please let us know as we are very interested in bugfix gaps.

Thanks,
William

Thanks for your help! I verified with our IT dept. that we block all out-of-US e-mail server/relays so maybe that's why?

I have tried my best to implement the suggested fix in a staging App Service.

  1. COR_PROFILER_PATH environment variable never gets set (in Kudu), with or without the override.
  2. After dropping the zip into SiteExtensions, ApplicationInsights failed to load the first time, but uninstalling/reinstalling fixed it. How can I tell whether it's using the correct version of InstrumentationEngine?
  3. When I access the /ApplicationInsights url provided by App Insights it says "InstrumentationEngineLoaded False". Is that normal?

There is so little insight into all of this. Without being able to verify conclusively that the correct version is being used, I'm uncomfortable rolling this change out to our production service.

Edit: The only thing I can find is in Process Explorer, where the environment variables for w3wp in the test service show the following. If this is relevant, it appears to still be using the old native path rather than the overridden folder.

MicrosoftInstrumentationEngine_LatestPath: D:\Program Files (x86)\SiteExtensions\InstrumentationEngine\1.0.39

@mellamokb, it's absolutely a good idea to use a test App Service first to ensure things are working. From the symptoms you're describing, it sounds like InstrumentationEngine might not be running. Let's check a few things.

Is ApplicationInsights configured to use InstrumentationEngine?

  • ApplicationInsights will enable InstrumentationEngine (via AppSetting) if either the SQL Commands or Snapshot Debugger - "Show local variables" are enabled.


  • Check the InstrumentationEngine_EXTENSION_VERSION AppSetting by going to the Azure Portal > Your App Service > Settings > Configuration > "Application settings" to see if it's set to something like ~1 (meaning it's enabled) or disabled.

  • If it's set to ~1, this just means that App Service will attempt to load the applicationHost.xdt file from the latest 1.x version in D:\Program Files (x86)\SiteExtensions\InstrumentationEngine\[version]\
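To make "the latest 1.x version" concrete, here is a sketch of picking the highest `major.minor.patch` folder for a given major version. It is an illustration only: `resolve_major` is a hypothetical name, and App Service's real resolution logic is not shown in this thread. It compares versions numerically, which is why a real fix has to be numbered above the repackaged 1.0.39 to win. Like MicrosoftInstrumentationEngine_LatestPath, it only looks at the preinstalled folder names, not private extensions in D:\Home\SiteExtensions.

```python
import re
from typing import Optional

VERSION_DIR = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def resolve_major(folder_names: list[str], major: int) -> Optional[str]:
    """Return the highest 'major.minor.patch' folder with the given major version.

    Versions are compared as integer tuples, so 1.0.39 > 1.0.9 even though
    a plain string comparison would say otherwise. Non-version names are ignored.
    """
    candidates = []
    for name in folder_names:
        m = VERSION_DIR.match(name)
        if m and int(m.group(1)) == major:
            candidates.append((tuple(int(g) for g in m.groups()), name))
    return max(candidates)[1] if candidates else None


print(resolve_major(["1.0.29", "1.0.39", "1.0.9", "2.0.1", "readme"], 1))  # 1.0.39
```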


Which InstrumentationEngine is actually being used?

  • Make sure to restart the app if you've made any changes so environment variables and site extension resolution can refresh.
  • The definitive profiler that the CLR uses is determined by the environment variables COR_PROFILER_PATH (for ASP.NET on .NET Framework) and CORECLR_PROFILER_PATH (for ASP.NET Core). Check Kudu > Process Explorer > w3wp.exe environment variables for these path variables and see whether they point to the location you unzipped under D:\Home\SiteExtensions.
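That check can be sketched as a small function over the w3wp.exe environment (for example, values copied from Kudu's Process Explorer). `profiler_sources` and its labels are hypothetical; the only facts taken from the thread are the two variable names and the private-extension root. Windows paths and variable names are case-insensitive, so both are normalized before comparing.

```python
import ntpath

PRIVATE_ROOT = r"D:\home\SiteExtensions"
PROFILER_VARS = ("COR_PROFILER_PATH", "CORECLR_PROFILER_PATH")


def profiler_sources(env: dict[str, str]) -> dict[str, str]:
    """Classify each profiler path variable as 'private', 'preinstalled/other' or 'unset'.

    'private' means the path lies under D:\\home\\SiteExtensions (the unzipped
    override); anything else (e.g. Program Files) is reported as 'preinstalled/other'.
    """
    env_upper = {k.upper(): v for k, v in env.items()}  # env var names are case-insensitive
    root = ntpath.normcase(PRIVATE_ROOT) + "\\"
    result = {}
    for var in PROFILER_VARS:
        path = env_upper.get(var)
        if path is None:
            result[var] = "unset"
        elif ntpath.normcase(ntpath.normpath(path)).startswith(root):
            result[var] = "private"
        else:
            result[var] = "preinstalled/other"
    return result
```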


Other tidbits

  • All of the magic happens in the applicationHost.xdt files. If you go to Kudu D:\home\LogFiles\Transform, you will see exactly what Kudu does when it reads the xdt files and if any exceptions or errors occur on malformed xdt files.
  • If the environment variable exists but things aren't working, both the CLR runtime and CLRIE will report errors in D:\Home\LogFiles\eventlog.xml (search for <Provider Name=".NET Runtime"/> or <Provider Name="Instrumentation Engine"/>). Errors around named pipes are expected and benign, but let us know if you see any exceptions.
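Searching eventlog.xml for those two providers can be scripted. This sketch assumes each entry looks like `<Event><System><Provider Name="..."/>...</System><Data>...</Data></Event>` inside a root element; that layout is an assumption, not something stated in this thread. `profiler_events` is a hypothetical helper, and its `skip_pipes` option is a crude heuristic (it drops any message containing "pipe") reflecting the note that named-pipe errors are benign.

```python
import xml.etree.ElementTree as ET

PROVIDERS = {".NET Runtime", "Instrumentation Engine"}


def profiler_events(xml_text: str, skip_pipes: bool = True) -> list[tuple[str, str]]:
    """Return (provider, message) for each event from the CLR or CLRIE providers.

    ASSUMES each <Event> contains <System><Provider Name="..."/></System> and a
    <Data> element holding the message text.
    """
    root = ET.fromstring(xml_text)
    found = []
    for event in root.iter("Event"):
        provider = event.find("System/Provider")
        if provider is None:
            continue
        name = provider.get("Name", "")
        if name not in PROVIDERS:
            continue
        message = (event.findtext("Data", default="") or "").strip()
        # Heuristic: named-pipe errors are described as expected and benign.
        if skip_pipes and "pipe" in message.lower():
            continue
        found.append((name, message))
    return found
```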

You can ignore MicrosoftInstrumentationEngine_LatestPath; it is a built-in environment variable that just points to the latest version path under the Program Files folder and does not take into account private extensions in D:\Home\SiteExtensions.

Oi, yes I see that I am a bit daft today. I guess it has been so long since I last set it up that I forgot about all those other options being disabled when I turned it back on. Now that I've enabled the profiler, debugger, locals and SQL options I am seeing all the things you've pointed out in the screenshots. Will battle test this week in the test service, then consider rollout to production service next week.

Thanks again for your patience and help in troubleshooting this! My sincere apologies for my previous tone; you have been most helpful.

A minor thing I've noticed for the App Service team: when all of this is enabled, the Modules tab in the w3wp properties under Process Explorer doesn't load, and there is a JavaScript error in the console: xml syntax error in /instance/all. I realize this may be out of scope for your team.

Best Regards.

@mellamokb It can be frustrating when things don't work so I'm happy I was able to help! Let us know how your testing goes or if any issues come up either in an email or through GitHub issues and we'll do our best to address them. I'll be sure to pass along your feedback as well to the App Service team.

I'll also resolve this issue once the new 1.0.43 version of CLRIE is deployed everywhere. I recommend removing the private extension if/when that happens, in case we need to provide security patches or bugfixes in the future.

Cheers!

Minor update: there were some more delays from App Service which have unfortunately postponed this for another week.

Version 1.0.43 is currently being rolled out across staging regions and should hit live regions in the next few weeks.

v1.0.43 is now available in the WestCentralUS region.

Given that this rollout is slowly spreading across the public Azure regions, I'll close the issue. Please open up new tickets if any issues arise from the new version. Thanks!