ANCM PostStartCheck Failure

Question

AdamRiddick opened this issue 2 years ago

For reference: #41409. I'm opening a new issue because I can't comment on the other one since it's been locked.

To summarize, we have a .NET Core 3.1 application using the out-of-process hosting model that intermittently hits this issue because of a timeout during startup.

I understand from #22507 that the process shouldn't be left in a broken state and should be restarted, since it is out-of-process. However, this isn't happening, and we have to fix it with a manual restart.

Tagging @jkotalik and @adityamandaleeka from the above two issues.

Aditya Mandaleeka · Answer 1 · Sat Jun 25 2022 04:15:57 GMT+0800 (China Standard Time)

Triage: based on the description it sounds like RFP (or something else) is preventing the app from restarting in this case even though we believe that shouldn't affect out-of-proc. We should check if out-of-proc recycling is also affected by RFP or if something else is buggy.

Deleted user · Answer 2 · Sat Jun 25 2022 04:16:05 GMT+0800 (China Standard Time)

Thanks for contacting us.
We're moving this issue to the .NET 7 Planning milestone for future evaluation / consideration. Because it's not immediately obvious that this is a bug in our framework, we would like to keep this around to collect more feedback, which can later help us determine the impact of it. We will re-evaluate this issue during our next planning meeting(s).
If we later determine that the issue has no community involvement, or that it's a very rare and low-impact issue, we will close it so that the team can focus on more important and high-impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

Adam Riddick · Answer 3 · Mon Jun 27 2022 17:15:46 GMT+0800 (China Standard Time)

@adityamandaleeka I see from the bot above that this will be considered in future. However, this is a real and live problem for us in production that seems increasingly common. Is there anything we can do here?

Aditya Mandaleeka · Answer 4 · Tue Jun 28 2022 00:43:36 GMT+0800 (China Standard Time)

@AdamRiddick you can ignore the bot message for this one... I put it in the milestone so we remember to look into it during the .NET 7 cycle.

Because we don't have logging or other info, we're going to just investigate whether RFP affects the out-of-proc scenario as well (which we don't expect). If it's not RFP, it might be something else in your case that's preventing the app from restarting.

Adam Riddick · Answer 5 · Tue Jun 28 2022 16:11:27 GMT+0800 (China Standard Time)

@adityamandaleeka Thanks for the clarification. Can you tell me what RFP is?

I'm happy to arrange a call to discuss if that will assist.

Hao Kung · Answer 6 · Wed Jun 29 2022 00:15:09 GMT+0800 (China Standard Time)

Rapid-Fail Protection (RFP) is a feature in IIS:

https://stackoverflow.com/questions/6620616/what-is-meant-by-failure-in-iis-rapid-fail-protection

Hao Kung · Answer 7 · Wed Jun 29 2022 00:16:02 GMT+0800 (China Standard Time)

Sounds like there should be a message in your event log if this is the cause, something like:

Application pool 'my-test-application-pool' is being automatically disabled due to a series of failures in the process(es) serving that application pool.
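
As a rough aid for checking this, here is a minimal Python sketch that searches event log text for the message quoted above. It assumes the log has been exported to plain text with each message on a single line; the function name `find_rfp_disabled_pools` and the sample input are hypothetical and used only for illustration.

```python
import re
from typing import List

# Pattern built from the message quoted above; the pool name varies per site.
# Assumption: each message sits on one line in the exported text.
RFP_PATTERN = re.compile(
    r"Application pool '(?P<pool>[^']+)' is being automatically disabled "
    r"due to a series of failures"
)


def find_rfp_disabled_pools(log_text: str) -> List[str]:
    """Return the pool names named in matching messages, without duplicates."""
    pools: List[str] = []
    for match in RFP_PATTERN.finditer(log_text):
        pool = match.group("pool")
        if pool not in pools:
            pools.append(pool)
    return pools


if __name__ == "__main__":
    # Hypothetical sample text standing in for an exported event log.
    sample = (
        "Application pool 'my-test-application-pool' is being automatically "
        "disabled due to a series of failures in the process(es) serving "
        "that application pool."
    )
    print(find_rfp_disabled_pools(sample))
```

An empty result only means the message was not found in the text that was searched; it does not by itself rule out other causes.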

Adam Riddick · Answer 8 · Wed Jun 29 2022 20:49:58 GMT+0800 (China Standard Time)

Hi @HaoK, I've sifted through the event logs from the times this has occurred, and we don't see any messages relating to Rapid-Fail Protection.

Hao Kung · Answer 9 · Thu Jun 30 2022 05:29:36 GMT+0800 (China Standard Time)

I don't see any weirdness when trying a new app that does the following:

Throws in startup, results in a similar event log entry:

But the app domain is still up; I tried this for many requests.

Adding a sleep for 60 minutes results in an eventual startup timeout after 120 seconds:

Hao Kung · Answer 10 · Thu Jun 30 2022 06:07:51 GMT+0800 (China Standard Time)

@AdamRiddick since you aren't getting hit by Rapid-Fail Protection, there's not much we can do unless you can give us some kind of repro that demonstrates the behavior you are seeing, where IIS needs to be restarted. Feel free to open a new issue if you are able to provide a repro to investigate further.

Adam Riddick · Answer 11 · Thu Jun 30 2022 20:41:02 GMT+0800 (China Standard Time)

@HaoK To clarify, did the process restart after the timeout? My understanding is that it should. That's the situation we are in: it doesn't always restart.

I appreciate the difficulties here. I'll try to reproduce this standalone. Are there any debugging options that can help us understand why it isn't restarting? We've tried the ANCM tracing, but that doesn't appear to tell us (unless I'm missing it...).

Adam Riddick · Answer 12 · Fri Jul 01 2022 17:17:59 GMT+0800 (China Standard Time)

@HaoK I've managed to get further with this, and it now appears the application is being restarted when required; it's just happening consistently due to an issue somewhere else. We're still investigating and will come back if we find evidence it is tied to the ANCM.

Thanks.