Azure / deployment-stacks

Contains Deployment Stacks CLI scripts and releases

Deployment stack stuck in "Deploying" state

pjelar opened this issue

Describe the bug
I'm rolling out changes to our subscription using deployment stacks. After my initial deployment of a stack, I decided to make some changes to the network layer. The next stack deployment then failed, I think because some of the infrastructure couldn't be updated in situ and needed to be deleted before rolling out. The deployment failed, but the deployment stack state stayed in 'Deploying' and I can't progress.
To Reproduce
Steps to reproduce the behavior:

  1. Deploy a virtual network
  2. Make changes to the bicep file with your virtual network and deploy again
  3. The deployment stack will go into a never-ending 'Deploying' state
  4. The underlying deployment will return as failed

Expected behavior
Both the deployment stack and the underlying deployment should end up in a failed state.

Screenshots
Screenshot 2024-01-29 at 11 34 51
Screenshot 2024-01-29 at 11 35 07

Repro Environment
Host OS: Ubuntu 20
Azure CLI Version: 2.230.0

Server Debugging Information
Correlation ID: aa2d5130-cb23-40a4-b82d-959e82418eff
Subscription ID: a4dd8d99-3cfd-46b0-b0e6-1955e327b228
Timestamp of issue (please include time zone): 28/01/2024, 21:02:25
Data Center (eg, West Central US, West Europe): Norway East

Additional context
ERROR: (DeploymentStackInNonTerminalState) The deployment stack resource '/subscriptions/a4dd8d99-3cfd-46b0-b0e6-1955e327b228/resourceGroups/cryptocust-RG/providers/Microsoft.Resources/deploymentStacks/layer-0' could not be updated as it is currently in a non-terminal state 'Deploying'.
Code: DeploymentStackInNonTerminalState

To be clear, I attempted to clear the original deployment by wiping the resource group I ran it in, and then deployed again, but ended up with the same problem.

The old correlation id is: 7bf2bef5-3377-44f9-8f4d-e8cefc381258

Hi @pjelar,

I believe the issue that results in the stack getting stuck in 'Deploying' is caused by an error during deployment (marked as "Conflict" in the provided screenshot):

{
  "error": {
    "code": "DeploymentActive",
    "message": "Unable to edit or replace deployment 'aks-udr-norwayeast': previous deployment from '1/28/2024 12:23:15 AM' is still active (expiration time is '2/4/2024 12:23:14 AM'). Please see https://aka.ms/arm-deploy-resources for usage details."
  }
}

The above is from the old correlation id and the same error type is present in the other correlation id.

The stack gets stuck in 'Deploying' because of a bug on our side when retrieving error details for the deployment. The service attempts to collect the related errors of resources in the deployment and its nested deployments. Typically, the resource ID attached to an error is not that of the deployment itself, but for this "DeploymentActive" error code it is the deployment itself, and that causes a hang.

It will require a patch on our side to handle this situation when a deployment with the same id is already running.

Until a fix is deployed, are you able to try deploying the stack again after verifying that none of the deployments within the stack are in a running state (e.g. 'Deploying')?

I'm in a bit of a chicken-and-egg situation: I don't have access to check anything from the CLI, and the Azure portal isn't showing me any deployments after I wiped the resource group. The ones I did see were stuck for over 24 hours, so effectively never-ending.

If there are stuck deployments or stacks, try cancelling first where applicable.

If no existing deployments share a name with the deployments inside the stack, then it sounds like deployments within the stack template itself could be clashing with each other by name.

Try checking all Bicep module names and Bicep resources that are deployments in the template for possible overlaps.

For example, one way that this could happen is:

main.bicep

module foo 'foo.bicep' = [for i in range(0, 10): {
  name: 'uniqueName${i}'
}]

foo.bicep

module inner 'naming.bicep' = {
  name: 'isThisUnique' // <----- because of the loop in the parent template, this is a problem
}

One way to fix the above example is to pass the loop index i into the "foo" module as a parameter and include it in the inner module's name.
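As a minimal sketch of that fix, the two files from the example could look like the following. The parameter name index and the 'inner-' name prefix are hypothetical, chosen only for illustration, and the example assumes naming.bicep needs no parameters of its own.

```bicep
// main.bicep
// Pass the loop index into each instance of the "foo" module.
module foo 'foo.bicep' = [for i in range(0, 10): {
  name: 'uniqueName${i}'
  params: {
    index: i // hypothetical parameter declared in foo.bicep
  }
}]
```

```bicep
// foo.bicep
// Hypothetical parameter carrying the parent's loop index.
param index int

module inner 'naming.bicep' = {
  // Each loop iteration now produces a distinct nested deployment name.
  name: 'inner-${index}'
}
```

With this change, the ten instances of foo.bicep create nested deployments named inner-0 through inner-9 instead of ten deployments that all share one name.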

If there are several layers of modules, cross check all of them and make sure they have unique names.

Another approach could be to use uniqueString in the names. It is important that the seed passed into the function is the same across deployments so module names/resource names stay the same across future stack deployments.
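A sketch of the uniqueString approach, assuming the stack is deployed at resource group scope and that foo.bicep receives a hypothetical index parameter from its parent. The seed here combines the resource group ID with the index, both of which stay the same from one stack deployment to the next.

```bicep
// foo.bicep
// Hypothetical parameter carrying the parent's loop index.
param index int

// Deterministic seed: the same inputs always produce the same hash,
// so the module name does not change between stack deployments.
var suffix = uniqueString(resourceGroup().id, string(index))

module inner 'naming.bicep' = {
  name: 'inner-${suffix}'
}
```

Seeding with a value that changes on every run, such as a timestamp or a new GUID, would give the nested deployment a different name each time, which is what the advice above is meant to avoid.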

@pjelar We have made a change to handle the bug that Kyle mentioned; that should roll out over the next week or so.

@pjelar The change has been rolled out to all regions, if you want to try again. Let us know if you have any questions.