Add support for force deleting VM when its provisioning state is FAILED
unmarshall opened this issue · comments
What would you like to be added:
In Azure, if a Virtual Machine has its ProvisioningState set to Failed, it can neither be updated nor deleted, so the VM is stuck in this state. Any attempt to update the associated resources (NIC, OSDisk and DataDisk) to enable cascade delete fails, because VM updates are not allowed in this state. Azure returns the following:
E1121 11:07:51.116477 26301 machine_util.go:1242] Error while deleting machine --REDACTED--: machine codes error: code = [Internal] message = [Failed to update cascade delete of associated resources for VM: [ResourceGroup: --REDACTED--, Name: --REDACTED--], Err: PATCH https://management.azure.com/subscriptions/--REDACTED--/resourceGroups/--REDACTED--/providers/Microsoft.Compute/virtualMachines/--REDACTED--
--------------------------------------------------------------------------------
RESPONSE 409: 409 Conflict
ERROR CODE: OperationNotAllowed
--------------------------------------------------------------------------------
{
"error": {
"code": "OperationNotAllowed",
"message": "Operation 'Update VM' is not allowed on VM '--REDACTED--' since the VM is marked for deletion. You can only retry the Delete operation (or wait for an ongoing one to complete)."
}
}
--------------------------------------------------------------------------------
]
In these situations, the VM should be deleted, followed by explicit deletion of all associated resources (NIC, OSDisk and DataDisk(s)).
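The proposed flow can be sketched roughly as follows. This is only an illustration under stated assumptions, not the provider's actual implementation: the `VMClient` interface, the `VMResources` type, `DeleteMachine` and the in-memory `fakeClient` are all hypothetical names invented for this example and are not part of the Azure SDK. The idea is to skip the cascade-delete update when the VM is in the Failed state, force delete the VM, and then delete the NIC and disks explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// VMClient is a hypothetical abstraction over the cloud calls needed here.
// It is NOT the Azure SDK API; a real implementation would wrap the SDK clients.
type VMClient interface {
	ProvisioningState(vm string) (string, error)
	SetCascadeDelete(vm string) error
	DeleteVM(vm string, force bool) error
	DeleteNIC(name string) error
	DeleteDisk(name string) error
}

// VMResources lists the resources associated with a VM (hypothetical type).
type VMResources struct {
	NIC   string
	Disks []string // OS disk and data disks
}

// DeleteMachine deletes a VM and its associated resources.
// A VM in the Failed state cannot be updated, so the cascade-delete update is
// skipped: the VM is force deleted and the NIC and disks are removed explicitly.
func DeleteMachine(c VMClient, vm string, res VMResources) error {
	state, err := c.ProvisioningState(vm)
	if err != nil {
		return fmt.Errorf("get provisioning state of %s: %w", vm, err)
	}
	if state != "Failed" {
		// Regular path: let the VM deletion cascade to the NIC and disks.
		if err := c.SetCascadeDelete(vm); err != nil {
			return fmt.Errorf("set cascade delete on %s: %w", vm, err)
		}
		return c.DeleteVM(vm, false)
	}
	if err := c.DeleteVM(vm, true); err != nil {
		return fmt.Errorf("force delete %s: %w", vm, err)
	}
	// Attempt every cleanup and report all failures together.
	var errs []error
	if err := c.DeleteNIC(res.NIC); err != nil {
		errs = append(errs, err)
	}
	for _, d := range res.Disks {
		if err := c.DeleteDisk(d); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// fakeClient is an in-memory stand-in that records the calls it receives.
type fakeClient struct {
	state string
	calls []string
}

func (f *fakeClient) ProvisioningState(string) (string, error) { return f.state, nil }

func (f *fakeClient) SetCascadeDelete(vm string) error {
	f.calls = append(f.calls, "cascade:"+vm)
	if f.state == "Failed" {
		return errors.New("409 Conflict: OperationNotAllowed")
	}
	return nil
}

func (f *fakeClient) DeleteVM(vm string, force bool) error {
	if force {
		f.calls = append(f.calls, "force-delete-vm:"+vm)
	} else {
		f.calls = append(f.calls, "delete-vm:"+vm)
	}
	return nil
}

func (f *fakeClient) DeleteNIC(name string) error {
	f.calls = append(f.calls, "delete-nic:"+name)
	return nil
}

func (f *fakeClient) DeleteDisk(name string) error {
	f.calls = append(f.calls, "delete-disk:"+name)
	return nil
}

func main() {
	c := &fakeClient{state: "Failed"}
	res := VMResources{NIC: "nic-1", Disks: []string{"os-disk", "data-disk-0"}}
	err := DeleteMachine(c, "vm-1", res)
	fmt.Println(c.calls, err)
}
```

The cleanup calls are collected with `errors.Join` (Go 1.20+) so that one failed NIC or disk deletion does not prevent the remaining resources from being attempted.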
Why is this needed:
This ensures that the VM and its associated resources are cleaned up properly.
We have seen multiple issues in Canary [Issue #4358, #4389, #4390, #4377] where VMs were stuck with ProvisioningState = Failed for days and nothing could be done to clean them up automatically. Operators had to issue the delete for these VMs manually. With this issue we attempt to clean up all resources automatically.
/close as fixed
A patch PR has been raised as well: #120