gardener / gardener-extension-provider-azure

Gardener extension controller for the Azure cloud provider (https://azure.microsoft.com).

Home Page: https://gardener.cloud

Deprecate AvailabilitySet based clusters

dkistner opened this issue

How to categorize this issue?
/area robustness
/kind technical-debt
/priority 2
/platform azure

What would you like to be added:

We currently use AvailabilitySets to ensure that machines are distributed across compute units in non-zonal deployments (primarily in regions that do not consist of multiple zones).

For legacy reasons we use a single AvailabilitySet for all machines in the cluster (even with multiple worker pools). This approach comes with several drawbacks (basic load balancer, no different hardware SKUs, etc.).

Therefore, some time ago we started to also support Azure clusters based on VirtualMachineScaleSets with flexible orchestration (VMSS flex/VMO). So far this was only usable as an alpha feature (activated via the annotation alpha.azure.provider.extensions.gardener.cloud/vmo=true on the Shoot resource), as the feature was not generally available on Azure.
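
As a rough illustration of the opt-in described above, here is a minimal Go sketch that checks a Shoot's annotations for the alpha VMO annotation. The helper name `usesVMO` and the strict comparison against the string "true" are assumptions made for this example; this is not the extension's actual code.

```go
package main

// vmoAnnotation is the alpha annotation key quoted in this issue.
const vmoAnnotation = "alpha.azure.provider.extensions.gardener.cloud/vmo"

// usesVMO reports whether the given Shoot annotations opt in to
// VMSS flex (VMO). Hypothetical helper: only the exact value "true"
// counts as an opt-in; a missing key or any other value does not.
func usesVMO(annotations map[string]string) bool {
	// Reading from a nil map is safe in Go and yields the empty string.
	return annotations[vmoAnnotation] == "true"
}
```

A caller would pass the Shoot's metadata annotations, e.g. `usesVMO(shoot.Annotations)`, where `shoot` stands for whatever Shoot object is at hand.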

As the VirtualMachineScaleSets with flexible orchestration feature has now turned GA on Azure, we should start making it the default for non-zonal deployments. Ref

This is an umbrella issue to track what needs to be done to deprecate (probably first forbid creating new ones?) AvailabilitySet based clusters and to establish VMSS flex as the new default deployment model for non-zonal clusters. The goal should be to get rid of the AvailabilitySet deployment model entirely at some point.

Why is this needed:
More robust/flexible machine distribution support in non-zonal regions.

cc @kon-angelo, @MSSedusch, @HappyTobi

Limitations are documented here:

https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes#a-comparison-of-flexible-uniform-and-availability-sets

for example:

Supported: D series, E series, F series, A series, B series (Intel, AMD). Specialty SKUs (G, H, L, M, N) are not supported.

what needs to be done to deprecate (probably first forbid creating new ones?) AvailabilitySet based clusters

For deprecating AvailabilitySet (AVS) based clusters, it would be nice to make use of the warning headers in the validation webhook for Azure shoots (see https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#response).
I.e., return a warning to users who create AVS based clusters.
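
To make the idea concrete, here is a minimal Go sketch of an admission review response that still allows the request but carries a warning. The struct types are simplified local mirrors of a subset of the `admission.k8s.io/v1` fields (`uid`, `allowed`, `warnings`), and `reviewShoot`, `marshalReview`, the `availabilitySetBased` flag and the warning text are all hypothetical names chosen for this example, not the extension's actual webhook code.

```go
package main

import "encoding/json"

// admissionResponse mirrors a subset of the admission.k8s.io/v1
// AdmissionResponse fields (simplified for this sketch).
type admissionResponse struct {
	UID      string   `json:"uid"`
	Allowed  bool     `json:"allowed"`
	Warnings []string `json:"warnings,omitempty"`
}

// admissionReview is the envelope a webhook sends back to the API server.
type admissionReview struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Response   admissionResponse `json:"response"`
}

// avsDeprecationWarning is a hypothetical message text.
const avsDeprecationWarning = "AvailabilitySet based clusters are deprecated; consider VMSS flex (VMO)"

// reviewShoot allows the request in every case and attaches a warning
// when the cluster is AvailabilitySet based. How that flag is derived
// from the Shoot is left out of this sketch.
func reviewShoot(uid string, availabilitySetBased bool) admissionReview {
	resp := admissionResponse{UID: uid, Allowed: true}
	if availabilitySetBased {
		resp.Warnings = append(resp.Warnings, avsDeprecationWarning)
	}
	return admissionReview{
		APIVersion: "admission.k8s.io/v1",
		Kind:       "AdmissionReview",
		Response:   resp,
	}
}

// marshalReview encodes the review as the JSON body of the webhook reply.
func marshalReview(r admissionReview) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}
```

Because the request is still allowed, existing workflows keep working while clients that surface warnings show users the deprecation notice.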

Status update:
At the moment we cannot remove availability set based deployments entirely, as the replacement, VMSS flex (VMO), currently lacks support for important machine type series (see here).
We need to hold off on the deprecation until this gap is closed on the Azure side.

We might want to consider making VMSS flex (VMO) based clusters the default deployment model for non-zonal clusters.

/status blocked