[TASK] Investigate to see how we can improve UX in case the upgrade check failed

Question

PhanLe1010 opened this issue 2 months ago

What's the task? Please describe

We have upgrade check logic in the longhorn-manager upgrade path that prevents longhorn-manager from coming up if the version check fails or there is an incompatible engine in the cluster.
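To make the discussion concrete, here is a minimal sketch of what such a check might look like. It is not the longhorn-manager implementation: the function names, the "no downgrade, at most one minor version step" rule, and the engine API version numbers are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'v1.6.2' or '1.6.2-rc1' into (major, minor, patch)."""
    core = version.lstrip("v").split("-", 1)[0]  # drop leading 'v' and any pre-release suffix
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def check_upgrade(
    current: str,
    target: str,
    engine_api_versions: Dict[str, int],
    min_engine_api_version: int = 0,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the check passed.

    Hypothetical rules (assumptions, not Longhorn's documented policy):
      * downgrades are rejected
      * the major version must match and the minor version may advance by at most one
      * every engine in the cluster must report an API version >= min_engine_api_version
    """
    problems: List[str] = []
    cur, tgt = parse_version(current), parse_version(target)

    if tgt < cur:
        problems.append(f"downgrade from {current} to {target} is not supported")
    elif tgt[0] != cur[0] or tgt[1] - cur[1] > 1:
        problems.append(f"upgrade from {current} to {target} skips a supported step")

    # Report every incompatible engine, sorted by name for stable output.
    for name, api_version in sorted(engine_api_versions.items()):
        if api_version < min_engine_api_version:
            problems.append(
                f"engine {name} has API version {api_version}, "
                f"minimum supported is {min_engine_api_version}"
            )
    return problems
```

Returning the full list of problems, rather than failing on the first one, lets the caller show the user everything that must be fixed before the upgrade is attempted again.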

The issue is that when a user installs Longhorn using the longhorn.yaml manifest (via kubectl apply or a GitOps solution), the new Longhorn resources have already been applied by the time the check fails. This forces the user to roll back. However, rolling back in this case is very rough because the CRDs have already been updated. It took me a while to get out of this situation, including hacks like removing the webhook configuration in order to be able to edit the CRDs while the longhorn-manager (webhook) pods were not running.

One idea from @c3y1huang is to integrate the pre-check into the Longhorn CLI, so users can run it separately before the actual upgrade: #8343

However, we cannot guarantee that every user runs the Longhorn CLI check before upgrading (and it may be more difficult with GitOps tools?). Therefore, I am opening this ticket to see if there are additional ideas.

cc @ejweber @james-munson @shuo-wu @derekbit @innobead