Skip checking resources when `--wait=false` is specified
firgavin opened this issue · comments
What steps did you take:
I currently use Kapp as a CI tool to manage lots of YAML files. I used --wait=false
when I deleted the app because sometimes deleting custom resources will take a long time.
What happened:
kapp exits with non-zero code which makes CI fail.
$ kapp delete -a app1 --wait=false -y
Target cluster 'https://127.0.0.1:6443' (nodes: firgavin)
Changes
Namespace  Name        Kind        Age  Op      Op st.  Wait to  Rs  Ri
default    simple-app  Deployment  22s  delete  -       -        ok  -
^          simple-app  Service     22s  delete  -       -        ok  -
Op: 0 create, 2 delete, 0 update, 0 noop, 0 exists
Wait to: 0 reconcile, 0 delete, 2 noop
11:18:59AM: ---- applying 2 changes [0/2 done] ----
11:18:59AM: delete deployment/simple-app (apps/v1) namespace: default
11:18:59AM: delete service/simple-app (v1) namespace: default
11:18:59AM: ---- waiting on 2 changes [0/2 done] ----
11:18:59AM: ok: noop service/simple-app (v1) namespace: default
11:18:59AM: ok: noop deployment/simple-app (apps/v1) namespace: default
11:18:59AM: ---- applying complete [2/2 done] ----
11:18:59AM: ---- waiting complete [2/2 done] ----
kapp: Error: Expected all resources to be gone, but found: endpointslice/simple-app-vp2dw (discovery.k8s.io/v1) namespace: default, pod/simple-app-64c66864f5-g9sb8 (v1) namespace: default, replicaset/simple-app-64c66864f5 (apps/v1) namespace: default
What did you expect:
kapp could skip checking resources when --wait=false is specified.
Anything else you would like to add:
I did some research and I found that kapp checks the existence of related resources after applying changes. But resources will be deleted eventually. See https://github.com/vmware-tanzu/carvel-kapp/blob/v0.52.0/pkg/kapp/cmd/app/delete.go#L159.
It would be great if kapp could default to skipping the resource check when --wait=false is specified, or add a flag to control this logic. And if that makes sense, I'd like to help implement this ;)
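To make the behavior described above concrete, here is a minimal sketch of a post-delete check with an opt-out. It is not kapp's actual code: the `Resource` type, the `checkAllGone` function, and the `skip` parameter are hypothetical names chosen for illustration, and only the error text is taken from the output in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// Resource is a hypothetical stand-in for a resource that still exists
// in the cluster after the delete changes were applied.
type Resource struct {
	Description string
}

// checkAllGone sketches the post-delete check: it returns an error that
// lists every resource still present. When skip is true (the behavior
// requested in this issue), the check is bypassed entirely.
func checkAllGone(remaining []Resource, skip bool) error {
	if skip || len(remaining) == 0 {
		return nil
	}
	descs := make([]string, 0, len(remaining))
	for _, r := range remaining {
		descs = append(descs, r.Description)
	}
	// Message text mirrors the error shown in the report above.
	return fmt.Errorf("Expected all resources to be gone, but found: %s",
		strings.Join(descs, ", "))
}

func main() {
	remaining := []Resource{
		{Description: "pod/simple-app-64c66864f5-g9sb8 (v1) namespace: default"},
		{Description: "replicaset/simple-app-64c66864f5 (apps/v1) namespace: default"},
	}
	fmt.Println(checkAllGone(remaining, false)) // error listing both resources
	fmt.Println(checkAllGone(remaining, true))  // check skipped, no error
}
```

With the check skipped, child resources such as the pod and replicaset are left to be cleaned up by the cluster after kapp exits.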
Environment:
- kapp version (use `kapp --version`): v0.52.0
- OS (e.g. from `/etc/os-release`): Ubuntu 20.04.4 LTS
- Kubernetes version (use `kubectl version`): v1.23.6+k3s1
Yeah, it seems like setting the wait flag to false would currently lead to an error while deleting recorded apps. So definitely it's a bug.
It would be great if kapp could default to skipping checking resources when --wait=false is specified or add a flag to control this logic.
It does make sense to allow that behaviour; I am just trying to think of any side effects it could have. One obvious thing that could happen is that one or more resources are not deleted but the app itself (metadata configmap) is deleted.
@cppforlife Any thoughts?
And if that makes sense, I'd like to help implement this ;)
That would be great, we will definitely review it on priority once we finalize the approach :)
Hey @firgavin, good to see you here. Looking forward to your PR for this issue.
One obvious thing that could happen is that one or more resources are not deleted but the app itself
This would be a "known risk" I guess?
We might also lose out on some "retryable cases", where kapp would retry in case of a failed delete due to a retryable error.
I did some research and I found that kapp checks the existence of related resources after applying changes. But resources will be deleted eventually.
I think an additional flag would be reasonable to disable this check. Maybe under dangerous?
I think an additional flag would be reasonable to disable this check. Maybe under dangerous?
This approach makes sense to me
Hi @cppforlife, @100mik, @praveenrewar - Thanks for your insights! Here's my proposal:
We can add a flag --dangerous-disable-checking-app-deletion to enable or disable the check:
- The value is set to false by default, which is compatible with the current behavior.
- Once the flag is specified, kapp skips this check, and users might need to manually delete related resources.
Before I work on it, I'd like to discuss the interaction between the two flags. When --dangerous-disable-checking-app-deletion=false, should we make sure that the value of --wait is overwritten to True? If not, users can still hit the same issue. Of course, we can explain the usage in the docs if we think they should be "orthogonal". Any suggestions?
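The proposal can be sketched as two plain boolean flags. This is an illustration only: it uses Go's standard `flag` package rather than whatever kapp uses internally, the `DeleteOptions` struct and `parseDeleteOptions` function are hypothetical, and the default of `true` for --wait is an assumption. The sketch leaves --wait untouched when the new flag is false, which is one of the two options raised above.

```go
package main

import (
	"flag"
	"fmt"
)

// DeleteOptions holds the two flags under discussion.
// The struct and field names are hypothetical.
type DeleteOptions struct {
	Wait                 bool
	DisableDeletionCheck bool
}

// parseDeleteOptions parses the two flags independently of each other.
func parseDeleteOptions(args []string) (DeleteOptions, error) {
	var opts DeleteOptions
	fs := flag.NewFlagSet("delete", flag.ContinueOnError)
	// Assumption: --wait defaults to true.
	fs.BoolVar(&opts.Wait, "wait", true, "wait for resources to be deleted")
	// Proposed flag: defaults to false, so current behavior is unchanged.
	fs.BoolVar(&opts.DisableDeletionCheck, "dangerous-disable-checking-app-deletion",
		false, "skip checking that all app resources are gone")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	opts, err := parseDeleteOptions([]string{
		"--wait=false",
		"--dangerous-disable-checking-app-deletion",
	})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("wait=%v skipCheck=%v\n", opts.Wait, opts.DisableDeletionCheck)
}
```

Because neither flag rewrites the other, passing --wait=false alone would still run the check, which is the case the question above is about.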
When --dangerous-disable-checking-app-deletion=false, should we make sure that the value of --wait is overwritten to True?
I think that we should keep the working of these 2 flags independent of each other because a user should be able to use --dangerous-disable-checking-app-deletion irrespective of --wait being enabled or disabled.
If not, users can still hit the same issue. Of course, we can explain the usage in the docs if we think they should be "orthogonal". Any suggestions?
Maybe we can add a hint in the error message?
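One way the hint could look, as a rough sketch: the error keeps its current text and gains a suffix only when waiting was disabled. The function name `remainingResourcesError`, the `hintSuffix` constant, and the hint wording are all hypothetical; only the base message and the flag names come from this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// baseMessage mirrors the error text from the original report.
const baseMessage = "Expected all resources to be gone, but found: "

// hintSuffix is hypothetical wording for the suggested hint.
const hintSuffix = " (hint: with --wait=false resources may still be terminating;" +
	" use --dangerous-disable-checking-app-deletion to skip this check)"

// remainingResourcesError builds the error and appends the hint only when
// waiting was disabled, since that is when users are likely to hit it.
func remainingResourcesError(found string, waitEnabled bool) error {
	msg := baseMessage + found
	if !waitEnabled {
		msg += hintSuffix
	}
	return errors.New(msg)
}

func main() {
	found := "replicaset/simple-app-64c66864f5 (apps/v1) namespace: default"
	fmt.Println(remainingResourcesError(found, false)) // message with hint
	fmt.Println(remainingResourcesError(found, true))  // message without hint
}
```

This keeps the two flags independent while still pointing users at the new flag when they run into the error.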