openkruise / kruise

Automated management of large-scale applications on Kubernetes (incubating project under CNCF)

Home Page: https://openkruise.io

[BUG] pub webhooks unexpectedly return error when PUB is NOT FOUND

Spground opened this issue

What happened:

The PUB (PodUnavailableBudget) webhook may unexpectedly interrupt Pod garbage collection (GC) issued by kube-controller-manager (KCM). This can lead to a Pod leak if the KCM GC does not retry, or retries only many hours later.

What you expected to happen:

PUB webhooks should never interrupt Pod GC.

How to reproduce it (as minimally and precisely as possible):

Delete a workload, for example a StatefulSet or CloneSet; its Pods will then be deleted by the KCM GC later. Sometimes a Pod that should be deleted is left leaking for a long time.

Anything else we need to know?:

The root cause is that we return an error inside RetryOnConflict when the PUB CR has already been deleted. The related code is here,

The solution is simple: check the error type, and ignore the error if it is a NotFound error.
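The idea can be sketched roughly as follows. This is not the Kruise code: `pub`, `errNotFound`, `errConflict`, `retryOnConflict`, and `recordPodDeletion` are hypothetical stand-ins that use only the standard library, so the example runs on its own. In the real webhook the check would presumably use `errors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors` around `retry.RetryOnConflict` from `k8s.io/client-go/util/retry`.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the Kubernetes API
// NotFound and Conflict status errors.
var (
	errNotFound = errors.New("pub not found")
	errConflict = errors.New("update conflict")
)

// pub is a hypothetical, heavily reduced PodUnavailableBudget.
type pub struct{ unavailable int }

// retryOnConflict is a simplified stand-in for client-go's
// retry.RetryOnConflict: it reruns fn while fn returns a conflict error,
// up to the given number of attempts.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

// recordPodDeletion models the webhook step that updates the PUB status
// when a Pod is deleted. get fetches the PUB and update writes it back.
func recordPodDeletion(get func() (*pub, error), update func(*pub) error) error {
	err := retryOnConflict(5, func() error {
		p, err := get()
		if err != nil {
			return err
		}
		p.unavailable++
		return update(p)
	})
	if errors.Is(err, errNotFound) {
		// The PUB has been deleted, so there is no budget left to
		// enforce. Do not block the Pod deletion.
		return nil
	}
	return err
}

func main() {
	deleted := func() (*pub, error) { return nil, errNotFound }
	noop := func(*pub) error { return nil }
	fmt.Println("result with deleted PUB:", recordPodDeletion(deleted, noop))
}
```

With this shape, a PUB that disappears between the admission request and the status update no longer rejects the Pod deletion, while conflicts and other errors are still retried or returned as before.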

Environment:

  • Kruise version:
  • Kubernetes version (use kubectl version):
  • Install details (e.g. helm install args):
  • Others:
commented

@Spground You are right, can you fix the bug?