[OpinionatedWatcher] Delete events are not received when Add fails or is still retrying
IfSentient opened this issue · comments
If an `Add` function call fails or is still in retry when a `Delete` event comes in from the informer, the `OpinionatedWatcher` doesn't forward that delete to the underlying watcher function. This is not desired behavior, but has no simple fix due to the following:
- The `OpinionatedWatcher` naively assumes that operations are atomic: if an `Add` fails, there's nothing to clean up.
- It makes this assumption because it makes its job easier: otherwise, we have to contend with double-deletes.
The sequence of events that leads to a typical delete is as follows:
- A user creates a resource.
- The watcher receives an `Add` event and calls the underlying watcher's `Add` function. On success, it adds a finalizer to the resource.
- A user deletes the resource.
- The watcher receives an `Update` event where `deletionTimestamp` is non-nil. This indicates that the resource has been deleted, but will stick around (similar to tombstoning) until all elements are removed from the `finalizers` list.
- The watcher calls the underlying watcher's `Delete` function. On success, it removes itself from the `finalizers` list.
- When all finalizers are removed, the watcher receives a `Delete` event for the object. It discards this without doing anything, as the delete should have already been handled.
However, as we can see, if the underlying `Add` call fails and is in retry (or fails altogether with no retries), then the finalizer is never added, and the path the `OpinionatedWatcher` expects deletes to come in through will not be taken for that resource.
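To make the gap concrete, here is a minimal, self-contained simulation of the sequence described above. It is only a sketch: the `resource` type, the `simulate` function, and the finalizer name are hypothetical stand-ins, not the actual `OpinionatedWatcher` code or the Kubernetes API.

```go
package main

import "fmt"

// finalizerName is a hypothetical finalizer identifier used only in this sketch.
const finalizerName = "example-operator/finalizer"

// resource is a stand-in for the metadata of a Kubernetes object.
type resource struct {
	finalizers []string
	deleting   bool // models a non-nil deletionTimestamp
}

// hasFinalizer reports whether the sketch's finalizer is on the resource.
func (r *resource) hasFinalizer() bool {
	for _, f := range r.finalizers {
		if f == finalizerName {
			return true
		}
	}
	return false
}

// simulate replays "create, then delete" for a single resource and returns
// the calls that reach the underlying watcher. addSucceeds controls whether
// the underlying Add call succeeds.
func simulate(addSucceeds bool) []string {
	var calls []string
	r := &resource{}

	// Add event: call the underlying Add; attach the finalizer only on success.
	calls = append(calls, "Add")
	if addSucceeds {
		r.finalizers = append(r.finalizers, finalizerName)
	}

	// A user deletes the resource.
	r.deleting = true
	if r.hasFinalizer() {
		// The object lingers, so the watcher sees an Update with
		// deletionTimestamp set and forwards the delete.
		calls = append(calls, "Delete")
		r.finalizers = nil // on success, remove our finalizer
	}

	// The final Delete event is discarded in both cases, so without the
	// finalizer the underlying Delete is never called.
	return calls
}

func main() {
	fmt.Println(simulate(true))  // [Add Delete]
	fmt.Println(simulate(false)) // [Add]
}
```

In the failing case the only call that reaches the underlying watcher is the `Add` attempt; the delete is dropped because the watcher discards the final `Delete` event unconditionally.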
Why can't we just add the finalizer before we call the underlying add function?
If we do that, then if the call fails and is in retry when the operator restarts, the restarted operator will no longer see the resource as an add when it syncs via the list call, so the add will never be retried.
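The restart problem can be sketched as follows. This assumes that, during the initial list sync, the presence of the operator's finalizer is what marks a resource as already added; the `listedObject` type, `needsAdd` function, and finalizer name are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// finalizerName is a hypothetical finalizer identifier used only in this sketch.
const finalizerName = "example-operator/finalizer"

// listedObject is what the restarted operator sees in its initial list call.
type listedObject struct {
	name       string
	finalizers []string
}

// needsAdd applies the assumed sync rule: an object without our finalizer
// is treated as new, so the underlying Add is (re)tried.
func needsAdd(o listedObject) bool {
	for _, f := range o.finalizers {
		if f == finalizerName {
			return false
		}
	}
	return true
}

func main() {
	// Finalizer added only after a successful Add: a failed Add leaves no
	// finalizer, so the restarted operator retries it.
	finalizerAfterAdd := listedObject{name: "pending"}

	// Finalizer added before the Add call: the object looks already handled
	// even though the Add never succeeded.
	finalizerBeforeAdd := listedObject{name: "pending", finalizers: []string{finalizerName}}

	fmt.Println(needsAdd(finalizerAfterAdd))  // true
	fmt.Println(needsAdd(finalizerBeforeAdd)) // false
}
```

Under that assumption, adding the finalizer first trades the lost delete for a lost add after a restart.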
Why can't we wait to propagate the delete until after the finalizer is removed (on the actual delete event)?
Two reasons: (1) Kubernetes expects that the operator has finished all necessary cleanup by the time it removes the finalizer, and (2) if we do that and the subsequent propagated delete call fails and is in retry when the operator restarts, the restarted operator will have no record of the delete action, and will not try it again. The `OpinionatedWatcher` is designed specifically to avoid this kind of restart/downtime-related behavior.
Nevertheless, the missing delete on failed/retrying add is still undesired behavior and should be considered a bug, just without an obvious fix at the moment.