dragonflydb / dragonfly-operator

A Kubernetes operator to install and manage Dragonfly instances.

Home Page:https://www.dragonflydb.io/docs/managing-dragonfly/operator/installation

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

controller stuck when rollout fails

Jean-Daniel opened this issue · comments

When pushing a change on a DragonFly resource, a rollout, if an other update is pushed, the controller will wait until first rollout is done before applying any change.

This is an issue, as if the first change contains a typo (invalid image url for instance), there is no way to fix it, as the change with the right image url will never be applied.

To reproduce:

  • deploy an operator and a DragonFly instance with 2 replicas.
  • update the dragonfly instance and specify an image with an non existing tag.
  • push an other change with a good tag.

The controller wait for the first change to be fully applied but it never occurs as the pod are failing to start with ImagePullError.

This is a good find! The fix would be to give up on the roll out if the first deleted pod isn't coming back and then accept more updates! @Jean-Daniel Do you want to take it up?

I've just found this situation modifying dragonfly resource with not enough memory. Operator it's not able to rollback nor interrupt the current rollout with a new one applying valid settings.