rancher / opni

Multi Cluster Observability with AIOps

Home Page: https://opni.io

Logging Backend uninstall should tolerate failed installs

ron1 opened this issue · comments

The Logging Backend v0.11.1 uninstall should tolerate various types of failed installs. I was unable to uninstall a failed install from the web UI. Instead, I had to use kubectl to delete the opniopensearch custom resource and then delete the quorum PVCs in order to return the Logging Backend to its initial, uninstalled state.
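
For illustration, here is a minimal Go sketch of that first cleanup step done through a controller-runtime client rather than kubectl. The API group, version, and object name below are placeholders, not the actual opni values.

```go
package loggingcleanup

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteOpniOpensearch removes the OpniOpensearch custom resource so the
// operator can tear down the OpenSearch cluster it manages.
func deleteOpniOpensearch(ctx context.Context, c client.Client, namespace string) error {
	obj := &unstructured.Unstructured{}
	obj.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "logging.opni.io", // assumed API group
		Version: "v1beta1",         // assumed version
		Kind:    "OpniOpensearch",
	})
	obj.SetNamespace(namespace)
	obj.SetName("opni") // assumed resource name
	if err := c.Delete(ctx, obj); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```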

This will be partially resolved by #1701

The deletion of PVCs is still an open question. By design, PVCs are not deleted when the StatefulSet or its pods are deleted, so that data persists. However, when an install has failed this may not be desirable. I think the best way forward may be to do a best-effort delete of the PVCs when the delete is initiated from the API, as sketched below.
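
A minimal sketch of that best-effort behavior, assuming the quorum PVCs can be matched by a label selector (the label key and value below are placeholders, not necessarily what the operator actually sets):

```go
package loggingcleanup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// purgePVCs deletes any PVCs matching the (assumed) cluster label.
// Individual deletion failures are deliberately ignored so that a partially
// failed install cannot block the uninstall itself.
func purgePVCs(ctx context.Context, c client.Client, namespace string) error {
	pvcs := &corev1.PersistentVolumeClaimList{}
	if err := c.List(ctx, pvcs,
		client.InNamespace(namespace),
		client.MatchingLabels{"opster.io/opensearch-cluster": "opni"}, // assumed label
	); err != nil {
		return err
	}
	for i := range pvcs.Items {
		// Best effort: skip over errors and keep deleting the rest.
		_ = c.Delete(ctx, &pvcs.Items[i])
	}
	return nil
}
```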

@alexandreLamarre @kralicky thoughts?

As a middle ground, I think we could add a purge-data / force-uninstall flag to the uninstall API?
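
Roughly what that could look like, continuing the same hypothetical package as the sketches above; the request and backend types here are stand-ins, not the actual opni API types:

```go
// (same hypothetical package and imports as the sketches above)

// UninstallRequest is a hypothetical request shape; the real opni uninstall
// API may look different.
type UninstallRequest struct {
	// DeleteData opts in to removing the quorum PVCs. Defaulting to false
	// preserves the current behavior of keeping data around.
	DeleteData bool `json:"deleteData,omitempty"`
}

// LoggingBackend is a stand-in for whichever component handles uninstalls.
type LoggingBackend struct {
	k8sClient client.Client
	namespace string
}

func (b *LoggingBackend) Uninstall(ctx context.Context, req *UninstallRequest) error {
	// Tear down the OpniOpensearch resource first (see the earlier sketch).
	if err := deleteOpniOpensearch(ctx, b.k8sClient, b.namespace); err != nil {
		return err
	}
	if req.DeleteData {
		// Purge the PVCs only when the caller explicitly asked for it.
		return purgePVCs(ctx, b.k8sClient, b.namespace)
	}
	return nil
}
```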

I still tend towards keeping the PVCs as the correct default behavior, but maybe it makes sense to change the default to deleting them once backup/restore is introduced.

IMO we should probably never delete PVCs for the user. Could we have it detect whether PVCs exist or not before installing, and then reuse the existing data?
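
The detection part is straightforward to sketch (same hypothetical package and assumed label as above); the harder question, as the following comments point out, is whether that existing data can actually be reused:

```go
// (same package, imports, and assumed label as above)

// hasExistingData reports whether PVCs from a previous install are still
// present, so an installer could decide to reuse them instead of
// bootstrapping a fresh cluster.
func hasExistingData(ctx context.Context, c client.Client, namespace string) (bool, error) {
	pvcs := &corev1.PersistentVolumeClaimList{}
	if err := c.List(ctx, pvcs,
		client.InNamespace(namespace),
		client.MatchingLabels{"opster.io/opensearch-cluster": "opni"}, // assumed label
	); err != nil {
		return false, err
	}
	return len(pvcs.Items) > 0, nil
}
```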

Just to clarify, isn't the issue here that the opni-quorum / security setup pushes stateful information about itself to opni-data, which cannot be reused after a failed install?

Correct, the control plane nodes contain data about the cluster IDs and cluster state. When a cluster is first installed it bootstraps that information, so the existing data won't match the newly bootstrapped cluster, which causes the issues.