rancher / opni

Multi Cluster Observability with AIOps

Home Page: https://opni.io

Logging Backend uninstall should tolerate failed installs

ron1 opened this issue · comments

The Logging Backend v0.11.1 uninstall should tolerate various types of failed installs. I was unable to uninstall a failed install from the web UI. Instead, I had to use kubectl to delete the opniopensearch custom resource and then delete the quorum PVCs in order to return the Logging Backend to its initial, uninstalled state.
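
For illustration, here is a minimal Go sketch of that first cleanup step done through a controller-runtime client rather than kubectl. The API group, version, and object name below are placeholders, not the actual opni values.

```go
package loggingcleanup

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteOpniOpensearch removes the OpniOpensearch custom resource so the
// operator can tear down the OpenSearch cluster it manages.
func deleteOpniOpensearch(ctx context.Context, c client.Client, namespace string) error {
	obj := &unstructured.Unstructured{}
	obj.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "logging.opni.io", // assumed API group
		Version: "v1beta1",         // assumed version
		Kind:    "OpniOpensearch",
	})
	obj.SetNamespace(namespace)
	obj.SetName("opni") // assumed resource name
	if err := c.Delete(ctx, obj); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```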

This will be partially resolved by #1701

The deletion of PVCs is still an open question. By design, PVCs are not deleted when the StatefulSet or its pods are deleted, so that data persists. However, when an install has failed this may not be desirable. I think the best way forward may be to do a best-effort delete of the PVCs when the delete is initiated from the API, as sketched below.
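
A minimal sketch of that best-effort behavior, assuming the quorum PVCs can be matched by a label selector (the label key and value below are placeholders, not necessarily what the operator actually sets):

```go
package loggingcleanup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// purgePVCs deletes any PVCs matching the (assumed) cluster label.
// Individual deletion failures are deliberately ignored so that a partially
// failed install cannot block the uninstall itself.
func purgePVCs(ctx context.Context, c client.Client, namespace string) error {
	pvcs := &corev1.PersistentVolumeClaimList{}
	if err := c.List(ctx, pvcs,
		client.InNamespace(namespace),
		client.MatchingLabels{"opster.io/opensearch-cluster": "opni"}, // assumed label
	); err != nil {
		return err
	}
	for i := range pvcs.Items {
		// Best effort: skip over errors and keep deleting the rest.
		_ = c.Delete(ctx, &pvcs.Items[i])
	}
	return nil
}
```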

@alexandreLamarre @kralicky thoughts?

As a middle ground, I think we could add a purge-data / force-uninstall flag to the uninstall API?
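
Roughly what that could look like, continuing the same hypothetical package as the sketches above; the request and backend types here are stand-ins, not the actual opni API types:

```go
// (same hypothetical package and imports as the sketches above)

// UninstallRequest is a hypothetical request shape; the real opni uninstall
// API may look different.
type UninstallRequest struct {
	// DeleteData opts in to removing the quorum PVCs. Defaulting to false
	// preserves the current behavior of keeping data around.
	DeleteData bool `json:"deleteData,omitempty"`
}

// LoggingBackend is a stand-in for whichever component handles uninstalls.
type LoggingBackend struct {
	k8sClient client.Client
	namespace string
}

func (b *LoggingBackend) Uninstall(ctx context.Context, req *UninstallRequest) error {
	// Tear down the OpniOpensearch resource first (see the earlier sketch).
	if err := deleteOpniOpensearch(ctx, b.k8sClient, b.namespace); err != nil {
		return err
	}
	if req.DeleteData {
		// Purge the PVCs only when the caller explicitly asked for it.
		return purgePVCs(ctx, b.k8sClient, b.namespace)
	}
	return nil
}
```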

I still tend towards keeping the PVCs as the correct default behavior, but maybe it makes sense to change the default to deleting them once backup/restore is introduced.

IMO we should probably never delete PVCs for the user. Could we have it detect whether PVCs exist or not before installing, and then reuse the existing data?
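
The detection part is straightforward to sketch (same hypothetical package and assumed label as above); the harder question, as the following comments point out, is whether that existing data can actually be reused:

```go
// (same package, imports, and assumed label as above)

// hasExistingData reports whether PVCs from a previous install are still
// present, so an installer could decide to reuse them instead of
// bootstrapping a fresh cluster.
func hasExistingData(ctx context.Context, c client.Client, namespace string) (bool, error) {
	pvcs := &corev1.PersistentVolumeClaimList{}
	if err := c.List(ctx, pvcs,
		client.InNamespace(namespace),
		client.MatchingLabels{"opster.io/opensearch-cluster": "opni"}, // assumed label
	); err != nil {
		return false, err
	}
	return len(pvcs.Items) > 0, nil
}
```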

Just to clarify, isn't the issue here that the opni-quorum / security setup pushes stateful information about itself to opni-data, which cannot be reused after a failed install?

Correct, the control plane nodes contain data about the cluster IDs and cluster state. When a cluster is first installed it bootstraps that information, so the existing data won't match the newly bootstrapped cluster, which causes the issues.