weaviate / weaviate-helm

Helm charts to deploy Weaviate to k8s

Home Page: https://weaviate.io/developers/weaviate/current/

High availability during updates, configuration updates etc. are currently not given

withstu opened this issue

As a user, I want the database to keep responding to requests during updates. This requires, for example, being able to define the update strategy of the deployment. Currently there is no option to define a rolling update strategy with minimum-availability settings. It is very important for me to ensure database availability during updates and configuration changes. At the moment, updates or configuration changes lead to short downtimes or write limitations: one cluster replica becomes unready because it cannot find the replica that is currently being updated.

Details on my setup: I'm running Weaviate in version 16.3.1 on Kubernetes with two replicas and no sharding.

Hi @withstu, how did you achieve HA during the upgrade?

Well, currently you can't achieve HA during the update, because there is no update strategy defined in the Helm deployment: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy

Hi @withstu, note that StatefulSets have a different update strategy than Deployments. You can find the updateStrategy for a StatefulSet here. The default is RollingUpdate, so that is what is used unless it is overridden. This means pods are updated one at a time, starting from the pod with the highest ordinal. I do not see a case where we would want the OnDelete strategy, but I might make it configurable.
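To illustrate where this setting lives, here is a minimal sketch of a StatefulSet manifest with the strategy written out explicitly. It is not the chart's actual rendered template: the names, labels, headless service and image tag are assumptions chosen for the example, and only the updateStrategy block is the point.

```yaml
# Illustrative only: names, labels and image tag are assumptions,
# not values taken from the weaviate-helm chart.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: weaviate
spec:
  serviceName: weaviate-headless   # assumed headless service name
  replicas: 2
  selector:
    matchLabels:
      app: weaviate
  updateStrategy:
    type: RollingUpdate            # Kubernetes default for StatefulSets
    rollingUpdate:
      partition: 0                 # 0 means every pod is eligible for the update
  template:
    metadata:
      labels:
        app: weaviate
    spec:
      containers:
        - name: weaviate
          image: semitechnologies/weaviate:1.17.0   # assumed tag
```

With two replicas, a rolling update under this strategy would replace weaviate-1 first and only move on to weaviate-0 once weaviate-1 is Running and Ready again. Setting type to OnDelete would instead leave pods untouched until they are deleted manually.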

Let me know what you think of the upgrade strategy.

Regarding HA: as of Weaviate v1.17, data can be replicated across multiple pods. You can read about this here and here.

NOTE: Changing the replication factor on an already existing class is still an experimental feature.

Regarding pods crashing during an upgrade: you could try increasing the liveness/readiness probe limits and/or terminationGracePeriodSeconds, in case it takes too long for a pod to be upgraded. If your data is not replicated, Weaviate expects ALL pods to be ready. If one is down (in your case, because it is being upgraded), the others wait for it to come back, but there is a limit on how long they wait, and if it does not come up in time the other pods might crash. So increasing the probe limits should already help.
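As a rough sketch of what "increasing the probe limits" could look like on the pod template, the fragment below loosens both probes and extends the grace period. All numbers are assumptions to tune for your cluster, and the way the Helm chart exposes these fields in its values file may differ from the raw pod spec shown here.

```yaml
# Fragment of a pod template spec. All timing values are illustrative assumptions.
spec:
  terminationGracePeriodSeconds: 600   # give a pod more time to shut down cleanly
  containers:
    - name: weaviate
      livenessProbe:
        httpGet:
          path: /v1/.well-known/live
          port: 8080
        initialDelaySeconds: 120       # wait longer before the first check
        periodSeconds: 10
        failureThreshold: 30           # tolerate roughly 5 minutes of failed checks
      readinessProbe:
        httpGet:
          path: /v1/.well-known/ready
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
        failureThreshold: 30
```

A higher failureThreshold on the liveness probe is what keeps the kubelet from restarting the waiting pods while their peer is still being upgraded.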

I am closing this issue since this is not an upgrade strategy problem. Feel free to respond and ask questions here and/or re-open it.