openebs / mayastor

Dynamically provision Stateful Persistent Replicated Cluster-wide Fabric Volumes & Filesystems for Kubernetes, provisioned from an optimized NVMe SPDK backend data storage stack.


Question about the non-disruptive upgrade

happycoderincloud opened this issue

Are you reporting an issue with existing content?

I'm new to Mayastor. I just read about upgrades in the official documentation, which says the upgrade is non-disruptive.

Some questions about these awesome features:

  1. What does "non-disruptive upgrade" mean? Does it mean that IO won't be interrupted during the upgrade?
  2. If an application workload and its volume are on the same node, will the workload and the volume be moved to another node during the upgrade? If so, shouldn't the IO be interrupted, making the upgrade disruptive?
  3. Do volumes need to be HA for a non-disruptive upgrade? And does HA here mean multiple replicas, or the feature described at https://mayastor.gitbook.io/introduction/advanced-operations/ha?

Thanks for the clarification in advance.


Hi @happycoderincloud,

  1. IO might be temporarily "stalled", but there will be no IO failures at any time.
  2. No, the application is not moved and it doesn't need to be restarted.
  3. Tbh, the name we've gone with for what is technically called failover isn't the best, as it could be interpreted in different ways. What we mean by HA on that page is basically on-demand switchover: when the application's NVMe initiator has connection issues with a target, we move the target to another node and let the initiator connect to the new target, without ever failing IO back to the application.
    For upgrades, volumes should ideally have more than one replica. Consider what happens when we want to restart Mayastor dataplane pod A while a volume's single replica lives in pod A: while the pod is restarting, that replica is unavailable, so the volume target may see IO errors, which may be propagated to the application.
    We worked around this in v2.4 (IIRC) by moving the volume target to the same node as the replica, so that when the pod is restarted the target is restarted along with it and the application, again, sees no IO errors (there's a toy sketch of this below).
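
To make the switchover idea in (3) concrete, here is a minimal toy model in Python. It is not Mayastor code: the names (`Node`, `Volume`, `switchover`) and the "stalled"/"completed" IO states are illustrative assumptions. It just shows IO stalling while the target's node is down, then completing once the target is republished on a healthy node, preferring a node that already holds a replica:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class Volume:
    target_node: Node             # node currently serving the NVMe-oF target
    replica_nodes: List[Node]     # nodes holding the data replicas
    pending_io: List[str] = field(default_factory=list)

    def submit_io(self, io: str) -> str:
        # While the target's node is down, the initiator holds/retries the IO
        # (it "stalls") instead of failing it back to the application.
        if not self.target_node.healthy:
            self.pending_io.append(io)
            return "stalled"
        return "completed"

    def switchover(self, nodes: List[Node]) -> None:
        # Hypothetical control-plane step: republish the target on a healthy
        # node, preferring one that also holds a replica (mirroring the
        # co-location idea above), then let the initiator reconnect.
        candidates = [n for n in nodes if n.healthy]
        co_located = [n for n in candidates if n in self.replica_nodes]
        self.target_node = (co_located or candidates)[0]
        # The initiator reconnects; stalled IO now completes without errors.
        drained, self.pending_io = self.pending_io, []
        for io in drained:
            assert self.submit_io(io) == "completed"


# Example: an upgrade restarts the dataplane on node "a"; the target moves to
# node "b" (which also holds a replica), and the stalled write then completes.
a, b = Node("a"), Node("b")
vol = Volume(target_node=a, replica_nodes=[a, b])
a.healthy = False                      # dataplane pod on "a" is restarting
assert vol.submit_io("write-1") == "stalled"
vol.switchover([a, b])
assert vol.target_node is b and not vol.pending_io
```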

@tiagolobocastro
Thank you. One more question: do you use NVMe multipath in the switchover for upgrades? If so, can you elaborate more on the mechanism? Thanks.