Document the replace storage device workflow in tendrl

Question

shtripat opened this issue 8 years ago

If a storage device becomes faulty (which is bound to happen in the real world), we need clearly defined workflows for bringing a new device into the system and replacing the faulty one with it.

This needs to take care of moving the data onto the new device as it comes into the picture,
then slowly phasing out the old faulty device,
and finally bringing down the faulty device and removing it from the underlying cluster.
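
The three steps above can be sketched roughly as follows. This is only a toy in-memory model in Python, not a proposal for the actual implementation: `Cluster`, `replace_device`, the `chunk` parameter, and the device names are all hypothetical and are not tendrl, Ceph, or Gluster APIs. A real flow would call the storage system's own tooling and wait for rebalancing to complete before removing anything.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a storage cluster (hypothetical, not a tendrl API)."""
    # Maps device id -> number of data units currently stored on it.
    devices: dict = field(default_factory=dict)

    def add_device(self, dev):
        if dev in self.devices:
            raise ValueError(f"{dev} is already part of the cluster")
        self.devices[dev] = 0

    def migrate(self, src, dst, chunk):
        # Move at most `chunk` units from src to dst; return how many moved.
        moved = min(chunk, self.devices[src])
        self.devices[src] -= moved
        self.devices[dst] += moved
        return moved

    def remove_device(self, dev):
        # Refuse to drop a device that still holds data.
        if self.devices[dev] != 0:
            raise RuntimeError(f"{dev} still holds data")
        del self.devices[dev]


def replace_device(cluster, faulty, new, chunk=10):
    """Replace `faulty` with `new`; return the names of the completed steps."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    steps = []

    # Step 1: bring the new device into the cluster.
    cluster.add_device(new)
    steps.append("expand")

    # Step 2: phase out the faulty device gradually, one chunk at a time.
    while cluster.devices[faulty] > 0:
        cluster.migrate(faulty, new, chunk)
    steps.append("drain")

    # Step 3: remove the now-empty faulty device from the cluster.
    cluster.remove_device(faulty)
    steps.append("shrink")
    return steps
```

For example, `replace_device(Cluster({"dev-old": 25}), "dev-old", "dev-new")` runs the three steps in order and leaves all the data on `dev-new`. The point of the sketch is only the ordering: the faulty device is removed last, and only once it holds no data.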

This flow looks simple but involves a lot of technicalities in Ceph and Gluster. It is risky and needs to be done very carefully, so the flow and the steps involved should be well thought through before they are implemented.

Mrugesh Karnik · Answer 1 · Tue Sep 27 2016 18:41:14 GMT+0800 (China Standard Time)

@shtripat has this been handled in skyring?

Shubhendu · Answer 2 · Tue Sep 27 2016 20:43:49 GMT+0800 (China Standard Time)

@brainfunked No. The replace storage device flow was not implemented in skyring. But to track it and make sure we don't lose sight of its importance, we can keep this issue open until we implement it.

Nishanth Thomas · Answer 3 · Tue Oct 04 2016 16:29:52 GMT+0800 (China Standard Time)

Replace is nothing but expanding the cluster and then removing the faulty node (shrinking the cluster). It's not going to be a single operation.