m3db large memory spikes when removing nodes from clusters
BertHartm opened this issue
Bert Hartmann commented
We've now seen twice that removing a node from the placement on m3db v1.3.0 causes all of the nodes in that isolation group to bootstrap (expected, as the shards move). Then, as that process is completing, memory usage and goroutine counts rise rapidly on other nodes, causing them to run out of memory and crash.
Generally our cluster runs at about 80% capacity and uses less than half of the machine RAM before we start the node removal. It has failed 2 out of 2 times since we upgraded the cluster, so it does seem reproducible.
M commented