Suspect members from saved state won't change state

Question


jeromegn opened this issue a year ago

Jerome Gravel-Niquet commented a year ago

As an optimization, on startup we apply the last known state of the cluster, so that a node learns about the whole cluster much faster when there are hundreds of nodes.

I noticed that when a member's last saved state was Suspect and that state was applied on start, the member never went back to a non-Suspect state.

As an experiment, I filtered out all Suspect members when calling apply_many on startup, and that appeared to fix it: the members were discovered as Alive again.
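The workaround amounts to filtering the persisted snapshot before it is handed to `apply_many`. Here is a minimal sketch, assuming the snapshot is a list of hypothetical `SavedMember` records with a locally defined `State` enum. These are illustrative stand-ins, not foca's own `Member`/`State` types. You would convert the surviving entries into whatever `apply_many` expects.

```rust
/// Hypothetical stand-in for a member's SWIM state (not foca's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Alive,
    Suspect,
    Down,
}

/// Hypothetical record of one member as persisted before shutdown.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SavedMember {
    pub id: String,
    pub incarnation: u16,
    pub state: State,
}

/// Keep only members last seen as Alive. Suspect and Down entries are
/// dropped so the running cluster can tell us about them instead.
pub fn startup_updates(saved: Vec<SavedMember>) -> Vec<SavedMember> {
    saved
        .into_iter()
        .filter(|m| m.state == State::Alive)
        .collect()
}
```

Dropping rather than downgrading the non-Alive entries means the node never asserts stale suspicions. It simply learns about those members later through normal dissemination.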

Is this the right way to prevent this behaviour, or is it a bug?

caio · Answer 1 · Mon Aug 28 2023 15:10:08 GMT+0800 (China Standard Time)

Keeping only Alive members is cleaner.

When you load state from an external source, the risk is that it diverges from the live cluster: updates get disseminated before the new node (holding the old state) has a chance to catch up.

So when you load a Suspect state, there's a higher chance that the member has since transitioned to Down and you missed that update. You then end up thinking a Down node is alive until you eventually ping it, which may take a while given your cluster size and the fact that nobody else in the cluster thinks it is active.

It may also happen that, while the Suspect state remains, the node that initiated the suspicion went down (or was restarted), so it never ends up declaring the suspected node down. This is perfectly fine.

From a cluster membership perspective, Suspect == Alive, and users shouldn't obsess over the difference. What matters is whether the node is still actively participating (mostly: probing and being probed periodically). Foca could do a better job of exposing the precise cycle (I'm thinking of new Notifications), but "are we sending and receiving data?" is a good enough proxy.
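The "are we sending and receiving data?" proxy can be tracked independently of the Alive/Suspect distinction. Here is a minimal sketch, assuming a hypothetical `ActivityTracker` that you update yourself whenever a message goes to or comes from a member. It is not part of foca, and the activity window is an arbitrary parameter you would tune.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical tracker: when did we last exchange data with each member?
pub struct ActivityTracker {
    last_seen: HashMap<String, Instant>,
    window: Duration,
}

impl ActivityTracker {
    pub fn new(window: Duration) -> Self {
        Self { last_seen: HashMap::new(), window }
    }

    /// Call whenever a message is sent to or received from `member`.
    pub fn record(&mut self, member: &str, at: Instant) {
        self.last_seen.insert(member.to_string(), at);
    }

    /// A member counts as active if we exchanged data with it within
    /// `window` of `now`. Its SWIM state (Alive or Suspect) is ignored.
    pub fn is_active(&self, member: &str, now: Instant) -> bool {
        match self.last_seen.get(member) {
            Some(&t) => now.saturating_duration_since(t) <= self.window,
            None => false,
        }
    }
}
```

Passing `now` explicitly keeps the check deterministic and easy to test. In practice you would pass `Instant::now()`.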

Jerome Gravel-Niquet · Answer 2 · Mon Aug 28 2023 22:25:18 GMT+0800 (China Standard Time)

Thanks, that makes sense. I think I'll add a few checks to the logic for starting from a saved state, like not using an update older than n seconds.
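Combining both ideas from this thread (keep only Alive, and skip anything older than n seconds) might look like the sketch below. It assumes each persisted entry carries a hypothetical `saved_at_unix` timestamp in seconds. That field, the `SavedUpdate` type, and the local `State` enum are illustrative, not foca's API.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a member's SWIM state (not foca's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Alive,
    Suspect,
    Down,
}

/// Hypothetical persisted entry: member id, state, and the Unix time
/// (in seconds) at which it was recorded.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SavedUpdate {
    pub id: String,
    pub state: State,
    pub saved_at_unix: u64,
}

/// Keep Alive entries recorded no more than `max_age` before `now_unix`.
/// Timestamps in the future saturate to an age of zero.
pub fn usable_updates(saved: &[SavedUpdate], now_unix: u64, max_age: Duration) -> Vec<SavedUpdate> {
    saved
        .iter()
        .filter(|u| u.state == State::Alive)
        .filter(|u| now_unix.saturating_sub(u.saved_at_unix) <= max_age.as_secs())
        .cloned()
        .collect()
}
```

With `now_unix = 1030` and `max_age` of 60 seconds, an Alive entry saved at 1000 is kept, while one saved at 100 is discarded as stale.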