caio / foca

mirror of https://caio.co/de/foca/

Membership states are kept even when the identity has been renewed

jeromegn opened this issue · comments

(At the risk of creating an issue where it's all the same thing I've been misunderstanding...)

We've noticed that restarting a node or a small subset of nodes will create an odd situation where they'll receive far fewer payloads.

I think this might be due to the fact that foca does not throw away old identities when they're renewed. We have a way to dump the result of iter_membership_states and I noticed this:

$ corrosion cluster membership-states | grep -A4 -B5 3813b34
}
{
  "id": {
    "addr": "[fc01:a7b:152::]:8787",
    "cluster_id": 1,
    "id": "3813b347-28e3-47cd-b759-77030e0965b1",
    "ts": 7337274699724960448
  },
  "incarnation": 0,
  "state": "Alive"
--
}
{
  "id": {
    "addr": "[fc01:a7b:152::]:8787",
    "cluster_id": 1,
    "id": "3813b347-28e3-47cd-b759-77030e0965b1",
    "ts": 7335860957095395824
  },
  "incarnation": 0,
  "state": "Down"

Our identities include a timestamp (ts), which we use internally instead of a bump field (as previously discussed) to make sure we only keep the latest identity in Corrosion.
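
For context, our identity looks roughly like this (sketch only; the field names match the impl below, but the concrete types here are placeholders, not the exact Corrosion definitions):

use std::net::SocketAddr;

// Rough sketch of the identity type; field names match the Identity impl
// below, the concrete types are placeholders.
#[derive(Clone, PartialEq, Eq)]
struct Actor {
    id: uuid::Uuid,    // stable node id; nil when we only know the gossip addr
    addr: SocketAddr,  // gossip address
    ts: u64,           // timestamp (NTP64-derived in the real code), bumped on renew()
    cluster_id: u16,   // which cluster this node belongs to
}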

Our has_same_prefix implementation should be preventing duplicates:

impl Identity for Actor {
    fn has_same_prefix(&self, other: &Self) -> bool {
        // this happens if we're announcing ourselves to another node
        // we don't yet have any info about them, except their gossip addr
        if other.id.is_nil() || self.id.is_nil() {
            self.addr.eq(&other.addr)
        } else {
            self.id.eq(&other.id)
        }
    }

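    // foca calls renew() when this node needs a fresh identity (e.g. to
    // auto-rejoin after being declared down); bumping ts is what makes
    // the renewed identity distinct from the previous one.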
    fn renew(&self) -> Option<Self> {
        Some(Self {
            id: self.id,
            addr: self.addr,
            ts: NTP64::from(duration_since_epoch()).into(),
            cluster_id: self.cluster_id,
        })
    }
}

I suspect the behavior in foca is intended: keeping downed members for however long remove_down_after is set to? Should it keep down members even when there's another live one that has the same prefix?

I'm not sure yet what effect it's having on our project. We check the timestamp on Up and Down, making sure we only remove a member if it's the current one we know about, and we only replace/add a member if the new timestamp is higher than the previous one. I'll have to start dumping the members too to figure that one out.
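
In pseudocode, those checks look roughly like this (simplified sketch, not the actual Corrosion code; known is a hypothetical map from node id to the identity we currently track, using the Actor sketch above):

use std::collections::HashMap;

// Simplified sketch of the Up/Down checks described above (hypothetical
// helpers, not the real Corrosion handlers).
fn on_member_up(known: &mut HashMap<uuid::Uuid, Actor>, new: Actor) {
    // Only replace/add if the incoming identity is newer than what we track.
    let is_newer = known
        .get(&new.id)
        .map_or(true, |existing| new.ts > existing.ts);
    if is_newer {
        known.insert(new.id, new);
    }
}

fn on_member_down(known: &mut HashMap<uuid::Uuid, Actor>, down: &Actor) {
    // Only remove if the downed identity is exactly the one we track;
    // a Down for an older ts must not clobber a newer, live identity.
    if known.get(&down.id).map(|a| a.ts) == Some(down.ts) {
        known.remove(&down.id);
    }
}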

We have no code that checks the state of a foca member, and yet when rejoining the cluster everything appears to resolve itself.

It's a bit of a head scratcher...

Perhaps it's related to manually calling apply_many when we detect a MemberUp that's from a previous identity? https://github.com/superfly/corrosion/blob/99ef8a64c9f401dd8ad4f1136b3641cbc1ae5740/crates/corro-agent/src/agent/handlers.rs#L279-L290

commented

Hey there,

We've noticed that restarting a node or a small subset of nodes will create an odd situation where they'll receive far fewer payloads.

Can you elaborate a bit more? What kind of payload?


Let me try to explain what I think you're observing. Simple outline of the scenario just to make sure we're on the same page:

  • Large WAN cluster, every node knows about every other node
  • When you restart a node you save (iter_membership_state) and restore (apply_many) its state

So, every member has a complete (eventually consistent) idea of the whole cluster, but since identities may come and go all the time, it's a good idea to periodically prune old knowledge (remove_down_after).

When you start up a new instance, it has zero knowledge. You feed it via apply_many, and now all this knowledge is in its backlog to be broadcast to the cluster.

For a sufficiently large cluster it's very possible that Down identities are occasionally being re-learned (node A's remove_down_after for node B expires, but node C is still broadcasting about node B being down). The more identities come in (by means of restarting), the higher the chance of this happening.

So, it's the same scenario as the "ever growing members" case, but with a lower frequency and way lower impact (you won't try to talk to a down member)

If foca had identity equality (instead of has_same_prefix) it could simply expose a mechanism for the users to decide which to pick in case of conflict and it would be super easy to deal with it. Alas, that's not the case 😅, so here's what I suggest:

Prune the old identities before saving them (or right before loading them), i.e. group your members by address and only keep the ones with the highest ts. This way your node won't risk reintroducing known stale identities.
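
Something along these lines (rough sketch, reusing the Actor fields from the impl you posted; adapt to however you actually serialize the dump):

use std::collections::HashMap;
use std::net::SocketAddr;

// Keep only the newest identity (highest ts) per gossip address before
// feeding the saved state back via apply_many.
fn prune_stale(identities: Vec<Actor>) -> Vec<Actor> {
    let mut newest: HashMap<SocketAddr, Actor> = HashMap::new();
    for actor in identities {
        let keep = newest
            .get(&actor.addr)
            .map_or(true, |kept| actor.ts > kept.ts);
        if keep {
            newest.insert(actor.addr, actor);
        }
    }
    newest.into_values().collect()
}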

An alternative would be to teach foca to clear its backlog (or a way to apply knowledge without broadcasting it)

Perhaps it's related to manually calling apply_many when we detect a MemberUp that's from a previous identity? https://github.com/superfly/corrosion/blob/99ef8a64c9f401dd8ad4f1136b3641cbc1ae5740/crates/corro-agent/src/agent/handlers.rs#L279-L290

This is what taught the cluster to self-correct in the case of stale active identities being broadcast. I'd expect these to only happen during restarts and to go away a few minutes after the restarts are complete.

commented

Release v0.17.0 should completely eliminate every "too many members" problem (identities have become strict).

Closing this issue as it looks stale. Feel free to reopen