tikv / raft-rs

Raft distributed consensus algorithm implemented in Rust.

How does learner become follower?

sargarass opened this issue · comments

From my tests, there is no automatic transition from the learner state to follower in v0.6. What is the right way to do it, then? I suppose it could be done before ticking the leader's raft.

  1. Does raft-rs track learner status, or should I track it myself using the ConfState returned by apply_conf_change? Etcd's ProgressTracker has the field IsLearner bool.

  2. Is it enough for the leader to check the learners' entries in ProgressTracker to ensure they match its committed index (learner's progress.committed_index == leader's raft.raft_log.committed && learner's ProgressState != Snapshot)?
    Or is a learner's state == ProgressState::Replicate sufficient?

  3. Should the leader propose a new configuration that promotes exactly one node from learner to follower at a time, or will a single ConfChangeV2 with all the learner ids and transition = Implicit do?

  4. Is it possible that in the future this transition will be made by raft-rs?

commented
  1. Does Raft-rs track the learners status or should I do it by using ConfState after apply_conf_change?

raft-rs tracks the status in its configuration. Whether a peer counts as a learner gets complicated once a joint state is involved, so I suggest the application also keeps track of the ConfState.
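To illustrate why the joint state makes "is this peer a learner?" harder to answer, here is a minimal sketch of application-side tracking. The `Membership` struct, the `Role` enum, and the `membership` helper are hypothetical; only the four field names are borrowed from raft-rs's `ConfState`. It assumes a peer listed in `learners_next` still votes in the outgoing configuration until the group leaves the joint state.

```rust
use std::collections::BTreeSet;

/// Hypothetical application-side copy of the membership. The field names
/// mirror raft-rs's `ConfState`, but this struct is only a stand-in for it.
#[derive(Debug, Default, Clone)]
pub struct Membership {
    pub voters: BTreeSet<u64>,
    pub voters_outgoing: BTreeSet<u64>,
    pub learners: BTreeSet<u64>,
    pub learners_next: BTreeSet<u64>,
}

/// Hypothetical role classification used by the application.
#[derive(Debug, PartialEq, Eq)]
pub enum Role {
    /// Votes in the incoming (or only) configuration.
    Voter,
    /// Votes only in the outgoing config; becomes a learner after leaving joint.
    DemotingVoter,
    /// Votes only in the outgoing config; is gone after leaving joint.
    OutgoingVoter,
    Learner,
    NotMember,
}

impl Membership {
    /// The group is in a joint state while an outgoing voter set exists.
    pub fn is_joint(&self) -> bool {
        !self.voters_outgoing.is_empty()
    }

    pub fn role(&self, id: u64) -> Role {
        if self.voters.contains(&id) {
            Role::Voter
        } else if self.voters_outgoing.contains(&id) {
            if self.learners_next.contains(&id) {
                Role::DemotingVoter
            } else {
                Role::OutgoingVoter
            }
        } else if self.learners.contains(&id) {
            Role::Learner
        } else {
            Role::NotMember
        }
    }
}

/// Convenience constructor for examples.
pub fn membership(
    voters: &[u64],
    voters_outgoing: &[u64],
    learners: &[u64],
    learners_next: &[u64],
) -> Membership {
    Membership {
        voters: voters.iter().copied().collect(),
        voters_outgoing: voters_outgoing.iter().copied().collect(),
        learners: learners.iter().copied().collect(),
        learners_next: learners_next.iter().copied().collect(),
    }
}
```

With `membership(&[1, 2, 4], &[1, 2, 3], &[], &[3])`, peer 3 is neither a plain voter nor a plain learner: it is reported as `Role::DemotingVoter`, which is the case a single `IsLearner`-style flag cannot express.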

  2. Is it enough for the leader to check ProgressTracker of the learners to ensure they match its committed index

It depends on what you want. If you just want to check whether it's safe to promote a learner to voter, look at the implementation in TiKV; I think it's the most suitable approach that doesn't depend on many raft-rs internals.

  3. Should the leader propose a new configuration that promotes exactly one node from learner to follower at a time, or will a single ConfChangeV2 with all the learner ids and transition = Implicit do?

Either way is OK. It depends on how much control you want over the process. TiKV uses ConfChangeV2 to promote a learner and demote a voter at the same time, with the transition set to explicit. You may want to check out https://github.com/tikv/rfcs/blob/master/text/0054-joint-consensus.md to see how TiKV adopts joint consensus.
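As a rough illustration of what "promote a learner and demote a voter at the same time" does to the membership, here is a simplified, self-contained model of entering and leaving a joint configuration. `ChangeKind`, `Config`, `set`, `enter_joint`, and `leave_joint` are hypothetical stand-ins (the variant names follow raft-rs's `ConfChangeType`), and the rules are my reading of the joint-consensus semantics, not the library's code. It assumes the current configuration is not already joint.

```rust
use std::collections::BTreeSet;

/// Stand-in for the change types; names follow raft-rs's `ConfChangeType`.
#[derive(Debug, Clone, Copy)]
pub enum ChangeKind {
    AddNode,        // add or promote to voter
    RemoveNode,     // remove from the group
    AddLearnerNode, // add as learner, or demote a voter to learner
}

/// Hypothetical membership model with the same four sets as `ConfState`.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Config {
    pub voters: BTreeSet<u64>,
    pub voters_outgoing: BTreeSet<u64>,
    pub learners: BTreeSet<u64>,
    pub learners_next: BTreeSet<u64>,
}

/// Small helper to build an id set.
pub fn set(ids: &[u64]) -> BTreeSet<u64> {
    ids.iter().copied().collect()
}

/// Enter the joint state: the old voters become the outgoing set and the
/// changes are applied to the incoming set. Assumes `current` is not joint.
pub fn enter_joint(current: &Config, changes: &[(ChangeKind, u64)]) -> Config {
    let mut next = current.clone();
    next.voters_outgoing = current.voters.clone();
    for &(kind, id) in changes {
        match kind {
            ChangeKind::AddNode => {
                next.learners.remove(&id);
                next.learners_next.remove(&id);
                next.voters.insert(id);
            }
            ChangeKind::RemoveNode => {
                next.voters.remove(&id);
                next.learners.remove(&id);
                next.learners_next.remove(&id);
            }
            ChangeKind::AddLearnerNode => {
                next.voters.remove(&id);
                if next.voters_outgoing.contains(&id) {
                    // Still an outgoing voter: it can only become a
                    // learner once the joint state is left.
                    next.learners_next.insert(id);
                } else {
                    next.learners.insert(id);
                }
            }
        }
    }
    next
}

/// Leave the joint state: drop the outgoing voters and turn the pending
/// learners into real learners.
pub fn leave_joint(joint: &Config) -> Config {
    let mut next = joint.clone();
    let pending = std::mem::take(&mut next.learners_next);
    next.learners.extend(pending);
    next.voters_outgoing.clear();
    next
}
```

Starting from voters {1, 2, 3} and learner {4}, the changes `[(AddNode, 4), (AddLearnerNode, 3)]` give a joint state with incoming voters {1, 2, 4}, outgoing voters {1, 2, 3}, and peer 3 parked in `learners_next`. As far as I understand, with an explicit transition the group stays in that state until the application proposes the step modeled here by `leave_joint`, while an implicit transition leaves it automatically.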

  4. Is it possible that in the future this transition will be made by raft-rs?

I'm afraid not. Learner is a standalone role that can function without ever becoming a voter, so raft-rs will not promote it automatically. For example, TiDB uses voters and learners at the same time to achieve isolated HTAP with strong consistency.

@BusyJay, thanks for the reply!

If you just want to see if it's safe to promote a learner to voter, you can check the implementation in TiKV

1. let promoted_commit_index = after_progress.maximal_committed_index().0;
2. if current_progress.is_singleton() // It's always safe if there is only one node in the cluster.
3.    || promoted_commit_index >= self.get_store().truncated_index()
4. {
5.    return Ok(());
6. }

So for safety:
0. Does this check happen before applying the new configuration or before proposing it?

  1. Does line 3 check whether the new quorum would have maximal_committed_index >= the current leader's commit_index?
  2. Why is promoted_commit_index >= current_progress.maximal_committed_index().0 not used instead of line 3?
  3. What is the maximal_committed_index.1 bool used for? Is it safe to ignore it?
  4. Why is the maximum used, not just the quorum's committed index?
  5. Line 2 is not obvious. Let's assume that there are several learners way behind our one-node cluster. Wouldn't it cause data loss or other problems if they are promoted to voters and then the leader immediately fails?

commented

If the commit index becomes smaller than the leader's truncated index after applying the configuration change, the leader will have to send a snapshot to at least one node so that a quorum has enough logs to catch up. Snapshots are slow, and this will pause the whole group.

Checking against the leader's commit index is a stricter constraint, which may not be satisfiable in all conditions. For example, with such a requirement a fast voter may never be removable.
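The check discussed above can be sketched without any raft-rs types. This is a simplified model, not TiKV's code: `quorum_index`, `change_avoids_snapshot`, and `progress` are hypothetical names, the quorum is a plain majority of the post-change voters (no joint configuration, no group commit), and a peer missing from the map is treated as having replicated nothing.

```rust
use std::collections::BTreeMap;

/// Highest log index that a simple majority of `voters` has replicated,
/// given each peer's matched index. Peers missing from `matched` count as 0.
pub fn quorum_index(voters: &[u64], matched: &BTreeMap<u64, u64>) -> u64 {
    if voters.is_empty() {
        return 0;
    }
    let mut indexes: Vec<u64> = voters
        .iter()
        .map(|id| matched.get(id).copied().unwrap_or(0))
        .collect();
    // Sort descending; the entry at position quorum - 1 is replicated on
    // at least `quorum` voters.
    indexes.sort_unstable_by(|a, b| b.cmp(a));
    let quorum = voters.len() / 2 + 1;
    indexes[quorum - 1]
}

/// Returns true if, after the change, a quorum of the new voters can still
/// be served from the leader's log, so no snapshot is needed to make
/// progress. A singleton group is always allowed to change.
pub fn change_avoids_snapshot(
    voters_before: &[u64],
    voters_after: &[u64],
    matched: &BTreeMap<u64, u64>,
    truncated_index: u64,
) -> bool {
    voters_before.len() == 1 || quorum_index(voters_after, matched) >= truncated_index
}

/// Convenience helper to build the matched-index map for examples.
pub fn progress(pairs: &[(u64, u64)]) -> BTreeMap<u64, u64> {
    pairs.iter().copied().collect()
}
```

With matched indexes `{1: 10, 2: 9, 3: 4, 4: 1}`, promoting learner 4 drops the quorum index from 9 to 4, so the change is rejected when the truncated index is 5 and accepted when it is 3. Removing the fast voter 2 also yields a quorum index of 4, which is below the old commit index of 9; a check against the leader's commit index would block that removal, while the truncated-index check allows it.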

What is maximal_committed_index.1 bool used for? Is it safe to ignore it?

It's for group commit, which is an extension; you can generally ignore it safely.

Let's assume that there are several learners way behind our 1 node-cluster. Would not it cause data-loss/other problems if they are promoted to voters and then the leader immediately fails?

What if the leader does nothing and fails? The problem is not that multiple nodes are being promoted, but that there is only one healthy node in the group. We consider a singleton dangerous, so we want to add multiple replicas as soon as possible. The comment is not accurate, though.

I think I have almost everything I need.
What is the leader's truncated index, though, and how do I get it?

Edit: Is it the index of the latest entry in the latest snapshot? (If so, we do not need to send a snapshot once this condition is met.)

commented

It's the smallest index among the available log entries, minus one.
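In code, that definition is a one-liner. The sketch below uses hypothetical helper names and assumes the application can read the index of the first entry still present in the leader's log (raft-rs's `Storage` trait exposes a `first_index` method for this). It also assumes the leader still knows the term of the truncated entry, so a peer that has replicated exactly up to the truncated index can continue from the log.

```rust
/// Index of the last entry that has been compacted away, derived from the
/// index of the first entry still available in the log.
pub fn truncated_index(first_index: u64) -> u64 {
    first_index.saturating_sub(1)
}

/// True if a peer that has replicated up to `matched` can be caught up from
/// the log alone, because the entry right after `matched` still exists.
pub fn can_catch_up_from_log(matched: u64, first_index: u64) -> bool {
    matched >= truncated_index(first_index)
}
```

For a log whose first available entry has index 6, the truncated index is 5: a peer matched at 5 can be served entry 6 directly, while a peer matched at 4 would need the already-compacted entry 5 and therefore a snapshot.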

@BusyJay, I appreciate the help.