Potential serious-bug in raft implementation (or) in the etcd contrib example.
harinath001 opened this issue · comments
Context
- I am referring to the example mentioned here https://github.com/etcd-io/etcd/blob/main/contrib/raftexample/raft.go
- I add few custom logs for understanding and disabled the transport level logs which spams my terminal.
- see the comment for the patch which contains the changes i made for testing #174 (comment)
- also referred the paper https://raft.github.io/raft.pdf
- and leader has to send a new snapshot to the slow followers if compacted logs dont contain the entries which the follower is requesting to append the entries.
- able to reproduce this bug multiple times.
Bug pattern
in a 4 node cluster, 4th node cannot join the cluster after some time (during which there will be leadership changes and log compactions happened.)
Changes made in the example code to take frequent snapshots
( see the comment for the patch which contains the changes i made for testing #174 (comment))
made defaultSnapshotCount = 1
made snapshotCatchUpEntriesN = 1
Detailed steps for reproducing the bug.
Compile steps
go build -o raftexample
Create a cluster of 3 raft servers
./raftexample --id 1 --cluster http://127.0.0.1:8001,http://127.0.0.1:8002,http://127.0.0.1:8003 --port 9001
2024/03/03 14:21:26 replaying WAL of member 1
2024/03/03 14:21:26 loading WAL at term 0 and index 0
raft2024/03/03 14:21:26 INFO: 1 switched to configuration voters=()
raft2024/03/03 14:21:26 INFO: 1 became follower at term 0
raft2024/03/03 14:21:26 INFO: newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2024/03/03 14:21:26 INFO: 1 became follower at term 1
raft2024/03/03 14:21:26 INFO: 1 switched to configuration voters=(1)
raft2024/03/03 14:21:26 INFO: 1 switched to configuration voters=(1 2)
raft2024/03/03 14:21:26 INFO: 1 switched to configuration voters=(1 2 3)
2024/03/03 14:21:26 ----------------------------------------------------------
2024/03/03 14:21:26 {"ID":1,"term":1,"vote":0,"commit":3,"Lead":0,"RaftState":"StateFollower","Applied":0,"LeadTransferee":0}
2024/03/03 14:21:26 rc.raftStorage.FirstIndex() is 1 the rc.raftStorage.LastIndex() 0
raft2024/03/03 14:21:26 INFO: 1 switched to configuration voters=(1 2 3)
raft2024/03/03 14:21:26 INFO: 1 switched to configuration voters=(1 2 3)
raft2024/03/03 14:21:26 INFO: 1 switched to configuration voters=(1 2 3)
2024/03/03 14:21:26 ----> maybe triggering snapshot..........
2024/03/03 14:21:26 start snapshot [applied index: 3 | last snapshot index: 0]
2024/03/03 14:21:26 compacted log at index 2
raft2024/03/03 14:21:28 INFO: 1 is starting a new election at term 1
raft2024/03/03 14:21:28 INFO: 1 became candidate at term 2
raft2024/03/03 14:21:28 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 2 at term 2
raft2024/03/03 14:21:28 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 3 at term 2
raft2024/03/03 14:21:28 INFO: 1 received MsgVoteResp from 1 at term 2
raft2024/03/03 14:21:28 INFO: 1 has received 1 MsgVoteResp votes and 0 vote rejections
2024/03/03 14:21:28 ----------------------------------------------------------
2024/03/03 14:21:28 {"ID":1,"term":2,"vote":1,"commit":3,"Lead":0,"RaftState":"StateCandidate","Applied":3,"LeadTransferee":0}
2024/03/03 14:21:28 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 3
raft2024/03/03 14:21:29 INFO: 1 is starting a new election at term 2
raft2024/03/03 14:21:29 INFO: 1 became candidate at term 3
raft2024/03/03 14:21:29 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 2 at term 3
raft2024/03/03 14:21:29 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 3 at term 3
raft2024/03/03 14:21:29 INFO: 1 received MsgVoteResp from 1 at term 3
raft2024/03/03 14:21:29 INFO: 1 has received 1 MsgVoteResp votes and 0 vote rejections
2024/03/03 14:21:30 ----------------------------------------------------------
2024/03/03 14:21:30 {"ID":1,"term":3,"vote":1,"commit":3,"Lead":0,"RaftState":"StateCandidate","Applied":3,"LeadTransferee":0}
2024/03/03 14:21:30 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 3
raft2024/03/03 14:21:31 INFO: 1 is starting a new election at term 3
raft2024/03/03 14:21:31 INFO: 1 became candidate at term 4
raft2024/03/03 14:21:31 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 2 at term 4
raft2024/03/03 14:21:31 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 3 at term 4
raft2024/03/03 14:21:31 INFO: 1 received MsgVoteResp from 1 at term 4
raft2024/03/03 14:21:31 INFO: 1 has received 1 MsgVoteResp votes and 0 vote rejections
raft2024/03/03 14:21:32 INFO: 1 is starting a new election at term 4
raft2024/03/03 14:21:32 INFO: 1 became candidate at term 5
raft2024/03/03 14:21:32 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 2 at term 5
raft2024/03/03 14:21:32 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 3 at term 5
raft2024/03/03 14:21:32 INFO: 1 received MsgVoteResp from 1 at term 5
raft2024/03/03 14:21:32 INFO: 1 has received 1 MsgVoteResp votes and 0 vote rejections
2024/03/03 14:21:32 ----------------------------------------------------------
2024/03/03 14:21:32 {"ID":1,"term":5,"vote":1,"commit":3,"Lead":0,"RaftState":"StateCandidate","Applied":3,"LeadTransferee":0}
2024/03/03 14:21:32 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 3
raft2024/03/03 14:21:33 INFO: 1 is starting a new election at term 5
raft2024/03/03 14:21:33 INFO: 1 became candidate at term 6
raft2024/03/03 14:21:33 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 2 at term 6
raft2024/03/03 14:21:33 INFO: 1 [logterm: 1, index: 3] sent MsgVote request to 3 at term 6
raft2024/03/03 14:21:33 INFO: 1 received MsgVoteResp from 1 at term 6
raft2024/03/03 14:21:33 INFO: 1 has received 1 MsgVoteResp votes and 0 vote rejections
raft2024/03/03 14:21:33 INFO: 1 received MsgVoteResp from 2 at term 6
raft2024/03/03 14:21:33 INFO: 1 has received 2 MsgVoteResp votes and 0 vote rejections
raft2024/03/03 14:21:33 INFO: 1 became leader at term 6
raft2024/03/03 14:21:33 INFO: raft.node: 1 elected leader 1 at term 6
2024/03/03 14:21:34 ----------------------------------------------------------
2024/03/03 14:21:34 {"ID":1,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateLeader","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:34 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
leader's view of node-id: 1 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 2 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 3 is StateProbe recent snapshot is 0 recent active is false
./raftexample --id 2 --cluster http://127.0.0.1:8001,http://127.0.0.1:8002,http://127.0.0.1:8003 --port 9002
2024/03/03 14:21:33 replaying WAL of member 2
2024/03/03 14:21:33 loading WAL at term 0 and index 0
raft2024/03/03 14:21:33 INFO: 2 switched to configuration voters=()
raft2024/03/03 14:21:33 INFO: 2 became follower at term 0
raft2024/03/03 14:21:33 INFO: newRaft 2 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2024/03/03 14:21:33 INFO: 2 became follower at term 1
raft2024/03/03 14:21:33 INFO: 2 switched to configuration voters=(1)
raft2024/03/03 14:21:33 INFO: 2 switched to configuration voters=(1 2)
raft2024/03/03 14:21:33 INFO: 2 switched to configuration voters=(1 2 3)
2024/03/03 14:21:33 ----------------------------------------------------------
2024/03/03 14:21:33 {"ID":2,"term":1,"vote":0,"commit":3,"Lead":0,"RaftState":"StateFollower","Applied":0,"LeadTransferee":0}
2024/03/03 14:21:33 rc.raftStorage.FirstIndex() is 1 the rc.raftStorage.LastIndex() 0
raft2024/03/03 14:21:33 INFO: 2 switched to configuration voters=(1 2 3)
raft2024/03/03 14:21:33 INFO: 2 switched to configuration voters=(1 2 3)
raft2024/03/03 14:21:33 INFO: 2 switched to configuration voters=(1 2 3)
2024/03/03 14:21:33 ----> maybe triggering snapshot..........
2024/03/03 14:21:33 start snapshot [applied index: 3 | last snapshot index: 0]
2024/03/03 14:21:33 compacted log at index 2
raft2024/03/03 14:21:33 INFO: 2 [term: 1] received a MsgVote message with higher term from 1 [term: 6]
raft2024/03/03 14:21:33 INFO: 2 became follower at term 6
raft2024/03/03 14:21:33 INFO: 2 [logterm: 1, index: 3, vote: 0] cast MsgVote for 1 [logterm: 1, index: 3] at term 6
raft2024/03/03 14:21:33 INFO: raft.node: 2 elected leader 1 at term 6
2024/03/03 14:21:35 ----------------------------------------------------------
2024/03/03 14:21:35 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:35 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:37 ----------------------------------------------------------
2024/03/03 14:21:37 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:37 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:39 ----------------------------------------------------------
2024/03/03 14:21:39 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:39 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:41 ----------------------------------------------------------
2024/03/03 14:21:41 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:41 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:43 ----------------------------------------------------------
2024/03/03 14:21:43 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:43 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:45 ----------------------------------------------------------
2024/03/03 14:21:45 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:45 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:47 ----------------------------------------------------------
2024/03/03 14:21:47 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:47 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:49 ----------------------------------------------------------
2024/03/03 14:21:49 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:49 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:51 ----------------------------------------------------------
2024/03/03 14:21:51 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:51 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:53 ----------------------------------------------------------
2024/03/03 14:21:53 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:53 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:55 ----------------------------------------------------------
2024/03/03 14:21:55 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:55 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:57 ----------------------------------------------------------
2024/03/03 14:21:57 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:57 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:59 ----------------------------------------------------------
2024/03/03 14:21:59 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:59 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:22:01 ----------------------------------------------------------
2024/03/03 14:22:01 {"ID":2,"term":6,"vote":1,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:22:01 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:22:03 ----------------------------------------------------------
./raftexample --id 3 --cluster http://127.0.0.1:8001,http://127.0.0.1:8002,http://127.0.0.1:8003 --port 9003
2024/03/03 14:21:37 replaying WAL of member 3
2024/03/03 14:21:37 loading WAL at term 0 and index 0
raft2024/03/03 14:21:37 INFO: 3 switched to configuration voters=()
raft2024/03/03 14:21:37 INFO: 3 became follower at term 0
raft2024/03/03 14:21:37 INFO: newRaft 3 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2024/03/03 14:21:37 INFO: 3 became follower at term 1
raft2024/03/03 14:21:37 INFO: 3 switched to configuration voters=(1)
raft2024/03/03 14:21:37 INFO: 3 switched to configuration voters=(1 2)
raft2024/03/03 14:21:37 INFO: 3 switched to configuration voters=(1 2 3)
2024/03/03 14:21:37 ----------------------------------------------------------
2024/03/03 14:21:37 {"ID":3,"term":1,"vote":0,"commit":3,"Lead":0,"RaftState":"StateFollower","Applied":0,"LeadTransferee":0}
2024/03/03 14:21:37 rc.raftStorage.FirstIndex() is 1 the rc.raftStorage.LastIndex() 0
raft2024/03/03 14:21:37 INFO: 3 switched to configuration voters=(1 2 3)
raft2024/03/03 14:21:37 INFO: 3 switched to configuration voters=(1 2 3)
raft2024/03/03 14:21:37 INFO: 3 switched to configuration voters=(1 2 3)
2024/03/03 14:21:37 ----> maybe triggering snapshot..........
2024/03/03 14:21:37 start snapshot [applied index: 3 | last snapshot index: 0]
2024/03/03 14:21:37 compacted log at index 2
raft2024/03/03 14:21:37 INFO: 3 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 6]
raft2024/03/03 14:21:37 INFO: 3 became follower at term 6
raft2024/03/03 14:21:37 INFO: raft.node: 3 elected leader 1 at term 6
2024/03/03 14:21:39 ----------------------------------------------------------
2024/03/03 14:21:39 {"ID":3,"term":6,"vote":0,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:39 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:41 ----------------------------------------------------------
2024/03/03 14:21:41 {"ID":3,"term":6,"vote":0,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:41 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:43 ----------------------------------------------------------
2024/03/03 14:21:43 {"ID":3,"term":6,"vote":0,"commit":4,"Lead":1,"RaftState":"StateFollower","Applied":4,"LeadTransferee":0}
2024/03/03 14:21:43 rc.raftStorage.FirstIndex() is 3 the rc.raftStorage.LastIndex() 4
2024/03/03 14:21:45 ----------------------------------------------------------
Add 4th server
curl -L http://127.0.0.1:9003/4 -XPOST -d http://127.0.0.1:8004
Starting 4th server
./raftexample --id 4 --cluster http://127.0.0.1:8001,http://127.0.0.1:8002,http://127.0.0.1:8003,http://127.0.0.1:8004 --port 9004 --join
2024/03/03 14:25:01 replaying WAL of member 4
2024/03/03 14:25:01 loading WAL at term 0 and index 0
raft2024/03/03 14:25:01 INFO: 4 switched to configuration voters=()
raft2024/03/03 14:25:01 INFO: 4 became follower at term 0
raft2024/03/03 14:25:01 INFO: newRaft 4 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2024/03/03 14:25:01 ----------------------------------------------------------
2024/03/03 14:25:01 {"ID":4,"term":0,"vote":0,"commit":0,"Lead":0,"RaftState":"StateFollower","Applied":0,"LeadTransferee":0}
2024/03/03 14:25:01 rc.raftStorage.FirstIndex() is 1 the rc.raftStorage.LastIndex() 0
raft2024/03/03 14:25:01 INFO: 4 [term: 0] received a MsgHeartbeat message with higher term from 1 [term: 6]
raft2024/03/03 14:25:01 INFO: 4 became follower at term 6
raft2024/03/03 14:25:01 INFO: raft.node: 4 elected leader 1 at term 6
raft2024/03/03 14:25:01 INFO: log [committed=0, applied=0, applying=0, unstable.offset=1, unstable.offsetInProgress=1, len(unstable.Entries)=0] starts to restore snapshot [index: 5, term: 6]
raft2024/03/03 14:25:01 INFO: 4 switched to configuration voters=(1 2 3 4)
raft2024/03/03 14:25:01 INFO: 4 [commit: 5, lastindex: 5, lastterm: 6] restored snapshot [index: 5, term: 6]
raft2024/03/03 14:25:01 INFO: 4 [commit: 5] restored snapshot [index: 5, term: 6]
SSSSSSSSSSSSSSSSSSSSSSSS> received SNAPSHOT term is 6 index 5
2024/03/03 14:25:01 publishing snapshot at index 0
2024/03/03 14:25:01 finished publishing snapshot at index 0
2024/03/03 14:25:01 loading snapshot at term 6 and index 5
2024/03/03 14:25:03 ----------------------------------------------------------
2024/03/03 14:25:03 {"ID":4,"term":6,"vote":0,"commit":5,"Lead":1,"RaftState":"StateFollower","Applied":5,"LeadTransferee":0}
2024/03/03 14:25:03 rc.raftStorage.FirstIndex() is 6 the rc.raftStorage.LastIndex() 5
2024/03/03 14:25:05 ----------------------------------------------------------
2024/03/03 14:25:05 {"ID":4,"term":6,"vote":0,"commit":5,"Lead":1,"RaftState":"StateFollower","Applied":5,"LeadTransferee":0}
2024/03/03 14:25:05 rc.raftStorage.FirstIndex() is 6 the rc.raftStorage.LastIndex() 5
2024/03/03 14:25:07 ----------------------------------------------------------
make some transactions which will increase the applied index
curl -L http://127.0.0.1:9001/my-key -XPUT -d bar1
curl -L http://127.0.0.1:9001/my-key -XPUT -d bar1
curl -L http://127.0.0.1:9001/my-key -XPUT -d bar1
curl -L http://127.0.0.1:9001/my-key -XPUT -d bar1
logs from server-1, which is the leader now.
2024/03/03 14:26:11 ----------------------------------------------------------
2024/03/03 14:26:11 {"ID":1,"term":6,"vote":1,"commit":12,"Lead":1,"RaftState":"StateLeader","Applied":12,"LeadTransferee":0}
2024/03/03 14:26:11 rc.raftStorage.FirstIndex() is 11 the rc.raftStorage.LastIndex() 12
leader's view of node-id: 1 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 2 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 3 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 4 is StateReplicate recent snapshot is 0 recent active is true
2024/03/03 14:26:13 ----------------------------------------------------------
2024/03/03 14:26:13 {"ID":1,"term":6,"vote":1,"commit":12,"Lead":1,"RaftState":"StateLeader","Applied":12,"LeadTransferee":0}
2024/03/03 14:26:13 rc.raftStorage.FirstIndex() is 11 the rc.raftStorage.LastIndex() 12
leader's view of node-id: 4 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 1 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 2 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 3 is StateReplicate recent snapshot is 0 recent active is true
all the nodes applies the commits till index 12.
stop the 4th server
<killed the 4th server>
2024/03/03 14:27:37 ----------------------------------------------------------
2024/03/03 14:27:37 {"ID":4,"term":6,"vote":0,"commit":12,"Lead":1,"RaftState":"StateFollower","Applied":12,"LeadTransferee":0}
2024/03/03 14:27:37 rc.raftStorage.FirstIndex() is 11 the rc.raftStorage.LastIndex() 12
^C
harinath@harinaths-MacBook-Pro raftexample %
trigger multiple leader failures, which leads to leadership changes
current leader: server 1
2024/03/03 14:28:57 ----------------------------------------------------------
2024/03/03 14:28:57 {"ID":1,"term":6,"vote":1,"commit":12,"Lead":1,"RaftState":"StateLeader","Applied":12,"LeadTransferee":0}
2024/03/03 14:28:57 rc.raftStorage.FirstIndex() is 11 the rc.raftStorage.LastIndex() 12
leader's view of node-id: 1 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 2 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 3 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 4 is StateProbe recent snapshot is 0 recent active is false
^C
harinath@harinaths-MacBook-Pro raftexample %
and restart server 1
./raftexample --id 1 --cluster http://127.0.0.1:8001,http://127.0.0.1:8002,http://127.0.0.1:8003 --port 9001
2024/03/03 14:29:24 replaying WAL of member 1
2024/03/03 14:29:24 loading WAL at term 6 and index 11
raft2024/03/03 14:29:24 INFO: 1 switched to configuration voters=(1 2 3 4)
raft2024/03/03 14:29:24 INFO: 1 became follower at term 6
raft2024/03/03 14:29:24 INFO: newRaft 1 [peers: [1,2,3,4], term: 6, commit: 11, applied: 11, lastindex: 12, lastterm: 6]
2024/03/03 14:29:24 loading snapshot at term 6 and index 11
2024/03/03 14:29:24 ----------------------------------------------------------
2024/03/03 14:29:24 {"ID":1,"term":6,"vote":1,"commit":11,"Lead":0,"RaftState":"StateFollower","Applied":11,"LeadTransferee":0}
2024/03/03 14:29:24 rc.raftStorage.FirstIndex() is 12 the rc.raftStorage.LastIndex() 12
raft2024/03/03 14:29:24 INFO: 1 [term: 6] received a MsgVote message with higher term from 2 [term: 27]
raft2024/03/03 14:29:24 INFO: 1 became follower at term 27
raft2024/03/03 14:29:24 INFO: 1 [logterm: 6, index: 12, vote: 0] cast MsgVote for 2 [logterm: 6, index: 12] at term 27
raft2024/03/03 14:29:24 INFO: raft.node: 1 elected leader 2 at term 27
2024/03/03 14:29:24 ----> maybe triggering snapshot..........
2024/03/03 14:29:24 start snapshot [applied index: 13 | last snapshot index: 11]
2024/03/03 14:29:24 compacted log at index 12
2024/03/03 14:29:26 ----------------------------------------------------------
2024/03/03 14:29:26 {"ID":1,"term":27,"vote":2,"commit":13,"Lead":2,"RaftState":"StateFollower","Applied":13,"LeadTransferee":0}
next leader : server 2
logs in server 2
2024/03/03 14:31:33 ----------------------------------------------------------
2024/03/03 14:31:33 {"ID":2,"term":27,"vote":2,"commit":13,"Lead":2,"RaftState":"StateLeader","Applied":13,"LeadTransferee":0}
2024/03/03 14:31:33 rc.raftStorage.FirstIndex() is 13 the rc.raftStorage.LastIndex() 13
leader's view of node-id: 1 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 2 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 3 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 4 is StateProbe recent snapshot is 0 recent active is false
2024/03/03 14:31:35 ----------------------------------------------------------
2024/03/03 14:33:57 ----------------------------------------------------------
2024/03/03 14:33:57 {"ID":2,"term":27,"vote":2,"commit":13,"Lead":2,"RaftState":"StateLeader","Applied":13,"LeadTransferee":0}
2024/03/03 14:33:57 rc.raftStorage.FirstIndex() is 13 the rc.raftStorage.LastIndex() 13
leader's view of node-id: 1 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 2 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 3 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 4 is StateProbe recent snapshot is 0 recent active is false
^C
harinath@harinaths-MacBook-Pro raftexample %
start server 2 back
2024/03/03 14:34:25 replaying WAL of member 2
2024/03/03 14:34:25 loading WAL at term 27 and index 13
raft2024/03/03 14:34:25 INFO: 2 switched to configuration voters=(1 2 3 4)
raft2024/03/03 14:34:25 INFO: 2 became follower at term 27
raft2024/03/03 14:34:25 INFO: newRaft 2 [peers: [1,2,3,4], term: 27, commit: 13, applied: 13, lastindex: 13, lastterm: 27]
2024/03/03 14:34:25 loading snapshot at term 27 and index 13
2024/03/03 14:34:25 ----------------------------------------------------------
2024/03/03 14:34:25 {"ID":2,"term":27,"vote":2,"commit":13,"Lead":0,"RaftState":"StateFollower","Applied":13,"LeadTransferee":0}
2024/03/03 14:34:25 rc.raftStorage.FirstIndex() is 14 the rc.raftStorage.LastIndex() 13
raft2024/03/03 14:34:26 INFO: 2 [term: 27] received a MsgVote message with higher term from 1 [term: 49]
raft2024/03/03 14:34:26 INFO: 2 became follower at term 49
raft2024/03/03 14:34:26 INFO: 2 [logterm: 27, index: 13, vote: 0] cast MsgVote for 1 [logterm: 27, index: 13] at term 49
raft2024/03/03 14:34:26 INFO: raft.node: 2 elected leader 1 at term 49
2024/03/03 14:34:27 ----------------------------------------------------------
2024/03/03 14:34:27 {"ID":2,"term":49,"vote":1,"commit":14,"Lead":1,"RaftState":"StateFollower","Applied":14,"LeadTransferee":0}
2024/03/03 14:34:27 rc.raftStorage.FirstIndex() is 14 the rc.raftStorage.LastIndex() 14
2024/03/03 14:34:29 ----------------------------------------------------------
2024/03/03 14:34:29 {"ID":2,"term":49,"vote":1,"commit":14,"Lead":1,"RaftState":"StateFollower","Applied":14,"LeadTransferee":0}
2024/03/03 14:34:29 rc.raftStorage.FirstIndex() is 14 the rc.raftStorage.LastIndex() 14
next leader: server 1
and restart server 1
2024/03/03 14:35:40 replaying WAL of member 1
2024/03/03 14:35:40 loading WAL at term 27 and index 13
raft2024/03/03 14:35:40 INFO: 1 switched to configuration voters=(1 2 3 4)
raft2024/03/03 14:35:40 INFO: 1 became follower at term 49
raft2024/03/03 14:35:40 INFO: newRaft 1 [peers: [1,2,3,4], term: 49, commit: 13, applied: 13, lastindex: 14, lastterm: 49]
2024/03/03 14:35:40 loading snapshot at term 27 and index 13
2024/03/03 14:35:40 ----------------------------------------------------------
2024/03/03 14:35:40 {"ID":1,"term":49,"vote":1,"commit":13,"Lead":0,"RaftState":"StateFollower","Applied":13,"LeadTransferee":0}
2024/03/03 14:35:40 rc.raftStorage.FirstIndex() is 14 the rc.raftStorage.LastIndex() 14
raft2024/03/03 14:35:41 INFO: 1 [term: 49] received a MsgVote message with higher term from 3 [term: 51]
raft2024/03/03 14:35:41 INFO: 1 became follower at term 51
raft2024/03/03 14:35:41 INFO: 1 [logterm: 49, index: 14, vote: 0] cast MsgVote for 3 [logterm: 49, index: 14] at term 51
raft2024/03/03 14:35:41 INFO: raft.node: 1 elected leader 3 at term 51
2024/03/03 14:35:41 ----> maybe triggering snapshot..........
2024/03/03 14:35:41 start snapshot [applied index: 15 | last snapshot index: 13]
2024/03/03 14:35:41 compacted log at index 14
2024/03/03 14:35:42 ----------------------------------------------------------
2024/03/03 14:35:42 {"ID":1,"term":51,"vote":3,"commit":15,"Lead":3,"RaftState":"StateFollower","Applied":15,"LeadTransferee":0}
next leader: server 3
2024/03/03 14:37:09 ----------------------------------------------------------
2024/03/03 14:37:09 {"ID":3,"term":51,"vote":3,"commit":15,"Lead":3,"RaftState":"StateLeader","Applied":15,"LeadTransferee":0}
2024/03/03 14:37:09 rc.raftStorage.FirstIndex() is 15 the rc.raftStorage.LastIndex() 15
leader's view of node-id: 1 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 2 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 3 is StateReplicate recent snapshot is 0 recent active is true
leader's view of node-id: 4 is StateProbe recent snapshot is 0 recent active is false
2024/03/03 14:37:11 ----------------------------------------------------------
start the 4th server back
leader sends the snapshot now
2024/03/03 14:37:38 replaying WAL of member 4
2024/03/03 14:37:38 loading WAL at term 6 and index 11
raft2024/03/03 14:37:38 INFO: 4 switched to configuration voters=(1 2 3 4)
2024/03/03 14:37:38 loading snapshot at term 6 and index 11
raft2024/03/03 14:37:38 INFO: 4 became follower at term 6
raft2024/03/03 14:37:38 INFO: newRaft 4 [peers: [1,2,3,4], term: 6, commit: 11, applied: 11, lastindex: 12, lastterm: 6]
2024/03/03 14:37:38 ----------------------------------------------------------
2024/03/03 14:37:38 {"ID":4,"term":6,"vote":0,"commit":11,"Lead":0,"RaftState":"StateFollower","Applied":11,"LeadTransferee":0}
2024/03/03 14:37:38 rc.raftStorage.FirstIndex() is 12 the rc.raftStorage.LastIndex() 12
raft2024/03/03 14:37:38 INFO: 4 [term: 6] received a MsgHeartbeat message with higher term from 3 [term: 51]
raft2024/03/03 14:37:38 INFO: 4 became follower at term 51
raft2024/03/03 14:37:38 INFO: raft.node: 4 elected leader 3 at term 51
raft2024/03/03 14:37:38 INFO: log [committed=11, applied=11, applying=11, unstable.offset=13, unstable.offsetInProgress=13, len(unstable.Entries)=0] starts to restore snapshot [index: 15, term: 51]
raft2024/03/03 14:37:38 INFO: 4 switched to configuration voters=(1 2 3 4)
raft2024/03/03 14:37:38 INFO: 4 [commit: 15, lastindex: 15, lastterm: 51] restored snapshot [index: 15, term: 51]
raft2024/03/03 14:37:38 INFO: 4 [commit: 15] restored snapshot [index: 15, term: 51]
SSSSSSSSSSSSSSSSSSSSSSSS> received SNAPSHOT term is 51 index 15
2024/03/03 14:37:38 publishing snapshot at index 11
2024/03/03 14:37:38 finished publishing snapshot at index 11
2024/03/03 14:37:38 loading snapshot at term 51 and index 15
2024/03/03 14:37:40 ----------------------------------------------------------
2024/03/03 14:37:40 {"ID":4,"term":51,"vote":0,"commit":15,"Lead":3,"RaftState":"StateFollower","Applied":15,"LeadTransferee":0}
2024/03/03 14:37:40 rc.raftStorage.FirstIndex() is 16 the rc.raftStorage.LastIndex() 15
2024/03/03 14:37:42 ----------------------------------------------------------
2024/03/03 14:37:42 {"ID":4,"term":51,"vote":0,"commit":15,"Lead":3,"RaftState":"StateFollower","Applied":15,"LeadTransferee":0}
2024/03/03 14:37:42 rc.raftStorage.FirstIndex() is 16 the rc.raftStorage.LastIndex() 15
kill server 4 again
now do multiple transactions to increment the applied index on the active nodes.
kill leader (server 3)
and restart server 3
2024/03/03 14:39:57 replaying WAL of member 3
2024/03/03 14:39:57 loading WAL at term 51 and index 43
raft2024/03/03 14:39:57 INFO: 3 switched to configuration voters=(1 2 3 4)
2024/03/03 14:39:57 loading snapshot at term 51 and index 43
raft2024/03/03 14:39:57 INFO: 3 became follower at term 51
raft2024/03/03 14:39:57 INFO: newRaft 3 [peers: [1,2,3,4], term: 51, commit: 43, applied: 43, lastindex: 43, lastterm: 51]
2024/03/03 14:39:57 ----------------------------------------------------------
2024/03/03 14:39:57 {"ID":3,"term":51,"vote":3,"commit":43,"Lead":0,"RaftState":"StateFollower","Applied":43,"LeadTransferee":0}
2024/03/03 14:39:57 rc.raftStorage.FirstIndex() is 44 the rc.raftStorage.LastIndex() 43
raft2024/03/03 14:39:58 INFO: 3 [term: 51] received a MsgVote message with higher term from 1 [term: 72]
raft2024/03/03 14:39:58 INFO: 3 became follower at term 72
raft2024/03/03 14:39:58 INFO: 3 [logterm: 51, index: 43, vote: 0] cast MsgVote for 1 [logterm: 51, index: 43] at term 72
raft2024/03/03 14:39:58 INFO: raft.node: 3 elected leader 1 at term 72
2024/03/03 14:39:59 ----------------------------------------------------------
2024/03/03 14:39:59 {"ID":3,"term":72,"vote":1,"commit":44,"Lead":1,"RaftState":"StateFollower","Applied":44,"LeadTransferee":0}
2024/03/03 14:39:59 rc.raftStorage.FirstIndex() is 44 the rc.raftStorage.LastIndex() 44
now leader is server-1
now restart 4 expecting a snapshot from server-1
2024/03/03 14:41:26 replaying WAL of member 4
2024/03/03 14:41:26 loading WAL at term 6 and index 11
raft2024/03/03 14:41:26 INFO: 4 switched to configuration voters=(1 2 3 4)
raft2024/03/03 14:41:26 INFO: 4 became follower at term 51
raft2024/03/03 14:41:26 INFO: newRaft 4 [peers: [1,2,3,4], term: 51, commit: 11, applied: 11, lastindex: 12, lastterm: 6]
2024/03/03 14:41:26 loading snapshot at term 51 and index 15
2024/03/03 14:41:26 ----------------------------------------------------------
2024/03/03 14:41:26 {"ID":4,"term":51,"vote":0,"commit":11,"Lead":0,"RaftState":"StateFollower","Applied":11,"LeadTransferee":0}
2024/03/03 14:41:26 rc.raftStorage.FirstIndex() is 12 the rc.raftStorage.LastIndex() 12
raft2024/03/03 14:41:27 INFO: 4 is starting a new election at term 51
raft2024/03/03 14:41:27 INFO: 4 became candidate at term 52
raft2024/03/03 14:41:27 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 1 at term 52
raft2024/03/03 14:41:27 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 2 at term 52
raft2024/03/03 14:41:27 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 3 at term 52
raft2024/03/03 14:41:27 INFO: 4 received MsgVoteResp from 4 at term 52
raft2024/03/03 14:41:27 INFO: 4 has received 1 MsgVoteResp votes and 0 vote rejections
2024/03/03 14:41:28 ----------------------------------------------------------
2024/03/03 14:41:28 {"ID":4,"term":52,"vote":4,"commit":11,"Lead":0,"RaftState":"StateCandidate","Applied":11,"LeadTransferee":0}
2024/03/03 14:41:28 rc.raftStorage.FirstIndex() is 12 the rc.raftStorage.LastIndex() 12
raft2024/03/03 14:41:28 INFO: 4 is starting a new election at term 52
raft2024/03/03 14:41:28 INFO: 4 became candidate at term 53
raft2024/03/03 14:41:28 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 1 at term 53
raft2024/03/03 14:41:28 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 2 at term 53
raft2024/03/03 14:41:28 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 3 at term 53
raft2024/03/03 14:41:28 INFO: 4 received MsgVoteResp from 4 at term 53
raft2024/03/03 14:41:28 INFO: 4 has received 1 MsgVoteResp votes and 0 vote rejections
raft2024/03/03 14:41:29 INFO: 4 is starting a new election at term 53
raft2024/03/03 14:41:29 INFO: 4 became candidate at term 54
raft2024/03/03 14:41:29 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 1 at term 54
raft2024/03/03 14:41:29 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 2 at term 54
raft2024/03/03 14:41:29 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 3 at term 54
raft2024/03/03 14:41:29 INFO: 4 received MsgVoteResp from 4 at term 54
raft2024/03/03 14:41:29 INFO: 4 has received 1 MsgVoteResp votes and 0 vote rejections
- at this moment the server 4 dont receive any snapshot from the leader , so it tends to start the election.
- even restarting server 4 doesn't help (see below logs.)
kill server 4 and restart it again
2024/03/03 14:42:56 replaying WAL of member 4
2024/03/03 14:42:56 loading WAL at term 6 and index 11
raft2024/03/03 14:42:56 INFO: 4 switched to configuration voters=(1 2 3 4)
raft2024/03/03 14:42:56 INFO: 4 became follower at term 103
raft2024/03/03 14:42:56 INFO: newRaft 4 [peers: [1,2,3,4], term: 103, commit: 11, applied: 11, lastindex: 12, lastterm: 6]
2024/03/03 14:42:56 loading snapshot at term 51 and index 15
2024/03/03 14:42:56 ----------------------------------------------------------
2024/03/03 14:42:56 {"ID":4,"term":103,"vote":4,"commit":11,"Lead":0,"RaftState":"StateFollower","Applied":11,"LeadTransferee":0}
2024/03/03 14:42:56 rc.raftStorage.FirstIndex() is 12 the rc.raftStorage.LastIndex() 12
raft2024/03/03 14:42:57 INFO: 4 is starting a new election at term 103
raft2024/03/03 14:42:57 INFO: 4 became candidate at term 104
raft2024/03/03 14:42:57 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 1 at term 104
raft2024/03/03 14:42:57 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 2 at term 104
raft2024/03/03 14:42:57 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 3 at term 104
raft2024/03/03 14:42:58 INFO: 4 received MsgVoteResp from 4 at term 104
raft2024/03/03 14:42:58 INFO: 4 has received 1 MsgVoteResp votes and 0 vote rejections
2024/03/03 14:42:58 ----------------------------------------------------------
2024/03/03 14:42:58 {"ID":4,"term":104,"vote":4,"commit":11,"Lead":0,"RaftState":"StateCandidate","Applied":11,"LeadTransferee":0}
2024/03/03 14:42:58 rc.raftStorage.FirstIndex() is 12 the rc.raftStorage.LastIndex() 12
raft2024/03/03 14:42:58 INFO: 4 is starting a new election at term 104
raft2024/03/03 14:42:58 INFO: 4 became candidate at term 105
raft2024/03/03 14:42:58 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 1 at term 105
raft2024/03/03 14:42:58 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 2 at term 105
raft2024/03/03 14:42:58 INFO: 4 [logterm: 6, index: 12] sent MsgVote request to 3 at term 105
raft2024/03/03 14:42:58 INFO: 4 received MsgVoteResp from 4 at term 105
raft2024/03/03 14:42:58 INFO: 4 has received 1 MsgVoteResp votes and 0 vote rejections
2024/03/03 14:43:00 ----------------------------------------------------------
Observations
- now server 4 cannot join the cluster back due to this.
- removing the server 4 from the cluster and adding back immediately makes the leader to send another snapshot.