Panic in `become_leader`
agourlay opened this issue
raft-rs-version: 52d84aa
I am opening this as a question because it is not easy to provide a reproducer for now.
Using a cluster with 3 peers, the following panic occurs under heavy load.
thread 'consensus' panicked at 'assertion failed: `(left == right)`
left: `155`,
right: `154`', /home/agourlay/.cargo/git/checkouts/raft-rs-42b8049ef2e3af07/52d84aa/src/raft.rs:1202:9
stack backtrace:
0: 0x56087e587d7d - std::backtrace_rs::backtrace::libunwind::trace::h9135f25bc195152c
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x56087e587d7d - std::backtrace_rs::backtrace::trace_unsynchronized::h015ee85be510df51
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x56087e587d7d - std::sys_common::backtrace::_print_fmt::h5fad03caa9652a2c
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:66:5
3: 0x56087e587d7d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h2b42ca28d244e5c7
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:45:22
4: 0x56087e5acf1c - core::fmt::write::h401e827d053130ed
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/fmt/mod.rs:1198:17
5: 0x56087e581681 - std::io::Write::write_fmt::hffec93268f5cde32
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/io/mod.rs:1672:15
6: 0x56087e5893f5 - std::sys_common::backtrace::_print::h180c4c706ee1d3fb
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:48:5
7: 0x56087e5893f5 - std::sys_common::backtrace::print::hd0c35d18765761c9
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:35:9
8: 0x56087e5893f5 - std::panicking::default_hook::{{closure}}::h1f023310983bc730
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:295:22
9: 0x56087e589111 - std::panicking::default_hook::h188fec3334afd5be
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:314:9
10: 0x56087e589986 - std::panicking::rust_panic_with_hook::hf26e9d4f97b40096
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:698:17
11: 0x56087e589877 - std::panicking::begin_panic_handler::{{closure}}::hfab912107608087a
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:588:13
12: 0x56087e588274 - std::sys_common::backtrace::__rust_end_short_backtrace::h434b685ce8d9965b
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/backtrace.rs:138:18
13: 0x56087e5895a9 - rust_begin_unwind
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:584:5
14: 0x56087d0f2c43 - core::panicking::panic_fmt::ha6dc7f2ab2479463
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:142:14
15: 0x56087e5aa4b8 - core::panicking::assert_failed_inner::h433285798fdd5aeb
16: 0x56087d0680eb - core::panicking::assert_failed::hc830e320c964b264
17: 0x56087d379869 - raft::raft::Raft<T>::poll::h00c0abb9f815bf43
18: 0x56087d37c493 - raft::raft::Raft<T>::step::h86d99c406bd8aaab
19: 0x56087d2a73bf - raft::raw_node::RawNode<T>::step::h300cec76fa4ab1ae
20: 0x56087d44527f - qdrant::consensus::Consensus::start::h0f712d9f3cb035d1
21: 0x56087d4c3ef6 - std::sys_common::backtrace::__rust_begin_short_backtrace::h4043a30eb25da8e0
22: 0x56087d32d127 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hdeb327707abece7c
23: 0x56087e58e243 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h56d5fc072706762b
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/boxed.rs:1935:9
24: 0x56087e58e243 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h41deef8e33b824bb
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/boxed.rs:1935:9
25: 0x56087e58e243 - std::sys::unix::thread::Thread::new::thread_start::ha6436304a1170bba
at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys/unix/thread.rs:108:17
26: 0x7f61ce669b43 - start_thread
at ./nptl/./nptl/pthread_create.c:442:8
27: 0x7f61ce6fba00 - clone3
at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
28: 0x0 - <unknown>
I can see that several elections took place during that specific run.
By the end of it, the cluster is in an inconsistent state.
There is one leader, one follower, and one candidate that is lagging behind (in both term and commit index) without making progress.
Do you have any insight into how to avoid this situation?
Thank you!
Can you provide some logs from that node? Are all updates persisted before the persisted messages are sent out?
Thank you for the quick response!
Are all updates persisted before the persisted messages are sent out?
We have followed the example and documentation, so we are performing the actions in the same order.
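To make the ordering in question concrete, here is a minimal sketch of one pass of a ready loop. It assumes a simplified model in which each batch carries entries to persist, messages that may be sent right away, and "persisted messages" that may only go out once those entries are durable. `ToyReady`, `ToyStorage`, and `handle_ready` are hypothetical names, not the raft-rs API.

```rust
/// Hypothetical log entry; not raft-rs's `Entry`.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    index: u64,
    term: u64,
}

/// Hypothetical, simplified batch of work produced by a Raft node.
struct ToyReady {
    /// Entries that must be written to stable storage.
    entries: Vec<Entry>,
    /// Messages that do not depend on local persistence.
    messages: Vec<&'static str>,
    /// Messages that must only be sent after `entries` are durable.
    persisted_messages: Vec<&'static str>,
}

/// Hypothetical stable storage that just keeps entries in memory.
#[derive(Default)]
struct ToyStorage {
    entries: Vec<Entry>,
}

impl ToyStorage {
    fn append(&mut self, entries: &[Entry]) {
        self.entries.extend_from_slice(entries);
    }

    fn last_index(&self) -> u64 {
        self.entries.last().map_or(0, |e| e.index)
    }
}

/// Handles one batch and returns each sent message together with the
/// durable last index at the moment it was sent.
fn handle_ready(storage: &mut ToyStorage, ready: ToyReady) -> Vec<(&'static str, u64)> {
    let mut sent = Vec::new();
    // 1. Messages that do not promise durability can go out immediately.
    for m in ready.messages {
        sent.push((m, storage.last_index()));
    }
    // 2. Persist the new entries.
    storage.append(&ready.entries);
    // 3. Only now send messages that rely on those entries being durable.
    for m in ready.persisted_messages {
        sent.push((m, storage.last_index()));
    }
    sent
}
```

With a batch of two entries, the immediate message is sent while the durable last index is still 0, and the persisted message only after it has reached 2.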
Can you provide some logs from that node?
Here are the full `raft::raft` logs for that node up to the panic.
[2022-10-06T13:57:52.403Z INFO raft::raft] switched to configuration, raft_id: 7781260778219341078, config: Configuration { voters: Configuration { incoming: Configuration { voters: {7781260778219341078} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }
[2022-10-06T13:57:52.403Z INFO raft::raft] became follower at term 0, raft_id: 7781260778219341078, term: 0
[2022-10-06T13:57:52.403Z INFO raft::raft] newRaft, raft_id: 7781260778219341078, peers: Configuration { incoming: Configuration { voters: {7781260778219341078} }, outgoing: Configuration { voters: {} } }, last term: 0, last index: 0, applied: 0, commit: 0, term: 0
[2022-10-06T13:57:54.907Z INFO raft::raft] starting a new election, raft_id: 7781260778219341078, term: 0
[2022-10-06T13:57:54.907Z INFO raft::raft] became candidate at term 1, raft_id: 7781260778219341078, term: 1
[2022-10-06T13:57:54.907Z INFO raft::raft] became leader at term 1, raft_id: 7781260778219341078, term: 1
[2022-10-06T13:57:55.444Z INFO raft::raft] switched to configuration, raft_id: 7781260778219341078, config: Configuration { voters: Configuration { incoming: Configuration { voters: {7781260778219341078} }, outgoing: Configuration { voters: {} } }, learners: {2188602116241467853}, learners_next: {}, auto_leave: false }
[2022-10-06T13:57:55.483Z INFO raft::raft] switched to configuration, raft_id: 7781260778219341078, config: Configuration { voters: Configuration { incoming: Configuration { voters: {7781260778219341078} }, outgoing: Configuration { voters: {} } }, learners: {9653204377197442480, 2188602116241467853}, learners_next: {}, auto_leave: false }
[2022-10-06T13:57:55.757Z INFO raft::raft] switched to configuration, raft_id: 7781260778219341078, config: Configuration { voters: Configuration { incoming: Configuration { voters: {9653204377197442480, 7781260778219341078} }, outgoing: Configuration { voters: {} } }, learners: {2188602116241467853}, learners_next: {}, auto_leave: false }
[2022-10-06T13:57:55.865Z INFO raft::raft] switched to configuration, raft_id: 7781260778219341078, config: Configuration { voters: Configuration { incoming: Configuration { voters: {9653204377197442480, 2188602116241467853, 7781260778219341078} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }
[2022-10-06T14:01:09.573Z INFO raft::raft] received a message with higher term from 2188602116241467853, raft_id: 7781260778219341078, msg type: MsgRequestVote, message_term: 2, term: 1, from: 2188602116241467853
[2022-10-06T14:01:09.573Z INFO raft::raft] became follower at term 2, raft_id: 7781260778219341078, term: 2
[2022-10-06T14:01:09.573Z INFO raft::raft] [logterm: 1, index: 155, vote: 0] rejected vote from 2188602116241467853 [logterm: 1, index: 97] at term 2, raft_id: 7781260778219341078, msg type: MsgRequestVote, term: 2, msg_index: 97, msg_term: 1, from: 2188602116241467853, vote: 0, log_index: 155, log_term: 1
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] ignored a message with lower term from 9653204377197442480, raft_id: 7781260778219341078, msg term: 1, msg type: MsgAppendResponse, term: 2, from: 9653204377197442480
[2022-10-06T14:01:09.600Z INFO raft::raft] received a message with higher term from 2188602116241467853, raft_id: 7781260778219341078, msg type: MsgRequestVote, message_term: 3, term: 2, from: 2188602116241467853
[2022-10-06T14:01:09.600Z INFO raft::raft] became follower at term 3, raft_id: 7781260778219341078, term: 3
[2022-10-06T14:01:09.600Z INFO raft::raft] [logterm: 1, index: 155, vote: 0] rejected vote from 2188602116241467853 [logterm: 1, index: 97] at term 3, raft_id: 7781260778219341078, msg type: MsgRequestVote, term: 3, msg_index: 97, msg_term: 1, from: 2188602116241467853, vote: 0, log_index: 155, log_term: 1
[2022-10-06T14:01:09.626Z INFO raft::raft] received a message with higher term from 9653204377197442480, raft_id: 7781260778219341078, msg type: MsgRequestVote, message_term: 4, term: 3, from: 9653204377197442480
[2022-10-06T14:01:09.626Z INFO raft::raft] became follower at term 4, raft_id: 7781260778219341078, term: 4
[2022-10-06T14:01:09.626Z INFO raft::raft] [logterm: 1, index: 155, vote: 0] rejected vote from 9653204377197442480 [logterm: 1, index: 154] at term 4, raft_id: 7781260778219341078, msg type: MsgRequestVote, term: 4, msg_index: 154, msg_term: 1, from: 9653204377197442480, vote: 0, log_index: 155, log_term: 1
[2022-10-06T14:01:09.651Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.652Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.652Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.652Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.652Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.653Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.653Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.653Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.653Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.653Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:09.653Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:11.975Z INFO raft::raft] starting a new election, raft_id: 7781260778219341078, term: 4
[2022-10-06T14:01:11.975Z INFO raft::raft] became candidate at term 5, raft_id: 7781260778219341078, term: 5
[2022-10-06T14:01:11.975Z INFO raft::raft] broadcasting vote request, raft_id: 7781260778219341078, to: [9653204377197442480, 2188602116241467853], log_index: 155, log_term: 1, term: 5, type: MsgRequestVote
[2022-10-06T14:01:14.399Z INFO raft::raft] starting a new election, raft_id: 7781260778219341078, term: 5
[2022-10-06T14:01:14.399Z INFO raft::raft] became candidate at term 6, raft_id: 7781260778219341078, term: 6
[2022-10-06T14:01:14.399Z INFO raft::raft] broadcasting vote request, raft_id: 7781260778219341078, to: [9653204377197442480, 2188602116241467853], log_index: 155, log_term: 1, term: 6, type: MsgRequestVote
[2022-10-06T14:01:14.721Z INFO raft::raft] ignored a message with lower term from 2188602116241467853, raft_id: 7781260778219341078, msg term: 5, msg type: MsgRequestVoteResponse, term: 6, from: 2188602116241467853
[2022-10-06T14:01:14.743Z INFO raft::raft] received votes response, raft_id: 7781260778219341078, term: 6, type: MsgRequestVoteResponse, approvals: 2, rejections: 0, from: 2188602116241467853, vote: true
thread 'consensus' panicked at 'assertion failed: `(left == right)`
left: `155`,
right: `154`', /home/agourlay/.cargo/git/checkouts/raft-rs-42b8049ef2e3af07/52d84aa/src/raft.rs:1202:9
It seems to be a bug. If the follower finds a conflict, it resets `persisted`. But if it starts a campaign before accepting the new leader's logs, `persisted` can be less than `last_index`. I think the assert can be removed, or `persisted` can simply be reset to `last_index`.
/cc @gengliqi
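The scenario above can be sketched with a toy log, assuming that a conflict lowers `persisted` to just below the conflicting index and that a flush raises it back to the last index. `ToyLog` and its methods are hypothetical and far simpler than the raft-rs log; the sketch only shows how a log of 155 term-1 entries that accepts a term-4 entry at index 155 ends up with a last index of 155 but a `persisted` of 154, and why persisting before campaigning keeps a `become_leader`-style check satisfied.

```rust
/// Hypothetical toy log illustrating `persisted` bookkeeping.
struct ToyLog {
    /// `terms[i]` is the term of the entry at index `i + 1`.
    terms: Vec<u64>,
    /// Highest index known to be on stable storage.
    persisted: u64,
}

impl ToyLog {
    fn last_index(&self) -> u64 {
        self.terms.len() as u64
    }

    /// Follower-side append of `new_terms` right after `prev_index`.
    /// A term mismatch truncates the existing suffix and lowers `persisted`.
    fn append_after(&mut self, prev_index: u64, new_terms: &[u64]) {
        assert!(prev_index <= self.last_index(), "toy model does not handle gaps");
        for (offset, &term) in new_terms.iter().enumerate() {
            // Zero-based slot of the entry at index `prev_index + 1 + offset`.
            let pos = prev_index as usize + offset;
            if pos < self.terms.len() {
                if self.terms[pos] == term {
                    continue; // identical entry already present
                }
                // Conflict: drop everything from this index onward.
                self.terms.truncate(pos);
                // Entries at index `pos + 1` and later are no longer durable.
                self.persisted = self.persisted.min(pos as u64);
            }
            self.terms.push(term);
        }
    }

    /// Simulates flushing every entry to stable storage.
    fn persist(&mut self) {
        self.persisted = self.last_index();
    }

    /// The kind of invariant check that fired in this issue.
    fn become_leader(&self) {
        assert_eq!(self.last_index(), self.persisted, "log must be persisted before leading");
    }
}
```

Calling `become_leader` right after the conflicting append panics with the same pair of numbers as the report (155 and 154); calling `persist` first makes the check pass.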
As described in the code comments (lines 1196 to 1202 in 36d3293):
If a follower starts a campaign and becomes a candidate, it must persist the raft log before sending the vote request. So after that, `persisted` should be equal to the last index.
I suspect that the reason `persisted` does not equal 155 is that something is wrong with the `term` function of the `Storage` trait.
For more details, see lines 512 to 538 in 36d3293.
Could you please add some logging to the `term` function in your implementation of the `Storage` trait to help us figure out this issue? @agourlay
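One way to add such logging without touching the storage itself is a wrapper that forwards each call and records its argument and result. This is a sketch under assumptions: `TermLookup`, `VecStorage`, and `LoggingTerm` are hypothetical, `term(idx) -> Result<u64, String>` stands in for the real trait method, and `eprintln!` stands in for whatever logger the application uses.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the `term` part of a storage trait.
trait TermLookup {
    fn term(&self, idx: u64) -> Result<u64, String>;
}

/// Toy storage: the index of the first stored entry plus the stored terms.
struct VecStorage {
    first_index: u64,
    terms: Vec<u64>,
}

impl TermLookup for VecStorage {
    fn term(&self, idx: u64) -> Result<u64, String> {
        if idx < self.first_index {
            return Err(format!("index {idx} is compacted"));
        }
        self.terms
            .get((idx - self.first_index) as usize)
            .copied()
            .ok_or_else(|| format!("index {idx} is unavailable"))
    }
}

/// Wraps any `TermLookup`, logging and recording every call and its result.
struct LoggingTerm<S> {
    inner: S,
    calls: RefCell<Vec<(u64, Result<u64, String>)>>,
}

impl<S: TermLookup> LoggingTerm<S> {
    fn new(inner: S) -> Self {
        Self { inner, calls: RefCell::new(Vec::new()) }
    }
}

impl<S: TermLookup> TermLookup for LoggingTerm<S> {
    fn term(&self, idx: u64) -> Result<u64, String> {
        let result = self.inner.term(idx);
        eprintln!("term({idx}) -> {result:?}");
        self.calls.borrow_mut().push((idx, result.clone()));
        result
    }
}
```

Because the wrapper only forwards calls, it can be inserted and removed without changing the behavior of the storage being debugged.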
I think it's due to the conflict index because the logs show that when the node starts a campaign, its last index and term are the same as before the conflict:
[2022-10-06T14:01:09.653Z INFO raft::raft_log] found conflict at index 155, raft_id: 7781260778219341078, conflicting term: 4, existing term: 1, index: 155
[2022-10-06T14:01:11.975Z INFO raft::raft] broadcasting vote request, raft_id: 7781260778219341078, to: [9653204377197442480, 2188602116241467853], log_index: 155, log_term: 1, term: 5, type: MsgRequestVote
But you are correct that, since the logs are overwritten immediately once conflicts are found, `persisted == last_index` should always hold when the node becomes leader, unless the `Storage` trait is not implemented correctly. Besides the `term` function you mentioned, `entries` is also highly suspect.
There are many "found conflict" lines in the logs. Assuming raft-rs is correct, this can happen when the message is malformed. The message is expected to satisfy the constraint `message.entries[0].index == message.index + 1`. If `message.entries[0].index <= message.index`, then even though the conflict is found, the logs may not be rewritten, hence the repeated conflict reports. So either the message is modified, or the `entries` API of the `Storage` trait is not implemented correctly: for example, a query for entries starting from 10 returns entries starting from 9.
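Both constraints described here can be written as small checks. The sketch below assumes a minimal `Entry` with just `index` and `term`, and models the storage query as the half-open range `[low, high)`; the function names are hypothetical. A storage that answers a query starting at 10 with entries starting at 9 fails `check_entries_result`.

```rust
/// Minimal stand-in for a log entry (hypothetical; not raft-rs's `Entry`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry {
    index: u64,
    term: u64,
}

/// Checks that `entries` are numbered `start, start + 1, ...` with no gaps.
fn check_contiguous_from(start: u64, entries: &[Entry]) -> Result<(), String> {
    for (offset, e) in entries.iter().enumerate() {
        let expected = start + offset as u64;
        if e.index != expected {
            return Err(format!("expected index {expected}, found {}", e.index));
        }
    }
    Ok(())
}

/// An append message carrying `entries` after `msg_index` must satisfy
/// `entries[0].index == msg_index + 1` (and stay contiguous).
fn check_append_message(msg_index: u64, entries: &[Entry]) -> Result<(), String> {
    check_contiguous_from(msg_index + 1, entries)
}

/// A storage query for `[low, high)` must start exactly at `low`
/// and must not return anything at or beyond `high`.
fn check_entries_result(low: u64, high: u64, entries: &[Entry]) -> Result<(), String> {
    check_contiguous_from(low, entries)?;
    match entries.last() {
        Some(last) if last.index >= high => {
            Err(format!("entry {} is outside [{low}, {high})", last.index))
        }
        _ => Ok(()),
    }
}
```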
@gengliqi @BusyJay Thank you very much for your guidance; we did find a subtle bug in our storage implementation!
After fixing our bug, the panic no longer occurs, so there is no reason to remove this assertion, which appears to enforce a valid invariant.
I am closing this issue, thank you again for your support 🙏
Maybe raft-rs should provide some conditional asserts to help verify `Storage` implementations. 🤔
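As a sketch of what such conditional asserts could look like, the wrapper below validates every result from an inner storage only when a flag is set. `EntrySource`, `Checked`, and `OffByOne` are hypothetical, and the storage is reduced to the entry indices it returns; in a real project the flag might come from `cfg!(debug_assertions)` or a cargo feature rather than a struct field.

```rust
/// Hypothetical read-only view of a storage, reduced to entry indices.
trait EntrySource {
    /// Returns the indices of the entries in `[low, high)`.
    fn entry_indices(&self, low: u64, high: u64) -> Vec<u64>;
}

/// Wrapper that validates every result when `checks_enabled` is true.
struct Checked<S> {
    inner: S,
    checks_enabled: bool,
}

impl<S: EntrySource> EntrySource for Checked<S> {
    fn entry_indices(&self, low: u64, high: u64) -> Vec<u64> {
        let out = self.inner.entry_indices(low, high);
        if self.checks_enabled {
            // Each returned index must line up with the requested range.
            for (offset, &idx) in out.iter().enumerate() {
                assert_eq!(idx, low + offset as u64, "misaligned entry for [{low}, {high})");
            }
            assert!(
                out.last().map_or(true, |&i| i < high),
                "storage returned an entry at or beyond {high}"
            );
        }
        out
    }
}

/// A deliberately buggy storage whose results start one index too early.
struct OffByOne;

impl EntrySource for OffByOne {
    fn entry_indices(&self, low: u64, high: u64) -> Vec<u64> {
        (low - 1..high - 1).collect()
    }
}
```

Keeping the check behind a flag lets production builds skip the extra pass over every batch, while tests and debug builds catch an off-by-one storage on its first read.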