Node bootstrap failed with error: "the topology coordinator rejected request to join the cluster: request canceled because some required nodes are dead"
timtimb0t opened this issue
Packages
Scylla version: 2025.1.0~rc2-20250216.6ee17795783f
with build-id 8fc682bcfdf0a8cd9bc106a5ecaa68dce1c63ef6
Kernel Version: 6.8.0-1021-aws
Issue description
During the disrupt_decommission_streaming_err nemesis, the coordinator node was chosen as the target node for decommission. The decommission process started and the coordinator node reported:
2025-02-18T15:53:18.936+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-1 !INFO | scylla[5530]: [shard 0: gms] raft_topology - coordinator is decommissioning and becomes a non-voter; giving up leadership
2025-02-18T15:53:18.936+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-1 !INFO | scylla[5530]: [shard 0: gms] raft_group0 - becoming a non-voter (my id = 04af9f5f-5f97-4eaa-960b-71703ffba331)...
2025-02-18T15:53:18.936+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-1 !INFO | scylla[5530]: [shard 0: gms] raft_group0 - became a non-voter.
2025-02-18T15:53:18.936+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-1 !INFO | scylla[5530]: [shard 0:strm] raft_group0 - losing leadership
At the same time, node2 reported that it gained the leadership:
< t:2025-02-18 15:53:19,368 f:db_log_reader.py l:125 c:sdcm.db_log_reader p:DEBUG > 2025-02-18T15:53:19.318+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0:strm] raft_group0 - gaining leadership
< t:2025-02-18 15:53:19,368 f:db_log_reader.py l:125 c:sdcm.db_log_reader p:DEBUG > 2025-02-18T15:53:19.319+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0:strm] raft_topology - start topology coordinator fiber
< t:2025-02-18 15:53:19,368 f:db_log_reader.py l:125 c:sdcm.db_log_reader p:DEBUG > 2025-02-18T15:53:19.319+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] raft_topology - updating topology state: Starting new topology coordinator a3eaf90e-c343-4846-abb5-6d712aef3519
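The leadership handoff can be traced across the collected node logs with something like this (a sketch; it assumes the db-cluster tarball from the Logs section below has been extracted into ./db-cluster-9603c5ec/):
# Sketch: trace group0 leadership changes and coordinator startup across all nodes.
$ grep -h -E 'raft_group0 - (gaining|losing) leadership|start topology coordinator fiber' \
    db-cluster-9603c5ec/*/messages.log | sort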
The nemesis successfully interrupted the decommission process by rebooting the node, and it returned to the cluster:
2025-02-18T14:13:11.818+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] raft_topology - coordinator accepted request to join, waiting for nodes [04af9f5f-5f97-4eaa-960b-71703ffba331] to be alive before responding and continuing
2025-02-18T14:13:12.428+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0:strm] group0_tombstone_gc_handler - Setting reconcile time to 1739887990 (min id=7c1e8116-ee02-11ef-4691-7c497b721f5a)
2025-02-18T14:13:12.428+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] gossip - InetAddress 04af9f5f-5f97-4eaa-960b-71703ffba331/2a05:d018:12e3:f000:e91:e111:135f:93fd is now UP, status = NORMAL
2025-02-18T14:13:12.428+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] raft_topology - nodes [04af9f5f-5f97-4eaa-960b-71703ffba331] are alive
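For context, the disruption itself boils down to starting a decommission on the target node and rebooting it mid-stream; roughly (a sketch of the idea, not the actual disrupt_decommission_streaming_err code):
# Sketch, not the actual SCT nemesis code: start a decommission and interrupt it.
$ nodetool decommission &    # the node begins streaming its ranges away
$ sleep 60                   # hypothetical delay to let streaming get underway
$ sudo reboot                # interrupt the decommission; the node rejoins afterwards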
But then the coordinator node marked the node that had returned to the cluster as down:
2025-02-18T15:53:22.568+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] gossip - InetAddress 04af9f5f-5f97-4eaa-960b-71703ffba331/2a05:d018:12e3:f000:e91:e111:135f:93fd is now DOWN, status = shutdown
2025-02-18T15:53:22.568+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !ERR | scylla[5542]: [shard 0: gms] raft_topology - send_raft_topology_cmd(stream_ranges) failed with exception (node state is decommissioning): seastar::rpc::closed_error (connection is closed)
2025-02-18T15:53:22.568+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] raft_topology - start rolling back topology change
2025-02-18T15:53:22.568+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] raft_topology - rollback 04af9f5f-5f97-4eaa-960b-71703ffba331 after decommissioning failure, moving transition state to rollback to normal and setting cleanup flag
2025-02-18T15:53:22.569+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] raft_topology - updating topology state: rollback 04af9f5f-5f97-4eaa-960b-71703ffba331 after decommissioning failure, moving transition state to rollback to normal and setting cleanup flag
2025-02-18T15:53:22.569+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] raft_topology - entered `rollback to normal` transition state
2025-02-18T15:53:22.569+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0: gms] raft_topology - executing global topology command barrier_and_drain, excluded nodes: {}
2025-02-18T15:53:26.318+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 !INFO | scylla[5542]: [shard 0:main] raft_group_registry - marking Raft server 04af9f5f-5f97-4eaa-960b-71703ffba331 as dead for raft groups
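Node1's UP/DOWN transitions as seen from node2 can be pulled out of the collected logs with something like this (a sketch; the extracted directory layout is an assumption):
# Sketch: follow node1's gossip state as seen by node2 (host id taken from the logs above).
$ grep -h 'gossip - InetAddress 04af9f5f-5f97-4eaa-960b-71703ffba331' \
    db-cluster-9603c5ec/*node-9603c5ec-2*/messages.log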
At the same time, within this nemesis, a new node was being added, but the bootstrap process failed with the error:
2025-02-18T15:59:10.184+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-15 !ERR | scylla[5579]: [shard 0:main] init - Startup failed: std::runtime_error (the topology coordinator rejected request to join the cluster: request canceled because some required nodes are dead)
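Since the topology coordinator refuses joins while required members are dead (the error above), a hedged pre-check before bootstrapping a new node would be:
# Sketch: confirm the existing members are alive before adding a node.
# Any DN row explains a "required nodes are dead" rejection from the coordinator.
$ nodetool status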
Impact
It seems that the node was lost from the cluster, with no possibility of adding a new one.
How frequently does it reproduce?
Installation details
Cluster size: 12 nodes (i3en.2xlarge)
Scylla Nodes used in this run:
- parallel-topology-schema-changes-mu-db-node-9603c5ec-9 (3.8.90.244 | 2a05:d01c:0964:7d01:a280:6b51:4349:5d63) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-8 (35.176.254.136 | 2a05:d01c:0964:7d00:93a3:14bb:8761:8466) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-7 (18.171.207.44 | 2a05:d01c:0964:7d00:390d:3141:febc:9ca6) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-6 (34.250.253.228 | 2a05:d018:12e3:f002:9b03:a289:9188:121e) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-5 (34.242.117.26 | 2a05:d018:12e3:f002:3d33:8df7:1205:239b) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-4 (34.241.176.122 | 2a05:d018:12e3:f001:2a47:5be5:96b0:e220) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-3 (3.254.146.142 | 2a05:d018:12e3:f001:1083:0c03:af45:c941) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-2 (54.155.86.187 | 2a05:d018:12e3:f000:b330:96eb:6ad3:58f7) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-15 (54.217.10.35 | 2a05:d018:12e3:f000:fdb0:a7a2:746c:ae39) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-14 (34.249.104.146 | 2a05:d018:12e3:f002:8109:2732:9e81:efe4) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-13 (13.40.155.212 | 2a05:d01c:0964:7d02:e4d4:5718:a802:51f7) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-12 (18.169.104.21 | 2a05:d01c:0964:7d02:1581:463d:d5d1:4792) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-11 (18.169.133.149 | 2a05:d01c:0964:7d02:397f:8cbd:55e1:2103) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-10 (18.175.200.62 | 2a05:d01c:0964:7d01:0aef:1c67:a1a7:ea2a) (shards: 7)
- parallel-topology-schema-changes-mu-db-node-9603c5ec-1 (3.249.249.119 | 2a05:d018:12e3:f000:0e91:e111:135f:93fd) (shards: 7)
OS / Image: ami-089e047033a16995a ami-0c34f939e95d0c640
(aws: undefined_region)
Test: longevity-multidc-schema-topology-changes-12h-test
Test id: 9603c5ec-ad38-449a-aa85-b91ff235b5d8
Test name: scylla-2025.1/vnodes/tier1/longevity-multidc-schema-topology-changes-12h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs and commands
- Restore Monitor Stack command:
$ hydra investigate show-monitor 9603c5ec-ad38-449a-aa85-b91ff235b5d8
- Restore monitor on AWS instance using Jenkins job
- Show all stored logs command:
$ hydra investigate show-logs 9603c5ec-ad38-449a-aa85-b91ff235b5d8
Logs:
- parallel-topology-schema-changes-mu-db-node-9603c5ec-12 - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/20250218_140800/parallel-topology-schema-changes-mu-db-node-9603c5ec-12-9603c5ec.tar.zst
- parallel-topology-schema-changes-mu-db-node-9603c5ec-6 - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/20250218_140800/parallel-topology-schema-changes-mu-db-node-9603c5ec-6-9603c5ec.tar.zst
- parallel-topology-schema-changes-mu-db-node-9603c5ec-1 - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/20250218_140800/parallel-topology-schema-changes-mu-db-node-9603c5ec-1-9603c5ec.tar.zst
- db-cluster-9603c5ec.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/20250218_161942/db-cluster-9603c5ec.tar.zst
- sct-runner-events-9603c5ec.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/20250218_161942/sct-runner-events-9603c5ec.tar.zst
- sct-9603c5ec.log.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/20250218_161942/sct-9603c5ec.log.tar.zst
- loader-set-9603c5ec.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/20250218_161942/loader-set-9603c5ec.tar.zst
- monitor-set-9603c5ec.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/20250218_161942/monitor-set-9603c5ec.tar.zst
- ssl-conf-9603c5ec.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/20250218_161942/ssl-conf-9603c5ec.tar.zst
- builder-9603c5ec.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/9603c5ec-ad38-449a-aa85-b91ff235b5d8/upload_20250218_162133/builder-9603c5ec.log.tar.gz
some required nodes - were they?
@timtimb0t - what was the status of all other nodes at the same time?
@gleb-cloudius - can you please take a look?
node1 seems to be down. First of all, there is no system.log in the parallel-topology-schema-changes-mu-db-node-9603c5ec-1/ directory,
which AFAIK indicates that the node was down when the logs were collected (and we need it since it has more info than messages.log). Second, there are these messages in the node2 log:
Feb 18 15:56:51.784302 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 scylla[5542]: [shard 0: gms] gossip - Got shutdown message from 2a05:d018:12e3:f000:e91:e111:135f:93fd, received_generation=1739894152, local_generation=1739894152
Feb 18 15:56:51.784726 parallel-topology-schema-changes-mu-db-node-9603c5ec-2 scylla[5542]: [shard 0: gms] gossip - InetAddress 04af9f5f-5f97-4eaa-960b-71703ffba331/2a05:d018:12e3:f000:e91:e111:135f:93fd is now DOWN, status = shutdown
Cancellation happened at 15:59:08.207, so after that.
And third, the last line of the log on node1 is:
2025-02-18T15:56:51.670+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-1 !NOTICE | syslog-ng[889]: syslog-ng shutting down; version='4.3.1'
That is exactly the same time node2 got the shutdown message from it.
Some probably unrelated, but still notable, issues that I saw: after the reboot, node1 did not manage to start Scylla right away. The first attempt failed with:
2025-02-18T15:53:48.382+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-1 !ERR | scylla[836]: [shard 0:main] init - Startup failed: std::system_error (error system:99, posix_listen failed for address [2a05:d018:12e3:f000:e91:e111:135f:93fd]:9180: Cannot assign requested address)
The second is messages like:
2025-02-18T15:53:21.419+00:00 parallel-topology-schema-changes-mu-db-node-9603c5ec-1 !WARNING | scylla[5530]: [shard 0:strm] seastar - Too long queue accumulated for streaming (3072 tasks)
while streaming.
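The checks above can be repeated against the extracted log tarballs, roughly like this (a sketch; exact paths depend on how the archives listed under Logs are unpacked):
# Sketch: repeat the triage on the extracted node logs.
$ ls db-cluster-9603c5ec/*node-9603c5ec-1*/                       # no system.log => node down at collection time
$ tail -n 3 db-cluster-9603c5ec/*node-9603c5ec-1*/messages.log    # ends with the syslog-ng shutdown line
$ grep -h 'gossip - Got shutdown message' \
    db-cluster-9603c5ec/*node-9603c5ec-2*/messages.log            # node2 receiving node1's shutdown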
some required nodes - were they? @timtimb0t - what was the status of all other nodes at the same time?
All other nodes were UN
Except for 04af9f5f-5f97-4eaa-960b-71703ffba331. This one was pining for the fjords. From the sct-9603c5ec.log:
< t:2025-02-18 15:56:51,438 f:cluster.py l:1254 c:sdcm.cluster p:INFO > Node parallel-topology-schema-changes-mu-db-node-9603c5ec-1 [3.249.249.119 | 10.4.0.168 | 2a05:d018:12e3:f000:0e91:e111:135f:93fd] (dc name: eu-westscylla_node_west, rack: 1a) destroyed
Yes, the first node is the problematic default coordinator node that was banned by the new coordinator (node2).
I do not understand what you mean here. You have 14 nodes: 13 up, 1 down. You want to bootstrap node 15, which fails because all nodes should be up. There is no bug here. This is expected behaviour.
@gleb-cloudius , the sequence was as follows:
1. The coordinator node (node1) was chosen for decommission and lost leadership
2. Node2 gained the leadership
3. The decommission process on node1 was interrupted and it returned to the cluster
4. The coordinator node (node2) marked node1 as down despite it having returned to the cluster
5. Adding the new node failed because node1 was marked as down
As a result, the cluster lost one node and never added a new one.
According to all the evidence here #22983 (comment), this is not what happened. At step 4, node1 is dead. In fact, it sent a shutdown message to node2. The sct log shows it as destroyed, as can be seen here #22983 (comment).
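For completeness: if node1 is indeed permanently gone, the usual way to unblock adding new nodes is to remove the dead member from the cluster first (a hedged general sketch, not something the test did):
# Sketch: remove the dead member (host id from the logs above) from any live node,
# then retry bootstrapping the new node. Only do this if node1 will not come back.
$ nodetool removenode 04af9f5f-5f97-4eaa-960b-71703ffba331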