ceph / ceph-container

Docker files and images to run Ceph in containers

Bootstrap process hangs up for hours

cactus-ale opened this issue · comments

What happened:

After running the following bootstrap command:
cephadm bootstrap --mon-ip 192.168.1.1 --cluster-network 192.168.1.0/24
And looking at the logs with:
journalctl -f | grep -e ceph -e mon

I get stuck in the bootstrapping process for hours:

Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit systemd-timesyncd.service is enabled and running
Repeating the final host check...
docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: 125533a2-339a-11ee-8709-fbd7ec17aaf0
Verifying IP 192.168.1.1 port 3300 ...
Verifying IP 192.168.1.1 port 6789 ...
Mon IP `192.168.1.1` is in CIDR network `192.168.1.0/24`
Mon IP `192.168.1.1` is in CIDR network `192.168.1.0/24`
Pulling container image quay.io/ceph/ceph:v17...
Ceph version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...

And I get the following in the logs:

Aug 05 14:12:14 cactus-router systemd[1]: /etc/systemd/system/ceph-9ef7732a-3362-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:14 cactus-router systemd[1]: /etc/systemd/system/ceph-4c48ec0e-3393-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:14 cactus-router systemd[1]: /etc/systemd/system/ceph-2480dcfa-3392-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-9ef7732a-3362-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-4c48ec0e-3393-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-2480dcfa-3392-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-9ef7732a-3362-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-4c48ec0e-3393-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-2480dcfa-3392-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: Created slice Slice /system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0.
Aug 05 14:12:15 cactus-router systemd[1]: Started Ceph mon.cactus-router for 125533a2-339a-11ee-8709-fbd7ec17aaf0.
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.919+0000 7f361db0db80  0 set uid:gid to 167:167 (ceph:ceph)
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.919+0000 7f361db0db80  0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process ceph-mon, pid 7
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.923+0000 7f361db0db80  4 rocksdb: SST files in /var/lib/ceph/mon/ceph-cactus-router/store.db dir, Total Num: 0, files:
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.923+0000 7f361db0db80  4 rocksdb: Write Ahead Log file in /var/lib/ceph/mon/ceph-cactus-router/store.db: 000004.log size: 823 ;
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.923+0000 7f361db0db80  4 rocksdb:                                 Options.wal_dir: /var/lib/ceph/mon/ceph-cactus-router/store.db
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.923+0000 7f361db0db80  4 rocksdb: [db/version_set.cc:4724] Recovering from manifest file: /var/lib/ceph/mon/ceph-cactus-router/store.db/MANIFEST-000003
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.931+0000 7f361db0db80  4 rocksdb: [db/version_set.cc:4764] Recovered from manifest file:/var/lib/ceph/mon/ceph-cactus-router/store.db/MANIFEST-000003 succeeded,manifest_file_number is 3, next_file_number is 5, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.935+0000 7f361db0db80  4 rocksdb: [file/delete_scheduler.cc:69] Deleted file /var/lib/ceph/mon/ceph-cactus-router/store.db/000004.log immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.939+0000 7f361db0db80  0 starting mon.cactus-router rank 0 at public addrs [v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0] at bind addrs [v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0] mon_data /var/lib/ceph/mon/ceph-cactus-router fsid 125533a2-339a-11ee-8709-fbd7ec17aaf0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.939+0000 7f361db0db80  1 mon.cactus-router@-1(???) e0 preinit fsid 125533a2-339a-11ee-8709-fbd7ec17aaf0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.939+0000 7f361db0db80  0 mon.cactus-router@-1(probing) e0  my rank is now 0 (was -1)
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.939+0000 7f361db0db80  1 mon.cactus-router@0(probing) e0 win_standalone_election
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  0 log_channel(cluster) log [INF] : mon.cactus-router is new leader, mons cactus-router in quorum (ranks 0)
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 create_pending setting backfillfull_ratio = 0.9
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 create_pending setting full_ratio = 0.95
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 create_pending setting nearfull_ratio = 0.85
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 do_prune osdmap full prune enabled
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 encode_pending skipping prime_pg_temp; mapping job did not start
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader) e0 _apply_compatset_features enabling new quorum features: compat={},rocompat={},incompat={4=support erasure code pools,5=new-style osdmap encoding,6=support isa/lrc erasure code,7=support shec erasure code}
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(leader).paxosservice(auth 0..0) refresh upgraded, format 3 -> 0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(probing) e1 win_standalone_election
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  0 log_channel(cluster) log [INF] : mon.cactus-router is new leader, mons cactus-router in quorum (ranks 0)
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  0 log_channel(cluster) log [DBG] : monmap e1: 1 mons at {cactus-router=[v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0]} removed_ranks: {}
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mgrc update_daemon_metadata mon.cactus-router metadata {addrs=[v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0],arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable),ceph_version_short=17.2.6,compression_algorithms=none, snappy, zlib, zstd, lz4,container_hostname=cactus-router,container_image=quay.io/ceph/ceph:v17,cpu=Intel(R) Core(TM) i5-4430S CPU @ 2.70GHz,device_ids=sda=ATA_SAMSUNG_MZ7TY256_S307NWAH817321,device_paths=sda=/dev/disk/by-path/pci-0000:00:1f.2-ata-1,devices=sda,distro=centos,distro_description=CentOS Stream 8,distro_version=8,hostname=cactus-router,kernel_description=#85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023,kernel_version=5.15.0-78-generic,mem_swap_kb=4194300,mem_total_kb=7942232,os=Linux}
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 create_pending setting backfillfull_ratio = 0.9
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 create_pending setting full_ratio = 0.95
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 create_pending setting nearfull_ratio = 0.85
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 do_prune osdmap full prune enabled
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 encode_pending skipping prime_pg_temp; mapping job did not start
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader) e1 _apply_compatset_features enabling new quorum features: compat={},rocompat={},incompat={8=support monmap features,9=luminous ondisk layout,10=mimic ondisk layout,11=nautilus ondisk layout,12=octopus ondisk layout,13=pacific ondisk layout,14=quincy ondisk layout}
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  0 mon.cactus-router@0(leader).mds e1 new map
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  0 mon.cactus-router@0(leader).mds e1 print_map
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader).paxosservice(auth 0..0) refresh upgraded, format 3 -> 0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 _set_cache_ratios kv ratio 0.25 inc ratio 0.375 full ratio 0.375
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  0 mon.cactus-router@0(leader).osd e1 crush map has features 288514050185494528, adjusting msgr requires
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  0 mon.cactus-router@0(leader).osd e1 crush map has features 288514050185494528, adjusting msgr requires
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.955+0000 7f360e562700  1 mon.cactus-router@0(leader).paxosservice(auth 1..1) refresh upgraded, format 0 -> 3
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.955+0000 7f360e562700  0 log_channel(cluster) log [DBG] : mgrmap e1: no daemons active
Aug 05 14:12:15 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.947488+0000 mon.cactus-router (mon.0) 1 : cluster [INF] mon.cactus-router is new leader, mons cactus-router in quorum (ranks 0)
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.953648+0000 mon.cactus-router (mon.0) 2 : cluster [INF] mon.cactus-router is new leader, mons cactus-router in quorum (ranks 0)
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.953954+0000 mon.cactus-router (mon.0) 3 : cluster [DBG] monmap e1: 1 mons at {cactus-router=[v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0]} removed_ranks: {}
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.957073+0000 mon.cactus-router (mon.0) 4 : cluster [DBG] fsmap
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.960709+0000 mon.cactus-router (mon.0) 5 : cluster [DBG] osdmap e1: 0 total, 0 up, 0 in
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.961244+0000 mon.cactus-router (mon.0) 6 : cluster [DBG] mgrmap e1: no daemons active
Aug 05 14:12:20 cactus-router bash[64494]: debug 2023-08-05T14:12:20.939+0000 7f361356c700  1 mon.cactus-router@0(leader).osd e1 _set_new_cache_sizes cache_size:1019970681 inc_alloc: 348127232 full_alloc: 348127232 kv_alloc: 322961408
Aug 05 14:12:25 cactus-router bash[64494]: debug 2023-08-05T14:12:25.955+0000 7f361356c700  1 mon.cactus-router@0(leader).osd e1 _set_new_cache_sizes cache_size:1020053908 inc_alloc: 348127232 full_alloc: 348127232 kv_alloc: 322961408
Aug 05 14:12:30 cactus-router bash[64494]: debug 2023-08-05T14:12:30.955+0000 7f361356c700  1 mon.cactus-router@0(leader).osd e1 _set_new_cache_sizes cache_size:1020054723 inc_alloc: 348127232 full_alloc: 348127232 kv_alloc: 322961408
Aug 05 14:12:35 cactus-router bash[64494]: debug 2023-08-05T14:12:35.959+0000 7f361356c700  1 mon.cactus-router@0(leader).osd e1 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 348127232 full_alloc: 348127232 kv_alloc: 322961408

The last lines, which all look similar, are then repeated indefinitely.
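
The KillMode warnings in the log name four different cluster fsids (9ef7732a-..., 4c48ec0e-..., 2480dcfa-... and 125533a2-...), while the bootstrap output reports only 125533a2-339a-11ee-8709-fbd7ec17aaf0. The other three may be leftovers from earlier bootstrap attempts, though nothing in this report confirms that. Because `grep -e ceph -e mon` matches all of them, narrowing the journal to the current run could make the output easier to read. The following is a minimal sketch, assuming the usual cephadm unit naming `ceph-<fsid>@<type>.<id>.service`. The helper names `list_ceph_fsids` and `mon_unit_name` are hypothetical and not part of cephadm.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not shipped with cephadm.

# Read journal lines on stdin and print each distinct cluster fsid that
# appears in a "ceph-<fsid>@" systemd unit name, one per line, sorted.
list_ceph_fsids() {
  grep -oE 'ceph-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}@' \
    | sed -E 's/^ceph-//; s/@$//' \
    | sort -u
}

# Build the systemd unit name of a mon daemon from an fsid ($1) and a
# hostname ($2). Assumes the ceph-<fsid>@mon.<host>.service convention.
mon_unit_name() {
  printf 'ceph-%s@mon.%s.service\n' "$1" "$2"
}

# Example usage (not executed here):
#   journalctl -b --no-pager | list_ceph_fsids
#   journalctl -f -u "$(mon_unit_name 125533a2-339a-11ee-8709-fbd7ec17aaf0 cactus-router)"
```

With the values from this report, the second usage line would follow only the mon that the current bootstrap started, without the warnings emitted for the other fsids.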

What you expected to happen:

The bootstrap should complete within a few minutes.

Environment:

  • OS: Ubuntu 22.04.2 LTS
  • Kernel: Linux 5.15.0-78-generic 85-Ubuntu SMP x86_64
  • Docker version: 20.10.21
  • Ceph version: 17.2.6