minio / mc

Simple | Fast tool to manage MinIO clusters :cloud:


mc admin heal doesn't work

AlexZIX opened this issue

I've replaced one broken disk with a new one and it is filling with data. In previous versions of MinIO I could review the healing progress using mc admin heal, but now it reports that there is no active healing in my cluster:

root@minio-cold-1:~# mc admin heal minio-cold
No active healing is detected for new disks.

But at the same time I can see in my Grafana that healing is in progress:


So is this a bug, or is the new version not supposed to show the healing status in the console?

mc --version

root@minio-cold-1:~# mc --version
mc version RELEASE.2023-01-28T20-29-38Z (commit-id=2e95a70c98fb9c2629cd89817b8759bfa109a4d0)
Runtime: go1.19.4 linux/amd64

System information

Cluster: 4 nodes with 4 disks on each

root@minio-cold-1:~# uname -a
Linux minio-cold-1 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

So is this a bug, or is the new version not supposed to show the healing status in the console?

This is because healing.bin is missing for some reason, causing the healing state to be removed.

// cc @vadmeste this sounds like something we have now seen elsewhere, can you investigate?

@AlexZIX newer versions do not show these stats in Prometheus anymore. Are you sure the disk healing did not finish? Can you check the disk usage (df -h) and compare it with the other disks in the same erasure set?

@vadmeste Healing should still be in progress, because the replaced disk holds only about 10% of the data:


One more question: why is the healing process so slow? I replaced this disk a week ago, but it contains only 10% of the data. If healing continues at the same speed, the total recovery time will be 10 weeks, or more than 2 months. Is that normal?
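The estimate above is a simple linear extrapolation, which can be sketched as follows. This is only an illustration of the arithmetic: the function name is hypothetical, the inputs (10% healed after one week) come from the post, and it assumes the healing rate stays constant, which may not hold in practice.

```python
def estimate_total_weeks(fraction_healed: float, weeks_elapsed: float) -> float:
    """Linear extrapolation: total time = elapsed time / fraction completed.

    Assumes a constant healing rate (an assumption, not a MinIO guarantee).
    """
    if not 0 < fraction_healed <= 1:
        raise ValueError("fraction_healed must be in the range (0, 1]")
    return weeks_elapsed / fraction_healed


# Figures from the post: about 10% of the data restored after 1 week.
total_weeks = estimate_total_weeks(0.10, 1.0)
remaining_weeks = total_weeks - 1.0
print(f"estimated total: {total_weeks:.0f} weeks, remaining: {remaining_weeks:.0f} weeks")
```

At a constant rate this gives 10 weeks in total, of which 9 are still to go.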

Can you share all MinIO logs of node minio-cold-4?

Yes, if you explain where I can find them or how to export them.

@AlexZIX it depends on how you deployed MinIO. The logs are MinIO's standard output. If it is bare-metal, most likely journalctl -u minio will show some logs. By the way, are you using the ILM expiry feature in this cluster?

@vadmeste The output from journalctl is attached.

If ILM means expiration of versioned files that were removed, then my answer is yes: we use buckets with versioning enabled and with expiration settings for removed objects.

This is the df -h output, which may help too:

root@minio-cold-4:~# df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              1.6G  1.8M  1.6G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv   17G  5.6G   11G  35% /
tmpfs                              7.8G     0  7.8G   0% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
/dev/sda2                          1.8G  252M  1.4G  16% /boot
/dev/sda1                          952M  6.1M  946M   1% /boot/efi
hdd-pool-1                         3.2T  157G  3.1T   5% /hdd-pools/hdd-pool-1
hdd-pool-4                         3.2T  1.3T  1.9T  41% /hdd-pools/hdd-pool-4
hdd-pool-3                         3.2T  1.4T  1.9T  42% /hdd-pools/hdd-pool-3
hdd-pool-2                         3.2T  1.4T  1.9T  42% /hdd-pools/hdd-pool-2
tmpfs                              1.6G  4.0K  1.6G   1% /run/user/0
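
To put a number on the comparison, the used space on the replaced disk can be divided by the average used space of the other disks. The sketch below is only a rough check with several assumptions: that hdd-pool-1 is the replaced disk (inferred from its 5% usage, not stated in the thread), that the other three pools on this node are comparable peers (they may not all be in the same erasure set), and that df -h suffixes are powers of 1024. The helper names are hypothetical.

```python
# Multipliers for the size suffixes printed by df -h (assumed powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a df -h size such as '157G' or '1.3T' to bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def fill_ratio(replaced: str, used_by_disk: dict) -> float:
    """Used space on the replaced disk relative to the mean of its peers."""
    peers = [parse_size(v) for k, v in used_by_disk.items() if k != replaced]
    return parse_size(used_by_disk[replaced]) / (sum(peers) / len(peers))


# "Used" column from the df -h output above.
used = {
    "hdd-pool-1": "157G",  # assumed to be the replaced disk
    "hdd-pool-2": "1.4T",
    "hdd-pool-3": "1.4T",
    "hdd-pool-4": "1.3T",
}

ratio = fill_ratio("hdd-pool-1", used)
print(f"replaced disk holds about {ratio:.0%} of what its peers hold")
```

The result is roughly in line with the 10% figure mentioned earlier in the thread.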