pkolaczk / fclones

Efficient Duplicate File Finder


Performance really bad on mergerfs with btrfs backends

Motophan opened this issue

I have three RAID5 arrays:

/mnt/btrfs01
/mnt/btrfs02
/mnt/btrfs03

all merged into
/mnt/mergerfs

Each btrfs pool was created with -d raid5 -m raid1c3 (see the sketch below).
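A hypothetical illustration of what that profile corresponds to at filesystem creation time; the device names are made up, and raid1c3 metadata requires at least three devices:

# Data striped as btrfs RAID5, metadata mirrored three ways (raid1c3).
mkfs.btrfs -d raid5 -m raid1c3 /dev/sda /dev/sdb /dev/sdc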
The system is on a SAS 12Gb/s backplane and I see 1.2 GB/s reads during scrubs.
The CPU is a 24-core/48-thread EPYC.
It took 4 days to run through 30 TB of files, all of them larger than 100 MB (MKV video files).

~/.cache/fclones does not exist, so if I ever needed to do this again I would have to run it for another 4 days.

Is there any way to increase performance?

[2023-08-10 06:44:23.498] fclones:  info: Started grouping
[2023-08-10 06:45:57.553] fclones:  info: Scanned 529585 file entries
[2023-08-10 06:45:57.555] fclones:  info: Found 138413 (265.8 TB) files matching selection criteria
[2023-08-10 06:45:57.669] fclones:  info: Found 13785 (19.6 TB) candidates after grouping by size
[2023-08-10 06:45:57.673] fclones:  info: Found 13785 (19.6 TB) candidates after grouping by paths
[2023-08-10 06:45:57.674] fclones: warn: File system unknown on device default doesn't support FIEMAP ioctl API. This is generally harmless, but random access performance might be decreased because fclones can't determine physical on-disk location of file data needed for reading files in the optimal order.
[2023-08-12 07:08:15.359] fclones:  info: Found 6861 (16.7 TB) candidates after grouping by prefix
[2023-08-12 07:18:08.663] fclones:  info: Found 6836 (16.6 TB) candidates after grouping by suffix

You can enable caching, but you have to pass the --cache option explicitly; this feature is turned off by default.
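For example, a minimal sketch using the mount point from the report above (redirecting the report to a file, as in the README's usage examples):

# First run computes hashes and stores them in the cache (~/.cache/fclones on Linux).
fclones group /mnt/mergerfs --cache > dupes.txt

# A repeat run reuses cached hashes for files whose size and modification
# time are unchanged, so their contents don't have to be read again.
fclones group /mnt/mergerfs --cache > dupes.txt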
Another idea is to play with the threading options - maybe the settings automatically selected by fclones are not optimal for your setup. See the Tuning section in README.md.
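A sketch of the threading knob; the value 16 is an illustrative guess, not a recommendation, and fclones group --help documents the full per-device pool syntax:

# Override the automatically selected I/O parallelism. On spinning disks
# behind a FUSE layer like mergerfs, fewer threads can reduce seek thrashing.
fclones group /mnt/mergerfs --cache --threads 16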

I ran this again and performance was fine.
I think this was user error.