pkolaczk / fclones

Efficient Duplicate File Finder

[dedup] Use APFS clone (CoW) on macOS

boozook opened this issue

Bug, kind: critical, dedup-cmd broken on macOS.
Docs: man clonefile
Min macOS: 10.12 (Sierra)
FS required: APFS

TL;DR: Use cp -c instead of cp --reflink on macOS.

Explanation:

cp -c uses the clonefile syscall: the -c flag overrides the default copy behaviour and clones files via clonefile() instead (see cp(1)). The behaviour is equivalent to the Linux cp flag --reflink, but macOS cp has no --reflink option, so passing --reflink fails with an error like cp: illegal option....
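To illustrate the TL;DR, here is a minimal sketch of a hypothetical helper (clone_cmd is not part of fclones) that prints the clone command appropriate for each platform. It assumes `uname -s` reports Darwin on macOS and Linux on Linux, and, like the --dry-run log, it only prints the command without quoting paths that contain spaces.

```shell
#!/bin/sh
# Hypothetical helper, not part of fclones: print the copy-on-write clone
# command for SOURCE -> TARGET. The OS name defaults to `uname -s`.
clone_cmd() {
  os=${3:-$(uname -s)}
  case "$os" in
    # macOS: cp -c clones via clonefile(2); needs APFS (macOS 10.12+)
    Darwin) printf 'cp -c %s %s\n' "$1" "$2" ;;
    # GNU cp on Linux: fail rather than fall back to a full copy
    Linux) printf 'cp --reflink=always %s %s\n' "$1" "$2" ;;
    *) echo "clone_cmd: no known clone command for $os" >&2; return 1 ;;
  esac
}

# Example: the clone step from a dry-run log, rendered for macOS
clone_cmd /path/C /path/A Darwin
```

On macOS this prints cp -c /path/C /path/A, the form a macOS dry-run log would need in order to be copy-pasteable.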


Update

It seems the problem is not with dedupe itself, but with the help documentation and the CLI output.

  • fclones dedupe -h says:

--dry-run Don't perform any changes on the file-system, but writes a log of file operations to the standard output.

  • fclones dedupe --dry-run <some.txt prints commands like this:
mv /path/A /path/B
cp --reflink=always /path/C /path/A
rm /path/B

So I assumed that fclones without --dry-run would execute those commands.
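For reference, here is a minimal sketch of what actually executing that mv / cp / rm sequence could look like. It is not fclones' implementation: replace_with_clone and the .fclones-bak backup suffix are hypothetical, and on Linux the clone step only succeeds on a reflink-capable file system such as Btrfs.

```shell
#!/bin/sh
# Hypothetical sketch, not fclones' code: replace TARGET with a CoW clone of
# SOURCE using the mv / clone / rm sequence printed by --dry-run.
# If the clone step fails, the original TARGET is moved back.
replace_with_clone() {
  src=$1
  target=$2
  backup="$target.fclones-bak"          # assumed temporary name
  case "$(uname -s)" in
    Darwin) set -- cp -c ;;               # macOS / APFS: clonefile(2)
    *)      set -- cp --reflink=always ;; # GNU cp: reflink or fail
  esac
  mv "$target" "$backup" || return 1    # 1. move the duplicate aside
  if "$@" "$src" "$target"; then        # 2. clone SOURCE into its place
    rm -f "$backup"                     # 3. remove the old copy
  else
    mv -f "$backup" "$target"           # roll back: restore the original
    return 1
  fi
}
```

Moving the duplicate aside first means its data stays on disk until the clone has succeeded, so a failed clone leaves the file as it was.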

Wow, indeed dedupe on macOS does nothing... but, interestingly, it does not throw any error either.

It should be a very easy PR to fix.

"indeed dedupe on macOS does nothing..."

I ran dedupe with --dry-run and inspected the output.

But... here are 4 x 10G identical files,

and I ran it for real and inspected the results:

$ ls -lih *
272167869 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:21 10GB.bin
272168957 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:37 10GB.bin.1
272168961 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:37 10GB.bin.2
272168969 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:38 10GB.bin.3

$ df -h /System/Volumes/Data
Filesystem     Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1s1  932Gi  541Gi  332Gi    62% 2404443 3484848560    0%   /System/Volumes/Data

$ fclones group . | fclones dedupe
...
[2023-08-07 22:40:01.028] fclones:  info: Processed 3 files and reclaimed up to 32.2 GB space

$ ls -lih *
272167869 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:21 10GB.bin
272169042 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:37 10GB.bin.1
272169043 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:37 10GB.bin.2
272169044 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:38 10GB.bin.3

$ df -h /System/Volumes/Data
Filesystem     Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1s1  932Gi  511Gi  362Gi    59% 2404444 3799378320    0%   /System/Volumes/Data
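The numbers in this transcript agree with each other, assuming each file is exactly 10 GiB (ls -h rounds) and that fclones reports decimal gigabytes: three processed files free three files' worth of data, and df shows the same 30 GiB leaving Used and arriving in Avail.

```math
\begin{aligned}
3 \times 10\,\mathrm{GiB} &= 3 \times 10 \times 2^{30}\,\mathrm{B} = 32\,212\,254\,720\,\mathrm{B} \approx 32.2\,\mathrm{GB} \\
\Delta\mathrm{Used} &= 541\,\mathrm{GiB} - 511\,\mathrm{GiB} = 30\,\mathrm{GiB} \\
\Delta\mathrm{Avail} &= 362\,\mathrm{GiB} - 332\,\mathrm{GiB} = 30\,\mathrm{GiB}
\end{aligned}
```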

There is no problem with CoW dedupe on APFS.

The problem is with what --dry-run shows.

"Bug, kind: critical, dedup-cmd broken on macOS."

This can be downgraded from critical to cosmetic.

So how exactly does dedupe work on macOS? Hard links? Just for my understanding.
If so, it would be better anyway to use clones (copy-on-write) as described above.

You can see that the files do not share the same inode, so they are not hard links (this is why I added the -i option to ls).

They are proper CoW clones.
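The ls -i check can be scripted with a small hypothetical helper (same_inode is not an fclones feature). It assumes both paths are on the same volume, since inode numbers are only unique within one file system. Distinct inodes rule out hard links but do not by themselves prove that the files share blocks; the drop in df's Used column is what shows the clones.

```shell
#!/bin/sh
# Hypothetical helper: succeed if both paths have the same inode number,
# i.e. they are hard links to one file. Assumes both are on the same volume.
# A CoW clone (cp -c / cp --reflink) or a plain copy gets a new inode.
inode_of() {
  ls -id "$1" | awk '{ print $1 }'   # first column of `ls -i` is the inode
}

same_inode() {
  [ "$(inode_of "$1")" = "$(inode_of "$2")" ]
}

# Usage: same_inode 10GB.bin 10GB.bin.1 && echo "hard links" || echo "separate files"
```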

Great!
Well, so this can be closed, or left open for docs/interface improvements.
Thank you so much!

Leave it open :) Something is not 100% right, at least with --dry-run.

I think I have identified the problem. PR already posted.