Why does running a group after a dedupe hash everything again?
patrickwolf opened this issue
Running `group` on 130 TB takes ~2 days:

```
fclones group /ex2/ --cache -s 1M -o /ex2/_Data/fclones.json -f json --exclude '/ex2/#snapshot/**' --exclude '/ex2/#recycle/**'
```

Running it again takes 15 minutes:

```
fclones group /ex2/ --cache -s 1M -o /ex2/_Data/fclones.json -f json --exclude '/ex2/#snapshot/**' --exclude '/ex2/#recycle/**'
```

Running deduplication takes 10 minutes:

```
fclones dedupe --path '/ex2/Reviews/**' -o /ex2/_Data/fclones_dd.txt --priority least-recently-modified < /ex2/_Data/fclones.json
```

**Running `group` again takes 1+ day:**

```
fclones group /ex2/ --cache -s 1M -o /ex2/_Data/fclones.json -f json --exclude '/ex2/#snapshot/**' --exclude '/ex2/#recycle/**'
```
Why do files need to be re-hashed after a dedupe? Running dedupe should have reduced the number of files that differ, not increased it, right?
Environment is Synology, BTRFS, 130TB RAID 5
On Linux, fclones did not restore the timestamps of deduped files. This means the cache, which among other information looks at the mtime, was invalidated for these entries. This should be fixed with #194.
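The invalidation can be illustrated with a minimal shell sketch (the temp files are hypothetical, and this assumes only that the cache compares the mtime it recorded at hashing time against the current one):

```shell
# Sketch: a cache keyed on mtime. If a dedupe rewrites a file without
# restoring its timestamp, the stored mtime no longer matches and the
# file must be hashed again on the next `group` run.
tmpdir=$(mktemp -d)
echo data > "$tmpdir/f"

cached_mtime=$(stat -c %Y "$tmpdir/f")   # what the cache remembers

# Simulate a dedupe replacing the file without restoring its timestamp
touch -d '2030-01-01 00:00:00' "$tmpdir/f"
new_mtime=$(stat -c %Y "$tmpdir/f")

if [ "$cached_mtime" != "$new_mtime" ]; then
  echo "mtime changed: cache entry invalid, file will be re-hashed"
fi
rm -rf "$tmpdir"
```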
Great, thank you @th1000s! I imagine your internal code change is the same as using the `-P` option on `cp`, i.e. `cp --reflink=always -P`?
It is more like `--preserve=timestamps`, `-p`, or, more practically, `-a` / archive mode.
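The difference is easy to see with GNU `cp` and `stat` (scratch files only; `--reflink=always` additionally needs a CoW filesystem such as BTRFS, so it is left out of this sketch):

```shell
tmpdir=$(mktemp -d)
touch -d '2020-01-01 00:00:00' "$tmpdir/original"

cp "$tmpdir/original" "$tmpdir/plain_copy"                       # mtime = now
cp --preserve=timestamps "$tmpdir/original" "$tmpdir/kept_copy"  # mtime = 2020-01-01

stat -c '%Y %n' "$tmpdir"/*   # kept_copy matches original; plain_copy does not
rm -rf "$tmpdir"
```

With the fix, fclones restores the original mtime the way `--preserve=timestamps` does, so cached hash entries stay valid after a dedupe.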
This should be fixed now in 0.31.0.