pkolaczk / fclones

Efficient Duplicate File Finder


Option to skip full checking (maybe extended checksums)

johnpyp opened this issue · comments

I'm de-duplicating thousands of large files (3-15 GB each). If the size and the first and last block checksums all match, there's a very high probability that the files are identical.

It only takes ~1 min to get past the first/last checksum stage, but the full-content scan would take hours.

Could fclones provide an option to stop there and report the results? And/or could it take a "random sample" approach, where files of matching size deterministically hash, say, five additional ranges of their contents to increase confidence, without coming close to the cost of reading the entire file? (A sketch of the idea follows.)
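To make the suggestion concrete, here is a minimal Rust sketch of the sampled-hashing idea; it is not fclones' actual code, and the names `sampled_hash`, `SAMPLE_COUNT`, and `SAMPLE_LEN` are hypothetical. The offsets are derived only from the file length, so every candidate file of the same size samples identical byte ranges and the resulting hashes stay comparable. `DefaultHasher` is just a stand-in for whatever hash function the tool actually uses.

```rust
// Hypothetical sketch, not fclones' implementation: hash a few fixed-size
// chunks at offsets computed deterministically from the file length, so all
// same-size files are sampled at the same positions.
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::{Read, Result, Seek, SeekFrom};

const SAMPLE_COUNT: u64 = 5;       // extra ranges sampled per file (assumed)
const SAMPLE_LEN: u64 = 64 * 1024; // bytes read per sample (assumed)

/// Hash SAMPLE_COUNT chunks spread evenly through the file interior.
/// Offsets depend only on `len`, keeping the scheme deterministic.
fn sampled_hash(path: &str, len: u64) -> Result<u64> {
    let mut file = File::open(path)?;
    // DefaultHasher is only for illustration; a real tool would use a
    // stronger, stable hash.
    let mut hasher = DefaultHasher::new();
    let mut buf = vec![0u8; SAMPLE_LEN as usize];
    for i in 1..=SAMPLE_COUNT {
        // Evenly spaced offsets strictly inside the file.
        let offset = i * len / (SAMPLE_COUNT + 1);
        // Clamp the read length near the end of the file.
        let n = SAMPLE_LEN.min(len - offset) as usize;
        file.seek(SeekFrom::Start(offset))?;
        file.read_exact(&mut buf[..n])?;
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}

fn main() -> Result<()> {
    let path = "large-file.bin"; // hypothetical input
    let len = std::fs::metadata(path)?.len();
    println!("sampled hash: {:016x}", sampled_hash(path, len)?);
    Ok(())
}
```

With 5 samples of 64 KiB, that is well under 1 MB of reads per file regardless of size, versus gigabytes for a full scan, at the cost of a small residual chance that two files differ only in unsampled ranges.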