Simple command line tool for finding and deleting duplicate files.
Fork of lionbee/godedupe
By default, deduplicate does a dry run.
./deduplicate [options] directory
-c Print duplicate files as CSV to the console
-d Delete all duplicate files, keeping the originals
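The two options map naturally onto Go's standard flag package. The sketch below is an illustration only, assuming the tool is written in Go as the upstream name suggests. The mode type, its constants and parseArgs are hypothetical names, and rejecting -c together with -d is an assumption of this sketch, not documented behavior.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// mode is what the tool does with the duplicates it finds (hypothetical type).
type mode int

const (
	dryRun     mode = iota // default: list the files that would be deleted
	csvOutput              // -c: print "original,duplicate" rows
	deleteDups             // -d: delete the duplicates
)

// parseArgs parses "[options] directory" and returns the mode and directory.
func parseArgs(args []string) (mode, string, error) {
	fs := flag.NewFlagSet("deduplicate", flag.ContinueOnError)
	csvFlag := fs.Bool("c", false, "Print duplicate files as CSV to the console")
	delFlag := fs.Bool("d", false, "Delete all duplicate files")
	if err := fs.Parse(args); err != nil {
		return dryRun, "", err
	}
	if fs.NArg() != 1 {
		return dryRun, "", fmt.Errorf("expected one directory, got %d arguments", fs.NArg())
	}
	switch {
	case *csvFlag && *delFlag:
		// Assumption of this sketch: the two options are mutually exclusive.
		return dryRun, "", errors.New("-c and -d cannot be combined")
	case *csvFlag:
		return csvOutput, fs.Arg(0), nil
	case *delFlag:
		return deleteDups, fs.Arg(0), nil
	default:
		return dryRun, fs.Arg(0), nil
	}
}

func main() {
	m, dir, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("mode=%d directory=%s\n", m, dir)
}
```

With the flag package, parsing stops at the first non-flag argument, so options have to come before the directory, which matches the usage line.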
Calling deduplicate without any options does a dry run and lists the files that would be deleted.
./deduplicate directory
This is the default mode. Files that would be deleted are printed to stdout.
./deduplicate -c directory
A CSV is printed to stdout in the format: original, duplicate
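A minimal sketch of producing this output with Go's encoding/csv package, assuming the duplicates have already been collected as (original, duplicate) pairs. The pair type and the writeCSV and formatCSV helpers are hypothetical. Note that encoding/csv writes original,duplicate with no space after the comma and wraps file names containing commas or quotes in double quotes, so the real tool's output may differ in those details.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
	"strings"
)

// pair is a hypothetical record: a duplicate and the original it matches.
type pair struct{ Original, Duplicate string }

// writeCSV writes one "original,duplicate" row per pair. encoding/csv quotes
// fields containing commas, quotes or newlines, so such file names survive.
func writeCSV(w io.Writer, pairs []pair) error {
	cw := csv.NewWriter(w)
	for _, p := range pairs {
		if err := cw.Write([]string{p.Original, p.Duplicate}); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

// formatCSV renders the rows to a string, which is handy for testing.
func formatCSV(pairs []pair) (string, error) {
	var sb strings.Builder
	err := writeCSV(&sb, pairs)
	return sb.String(), err
}

func main() {
	pairs := []pair{
		{"photos/a.jpg", "photos/copy of a.jpg"},
		{"docs/report.txt", "backup/report, final.txt"},
	}
	if err := writeCSV(os.Stdout, pairs); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Running it prints two rows; in the second, the file name that contains a comma is wrapped in double quotes.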
./deduplicate -d directory
Duplicate files are deleted, and the names of the deleted files are printed to stdout.
deduplicate works by walking the supplied directory. Files that have the same size in bytes are hashed with MD5. If two files produce the same hash, they are potentially duplicates. To confirm that a file is in fact a duplicate, the two files are compared byte by byte before proceeding.