chexum / pymergedup

find files with duplicate content

pymergedup

Find files with duplicate content.

You have seen a few of these scripts, and I have written a few of them myself: they find duplicated files by grouping files of the same size, then going through each group to check whether the files have a matching hash.

The main differences:

  • minimal dependencies (Python 2.7 only)
  • it does not go through ALL the candidate files computing their sha256, which is very slow if you have lots of different but large files with similar sizes (dashcam outputs, filesystem images). Instead, it compares their first few bytes first, and computes the hash of the full contents only when those bytes indicate a possible match
  • it outputs the duplicates in a way that helps to replace them with hardlinks (i.e. as shell ln commands)
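
The three-stage filter described above (same size, then same leading bytes, then same sha256) can be sketched roughly as follows. This is an illustration only, not the script's actual code: the function names, the 4096-byte prefix length, and the exact `ln -f --` form of the output are all assumptions, since the README only says "first few bytes" and "shell ln commands".

```python
import hashlib
import os
from collections import defaultdict

try:
    from shlex import quote   # Python 3.3+
except ImportError:
    from pipes import quote   # Python 2.7

# Assumed value: the README does not say how many leading bytes are compared.
PREFIX_BYTES = 4096


def read_prefix(path, n=PREFIX_BYTES):
    """Return the first n bytes of the file (cheap pre-check)."""
    with open(path, "rb") as f:
        return f.read(n)


def sha256_of(path, chunk=1 << 20):
    """Hash the full contents, reading in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def group_by(paths, key):
    """Bucket paths by key(path); keep only buckets with 2+ members."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


def find_duplicates(root):
    """Return a list of sorted path lists; each list holds identical files."""
    paths = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p) and not os.path.islink(p):
                paths.append(p)
    result = []
    for same_size in group_by(paths, os.path.getsize):
        # Only files that also share their leading bytes get fully hashed.
        for same_prefix in group_by(same_size, read_prefix):
            for same_hash in group_by(same_prefix, sha256_of):
                result.append(sorted(same_hash))
    return result


def ln_commands(groups):
    """Build one 'ln -f' line per duplicate, linking it to the group's first file."""
    lines = []
    for group in groups:
        keep = group[0]
        for dup in group[1:]:
            lines.append("ln -f -- %s %s" % (quote(keep), quote(dup)))
    return lines
```

A possible use is `for line in ln_commands(find_duplicates(".")): print(line)`, reviewing the output before piping it to a shell. Note that this sketch does not check whether two paths are already hardlinks of the same inode, and hardlinks only work between files on the same filesystem.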

License: BSD 3-Clause "New" or "Revised" License


Languages

Language:Python 100.0%