Method 1: MD5 - Find exact duplicate files by hashing their contents
Method 2: VGG16 + Annoy - Find similar images using feature extraction and approximate nearest-neighbor search
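To make Method 1 concrete, here is a minimal sketch of hash-based duplicate detection using only the Python standard library. It is not taken from this repository's code; the function names `md5_of_file` and `find_duplicates` and the 1 MiB chunk size are assumptions chosen for illustration.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(img_dir):
    """Group every file under img_dir by content hash; keep groups with 2+ files.

    Hypothetical helper: the real project may filter by extension or
    handle unreadable files differently.
    """
    by_hash = defaultdict(list)
    for root, _dirs, files in os.walk(img_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            by_hash[md5_of_file(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Two files land in the same group only when their bytes are identical, so a resized or re-encoded copy of an image will not be reported by this approach.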
Use pip to install the requirements:
pip3 install -r requirements.txt
--method: "md5" or "vgg"
--img_dir: root directory of the images
--action: "move" or "copy"
python3 main.py --method md5 --img_dir /home/image_dir/ --action copy
or
python3 main.py --method vgg --img_dir /home/image_dir/ --action copy
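For readers curious how flags like these are typically wired up, the following is a rough sketch with `argparse`. It is an assumption about how such a command line could be parsed, not the actual contents of `main.py`; the `build_parser` name, the help strings, and the choice to make every flag required are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the three flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        description="Find duplicate or similar images."
    )
    # choices= makes argparse reject any value other than the documented ones.
    parser.add_argument("--method", choices=["md5", "vgg"], required=True,
                        help="md5 for exact duplicates, vgg for similar images")
    parser.add_argument("--img_dir", required=True,
                        help="root directory of the images")
    parser.add_argument("--action", choices=["move", "copy"], required=True,
                        help="what to do with the files that are found")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
example_args = build_parser().parse_args(
    ["--method", "md5", "--img_dir", "/home/image_dir/", "--action", "copy"]
)
```

With this setup, an unsupported value such as `--method sha1` makes `argparse` print a usage message and exit, rather than failing later in the run.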
jupyter notebook
# if you are looking for exact matches
find-duplicate-files-md5.ipynb
# if you are looking for similar images
find-similar-images-feature-extraction-annoy.ipynb
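The idea behind the similarity search can be sketched without any deep learning dependency. The example below is not the notebook's code: it replaces VGG16 with hand-written toy vectors and replaces Annoy's approximate index with a brute-force cosine-similarity scan, so it only illustrates the ranking step. The names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(features, query, top_k=3):
    """Rank every other image by similarity to `query`.

    features: dict mapping an image name to its feature vector. In the
    real pipeline these vectors would come from a network such as VGG16;
    here they are supplied by the caller.
    """
    scores = [
        (name, cosine_similarity(features[query], vector))
        for name, vector in features.items()
        if name != query
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Toy 3-dimensional "features"; real feature vectors are much longer.
toy_features = {
    "cat_1.jpg": [1.0, 0.0, 0.0],
    "cat_2.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 1.0, 0.0],
}
nearest = most_similar(toy_features, "cat_1.jpg", top_k=1)
```

A brute-force scan compares the query against every image, which becomes slow for large collections. An approximate nearest-neighbor index such as Annoy trades a little accuracy for much faster lookups.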
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Ubuntu 18.04 & Python 3.9.12
macOS Monterey 12.2.1 & Python 3.9.12