ybubnov / imagedup

A naive command-line tool to remove duplicated images using OpenCV


Imagedup

A tool to manage image duplicates: the program finds duplicate images in a specified folder and optionally removes them.

Installation

We recommend using pyenv to manage the required Python version and poetry to manage dependency installation:

% brew install pyenv pyenv-virtualenv

Then configure the environment as follows:

% pyenv install 3.9.1
% pyenv virtualenv 3.9.1 imagedup
% pyenv activate imagedup

Then install the necessary dependencies:

% pip install poetry
% poetry config virtualenvs.create false
% poetry install --with-root

Usage

You can run the imagedup command right from the repository root as follows:

% python -m imagedup ./dataset

By default, the tool does not delete files and simply prints the files it would delete to standard output. To actually delete the duplicates, call the tool as follows:

% python -m imagedup.shell ./dataset -q --rm
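
The print-by-default, delete-on-request behaviour described above can be sketched as follows. This is a minimal illustration and not the tool's actual code: the function names `handle_duplicates` and `build_parser`, the program name `imagedup-sketch`, and the long option `--quiet` (as the meaning of `-q`) are all assumptions.

```python
import argparse
from pathlib import Path
from typing import Iterable, List


def handle_duplicates(
    duplicates: Iterable[Path], rm: bool = False, quiet: bool = False
) -> List[Path]:
    """Print each duplicate path and delete it only when rm is True.

    Hypothetical helper: the real imagedup internals may differ.
    """
    handled = []
    for path in duplicates:
        if not quiet:
            # Dry-run output: one path per line on standard output.
            print(path)
        if rm:
            # missing_ok (Python 3.8+) tolerates files that are already gone.
            path.unlink(missing_ok=True)
        handled.append(path)
    return handled


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the usage example."""
    parser = argparse.ArgumentParser(prog="imagedup-sketch")
    parser.add_argument("directory", type=Path, help="folder to scan")
    # Assumption: -q suppresses the printed list of files.
    parser.add_argument("-q", "--quiet", action="store_true")
    # Deletion is opt-in, so a plain run never modifies the folder.
    parser.add_argument("--rm", action="store_true")
    return parser
```

Keeping deletion behind an explicit flag means a first run is always safe to inspect before any file is removed.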

Analysis

The following image outlines how exactly the --min-score and --min-area parameters relate to the number of images being removed from the directory.

By default, this tool guarantees removal of 50% of the images from a directory.

[Figure: Grid Search]
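
A parameter sweep of this kind could be reproduced with a sketch like the one below. It is an assumption-laden illustration: `sweep` is a hypothetical helper, and `toy_count_removed` is a made-up stand-in for running the tool with given --min-score and --min-area values. It does not reflect how the parameters really affect the result.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def sweep(
    count_removed: Callable[[float, float], int],
    min_scores: Iterable[float],
    min_areas: Iterable[float],
    total: int,
) -> Dict[Tuple[float, float], float]:
    """Return the fraction of images removed for every parameter pair."""
    if total <= 0:
        raise ValueError("total must be a positive number of images")
    return {
        (score, area): count_removed(score, area) / total
        for score, area in product(min_scores, min_areas)
    }


def toy_count_removed(min_score: float, min_area: float) -> int:
    """Made-up stand-in; replace with a real run of the tool on a dataset."""
    return 50


if __name__ == "__main__":
    grid = sweep(toy_count_removed, [0.5, 0.7, 0.9], [100, 500], total=100)
    for (score, area), fraction in sorted(grid.items()):
        print(f"min-score={score} min-area={area} removed={fraction:.0%}")
```

In practice `count_removed` would run the tool in its default non-deleting mode and count the printed paths, so the sweep never modifies the dataset.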
