Codeczh / remove-duplicate

An image fingerprinting algorithm for removing duplicate images from a web dataset

# remove-duplicate

Note: This code deletes duplicate images! You may want to back up your dataset first.
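The README does not say which fingerprint gather.py computes. As an illustration only, here is a minimal difference hash (dHash), one common image fingerprint: shrink the image to 9x8 grayscale pixels and record, for each pair of horizontal neighbours, whether the left pixel is brighter than the right one, giving a 64-bit integer. Near-identical images (resized, re-encoded, slightly brightened) tend to get equal or close hashes, compared by Hamming distance. To stay dependency-free, this sketch takes a grayscale pixel matrix (a list of rows) instead of an image file; the names `resize_nearest`, `dhash` and `hamming` are hypothetical and not taken from the repo.

```python
def resize_nearest(pixels, width, height):
    """Nearest-neighbour resize of a 2-D grayscale matrix (list of rows).

    Used here for brevity; real dHash implementations usually use an
    area-averaging (antialiased) resize instead.
    """
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per pair of horizontally adjacent pixels."""
    # Shrink to (hash_size + 1) columns x hash_size rows so that each row
    # yields exactly hash_size left/right comparisons.
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # Bit is 1 when the left pixel is brighter than its right neighbour.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

A real pipeline would first load each image and convert it to grayscale (for example with Pillow's `convert("L")`). The Hamming-distance threshold at which two images count as duplicates is an assumption that would need tuning on the dataset.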

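If you want to work on a copy of the dataset instead, uncomment these lines in gather.py:

    # Windows-style path: swap the last path component for 'train_delete'
    dataset = trainset[:trainset.rfind('\\') + 1] + 'train_delete'
    copy_data(trainset, dataset)

and comment out this line:

    dataset = trainset

The train folder will then be copied to train_delete, and gather.py will work on that copy.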
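The README does not show `copy_data`. A minimal stand-in, assuming it simply copies the whole directory tree, is `shutil.copytree`; building the target path with `os.path` also avoids the hard-coded Windows separator in `rfind('\\')`. The names `sibling_copy_path` and `copy_dataset` are hypothetical, not the repo's functions.

```python
import os
import shutil


def sibling_copy_path(trainset, name="train_delete"):
    """Return a path next to trainset, with its last component replaced by name."""
    # normpath drops a trailing separator so dirname returns the real parent.
    parent = os.path.dirname(os.path.normpath(trainset))
    return os.path.join(parent, name)


def copy_dataset(src, dst):
    """Recursively copy the src directory tree to dst (dst must not exist yet)."""
    shutil.copytree(src, dst)
```

`shutil.copytree` raises `FileExistsError` if the destination already exists, so a second run needs the old train_delete folder removed first (or `dirs_exist_ok=True` on Python 3.8+).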


The directory layout is:

    .
    |-- web-bird
    |   |-- train
    |   `-- val
    `-- image-fingerprinting
        |-- gather.py
        |-- move_repeat.py
        `-- index.py

Run gather.py to create a repeat folder under web-bird containing the duplicate images. Then run move_repeat.py to move the duplicates that fall within the same sub-class into a repeat_sub folder, also under web-bird.
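One possible reading of these two steps, sketched under stated assumptions: images live in `train/<class>/<file>`, a fingerprint function maps a file path to a hashable value, and in each group of matching images the first file is kept while the rest are moved. With `per_class=False` images are compared across the whole set (similar in spirit to the repeat step); with `per_class=True` only within the same sub-class folder (similar to repeat_sub). `find_duplicates` and `move_duplicates`, the keep-first policy and the `<class>_<file>` naming in the destination are all hypothetical choices, not the repo's behaviour.

```python
import os
import shutil
from collections import defaultdict


def find_duplicates(root, fingerprint, per_class=False):
    """Group image paths under root/<class>/ by fingerprint.

    Returns a list of groups (lists of paths) that have more than one member.
    If per_class is True, images are only compared within the same class folder.
    """
    groups = defaultdict(list)
    for cls in sorted(os.listdir(root)):
        cls_dir = os.path.join(root, cls)
        if not os.path.isdir(cls_dir):
            continue
        for name in sorted(os.listdir(cls_dir)):
            path = os.path.join(cls_dir, name)
            fp = fingerprint(path)
            # Including the class name in the key restricts matches to one class.
            key = (cls, fp) if per_class else fp
            groups[key].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def move_duplicates(groups, dest):
    """Keep the first file of each group and move the others into dest."""
    os.makedirs(dest, exist_ok=True)
    moved = []
    for paths in groups:
        for path in paths[1:]:
            # Prefix with the class folder name to avoid name clashes in dest.
            cls = os.path.basename(os.path.dirname(path))
            target = os.path.join(dest, f"{cls}_{os.path.basename(path)}")
            shutil.move(path, target)
            moved.append(target)
    return moved
```

Grouping by exact fingerprint equality keeps this linear in the number of images; matching near-duplicates by a Hamming-distance threshold instead would need pairwise (or indexed) comparison.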

