aryamansharma01 / Duplicate-Images-Removal

Python project for hashing and image processing of given dataset for subtask #1 in selection round for ML (Application in Blogging Platform) in IOSD NSUT.

Duplicate Images Removal

This project uses the difference hash (dHash) function for image processing, since it is a fast and effective perceptual hashing method for detecting duplicate images.
After the hash of every image in the dataset (which was downloaded from the Google Drive link to my local system) was computed, the hashes were stored in an array and sorted.
The duplicate hashes were then separated from this list and stored in another list, and one copy of every duplicate was added back to the list of remaining hashes.
Thus, the final list contained one hash per unique image. Its length came out to be 6698, meaning that 7814 - 6698 = 1116 images were duplicates.
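
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `dhash` and `split_duplicates` and the toy pixel grids are assumptions made for the example. In practice each image would first be converted to grayscale and resized (commonly to 9x8 pixels, giving a 64-bit hash) with an imaging library before hashing.

```python
from itertools import groupby


def dhash(pixels):
    """Difference hash of a grayscale pixel grid (hypothetical helper).

    Each bit is 1 when a pixel is brighter than its right-hand neighbour,
    so a row of width w + 1 contributes w bits.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def split_duplicates(hashes):
    """Sort the hashes, then return (unique, duplicated).

    `unique` holds one copy of every hash; `duplicated` holds the hashes
    that occurred more than once.
    """
    unique, duplicated = [], []
    for value, group in groupby(sorted(hashes)):
        unique.append(value)
        if len(list(group)) > 1:
            duplicated.append(value)
    return unique, duplicated


# Toy 2x3 grids standing in for resized grayscale images (made-up data).
images = {
    "a.png": [[10, 20, 5], [7, 7, 9]],
    "a_copy.png": [[10, 20, 5], [7, 7, 9]],
    "b.png": [[1, 0, 3], [9, 8, 7]],
}

hashes = [dhash(grid) for grid in images.values()]
unique, duplicated = split_duplicates(hashes)

# Number of redundant images = total images - unique hashes.
print(len(hashes) - len(unique))
```

Sorting places equal hashes next to each other, so duplicates can be found in a single pass over the sorted list. The same subtraction (total minus unique) gives the duplicate count reported above.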

-PROJECT BY
Aryaman Sharma
2019UCO1508
COE-1 (3rd sem)


Languages

Language:Jupyter Notebook 100.0%