wolny / phash-hierarchical-clustering

Hierarchical clustering of images using phash and Hamming distance

phash-hierarchical-clustering

This app clusters a given set of images and displays the results in a simple JavaFX GUI. First, perceptual hashing (phash) maps each image to a binary feature vector. Then agglomerative hierarchical clustering, with Hamming distance as the distance measure, groups similar binary vectors.
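To make the two steps concrete, here is a minimal sketch of a DCT-based perceptual hash and the Hamming distance between two hashes. It is not the code used in this repository: the object name `PHashSketch`, the 32x32 scaling, the 8x8 low-frequency block, and the median threshold are all assumptions, following one common phash variant.

```scala
import java.awt.image.BufferedImage

// Hypothetical sketch, not the repository's implementation.
object PHashSketch {
  private val Size = 32 // assumed: image is scaled to Size x Size before the DCT
  private val Low = 8   // assumed: the Low x Low lowest frequencies form the 64-bit hash

  /** Scales the image to Size x Size and reads it back as grayscale values. */
  def toGray(img: BufferedImage): Array[Array[Double]] = {
    val scaled = new BufferedImage(Size, Size, BufferedImage.TYPE_BYTE_GRAY)
    val g = scaled.createGraphics()
    g.drawImage(img, 0, 0, Size, Size, null)
    g.dispose()
    Array.tabulate(Size, Size)((y, x) => scaled.getRaster.getSample(x, y, 0).toDouble)
  }

  /** Unnormalised 2D DCT-II, computed only for the Low x Low lowest frequencies. */
  def lowFreqDct(px: Array[Array[Double]]): Array[Array[Double]] = {
    val n = px.length
    val cos = Array.tabulate(Low, n)((u, x) => math.cos((2 * x + 1) * u * math.Pi / (2.0 * n)))
    Array.tabulate(Low, Low) { (u, v) =>
      var sum = 0.0
      for (y <- 0 until n; x <- 0 until n) sum += px(y)(x) * cos(u)(y) * cos(v)(x)
      sum
    }
  }

  /** 64-bit hash: bit i is set when the i-th coefficient lies above the median. */
  def hash(img: BufferedImage): Long = {
    val coeffs = lowFreqDct(toGray(img)).flatten
    val sorted = coeffs.sorted
    val median = (sorted(31) + sorted(32)) / 2
    coeffs.zipWithIndex.foldLeft(0L) { case (h, (c, i)) =>
      if (c > median) h | (1L << i) else h
    }
  }

  /** Hamming distance: the number of bit positions in which two hashes differ. */
  def hamming(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)
}
```

Under these assumptions, two visually similar images should yield hashes that differ in only a few bits, so `PHashSketch.hamming(PHashSketch.hash(a), PHashSketch.hash(b))` stays small for near-duplicates.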

Note: we use a low, hard-coded cutHeight value of 8.0 to cut the dendrogram into small clusters with few outliers. You may want to experiment with different values of cutHeight in HCluster, depending on the size of your dataset and the required 'quality' of the clustering.
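The effect of cutHeight can be illustrated with a naive, self-contained sketch of complete-linkage agglomerative clustering that stops merging once the closest pair of clusters is further apart than the cut height. This is a stand-in for illustration only: the app itself relies on Smile, and the object name `CutHeightSketch` and its methods are hypothetical.

```scala
// Hypothetical sketch, not the repository's HCluster and not Smile's API.
object CutHeightSketch {
  /** Hamming distance between two 64-bit hashes. */
  def hamming(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)

  /** Complete linkage: the largest pairwise distance between members of two clusters. */
  private def linkage(a: Vector[Int], b: Vector[Int], hashes: IndexedSeq[Long]): Int =
    (for (i <- a; j <- b) yield hamming(hashes(i), hashes(j))).max

  /** Repeatedly merges the two closest clusters until the next merge would
    * exceed cutHeight. Returns clusters as groups of indices into `hashes`.
    * Naive and slow; meant for small inputs only. */
  def cluster(hashes: IndexedSeq[Long], cutHeight: Double): Vector[Vector[Int]] = {
    @annotation.tailrec
    def loop(clusters: Vector[Vector[Int]]): Vector[Vector[Int]] = {
      val pairs = for {
        i <- clusters.indices
        j <- (i + 1) until clusters.size
      } yield (linkage(clusters(i), clusters(j), hashes), i, j)
      if (pairs.isEmpty) clusters
      else {
        val (d, i, j) = pairs.minBy(_._1)
        if (d > cutHeight) clusters // every remaining merge lies above the cut
        else {
          val merged = clusters(i) ++ clusters(j)
          val rest = clusters.zipWithIndex.collect { case (c, k) if k != i && k != j => c }
          loop(rest :+ merged)
        }
      }
    }
    // Start with every hash in its own singleton cluster.
    loop(hashes.indices.map(Vector(_)).toVector)
  }
}
```

With complete linkage, any two hashes that end up in the same cluster differ in at most cutHeight bits, so a low value such as 8.0 yields small, tight clusters, while a higher value merges more loosely related images.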

Running

Build the project with sbt assembly. This will generate a phash-hierarchical-clustering-assembly-<version>.jar uberjar file in the target/scala-<scalaVersion> subdirectory (where <version> is the current version defined in build.sbt).

Run the application from the .jar with the java -jar command, e.g.:

  • java -jar target/scala-2.12/phash-hierarchical-clustering-assembly-1.0.jar <imageDirectory>

This might take a while the first time, since the app needs to compute the phash value for every image in <imageDirectory>.

<imageDirectory> is the folder where the images are stored (use as many images as possible for better results).

Sample results

  • Sample clusters from a dataset of 5K images containing the Apple logo (images: Cluster 1, Cluster 2, Cluster 3)

  • A dendrogram illustrating the result of hierarchical clustering with the complete agglomeration method (see the Smile docs for more details) (image: Dendrogram)

About

License: MIT License


Languages

Language: Scala 100.0%