SIFT Image Indexer

This program is a miniature SIFT-based image retrieval system. Given a dataset of existing images and a query image, it measures how similar the query image is to the images in the dataset.

Quick Start

Prepare a folder named images in the repo directory containing all the images you would like to train on.
Ensure the MySQL server is up and running, then execute the file schemas.sql.
Create a Python environment and install the dependencies listed in requirements.txt.
Modify the file dataset.py with your MySQL login information and run it.
Run main.py and choose the query image from the window that appears. The image ID in the output corresponds to the image ID in the DB.

Motivation

Image similarity often needs to be computed on a dataset that both grows and shrinks over time. It is desirable to keep the convenience of organizing, adding, and deleting image entries in a relational database while still being able to quickly run one of the most accurate ways to measure image similarity.

How It Works

Building the dataset

Load an existing dataset of images into memory and resize each image into a square thumbnail.
Obtain descriptors for each image using the SIFT function.
Store each image's descriptors as pickled data in a MySQL blob.
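
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in dataset.py: it assumes OpenCV's Python bindings (opencv-python 4.4 or later, where cv2.SIFT_create is available), a DB-API cursor that uses the %s placeholder style, and a hypothetical table images with a blob column descriptors. The helper names are made up for this example, and the real table and column names should be taken from schemas.sql.

```python
import pickle


def compute_descriptors(image_path, thumb_size=128):
    """Resize an image to a square thumbnail and return its SIFT descriptors."""
    # Imported lazily so the pickling helpers below work without OpenCV.
    import cv2

    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(image_path)
    thumb = cv2.resize(image, (thumb_size, thumb_size))
    sift = cv2.SIFT_create()
    _keypoints, descriptors = sift.detectAndCompute(thumb, None)
    # descriptors is an (n_keypoints, 128) array, or None if no keypoints were found.
    return descriptors


def to_blob(descriptors):
    """Pickle descriptors into bytes suitable for a blob column."""
    return pickle.dumps(descriptors, protocol=pickle.HIGHEST_PROTOCOL)


def from_blob(blob):
    """Restore descriptors from a pickled blob."""
    return pickle.loads(blob)


def store_descriptors(cursor, descriptors):
    """Insert one image's pickled descriptors using a DB-API cursor."""
    # Hypothetical table and column names; adjust to match schemas.sql.
    cursor.execute(
        "INSERT INTO images (descriptors) VALUES (%s)",
        (to_blob(descriptors),),
    )
```

Because pickle.loads can execute arbitrary code, blobs should only be un-pickled from a database you control.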

Building the Indexer

Upon initialization of the program, an empty FLANN Indexer is instantiated.
The stored descriptors for all images in the DB are loaded into memory, un-pickled, and fed into the Indexer.
The Indexer is trained.
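
A rough sketch of this stage, assuming the Indexer is OpenCV's cv2.FlannBasedMatcher with a KD-tree index (the trees and checks values are illustrative defaults, not the project's settings). The function name build_indexer, the rows argument of (image_id, blob) pairs, and the optional matcher argument are hypothetical and exist only for this example. The returned list records which DB image ID sits at each position in the Indexer, since matches later report a position rather than a DB ID.

```python
import pickle

FLANN_INDEX_KDTREE = 1  # FLANN algorithm id for the KD-tree index


def build_indexer(rows, matcher=None, trees=5, checks=50):
    """Feed un-pickled descriptors into a FLANN matcher and train it.

    rows: iterable of (image_id, blob) pairs read from the DB.
    Returns (matcher, index_to_id), where index_to_id[i] is the DB image ID
    of the i-th descriptor set added to the matcher.
    """
    if matcher is None:
        import cv2  # imported lazily; only needed for the default matcher

        index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=trees)
        search_params = dict(checks=checks)
        matcher = cv2.FlannBasedMatcher(index_params, search_params)

    index_to_id = []
    for image_id, blob in rows:
        descriptors = pickle.loads(blob)
        # Skip images for which SIFT found no keypoints.
        if descriptors is None or len(descriptors) == 0:
            continue
        matcher.add([descriptors])
        index_to_id.append(image_id)

    matcher.train()
    return matcher, index_to_id
```

Keeping index_to_id alongside the matcher avoids assuming that DB image IDs are contiguous or start at zero.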

Querying the Indexer

Load a query image into memory and resize it to the specified thumbnail size.
Obtain descriptors for the query image using the SIFT function.
Run KNN matching with k=2 for the query image's descriptors against the Indexer.
Perform the ratio test on the resulting feature matches.
Each match that passes the ratio test has an associated image index from the DB. Tally the matches per image to determine the image(s) with the most matched features.
Use a scaled sigmoid function to convert the number of matched features into a similarity percentage.
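
The last three steps can be illustrated with the sketch below. It assumes each match object exposes distance and imgIdx attributes, as OpenCV's cv2.DMatch does. The ratio threshold of 0.75 and the sigmoid's midpoint and scale parameters are assumptions chosen for illustration, because this document does not state the values or the exact sigmoid form the program uses. The function names and the index_to_id list are likewise hypothetical.

```python
import math
from collections import Counter


def ratio_test(knn_matches, ratio=0.75):
    """Keep the best match of each pair only if it clearly beats the runner-up."""
    good = []
    for pair in knn_matches:
        if len(pair) < 2:
            continue  # fewer than two neighbours returned; cannot compare
        best, second = pair
        if best.distance < ratio * second.distance:
            good.append(best)
    return good


def tally_by_image(good_matches, index_to_id):
    """Count surviving matches per DB image ID."""
    return Counter(index_to_id[m.imgIdx] for m in good_matches)


def similarity(match_count, midpoint=10.0, scale=3.0):
    """Map a match count to a value in (0, 1) with a shifted, scaled sigmoid."""
    # midpoint: match count that maps to 0.5; scale: how gradually the curve rises.
    return 1.0 / (1.0 + math.exp(-(match_count - midpoint) / scale))
```

With a trained cv2.FlannBasedMatcher, the pairs would come from matcher.knnMatch(query_descriptors, k=2), and the best candidates can then be read from tally_by_image(...).most_common(). Multiply the result of similarity by 100 to express it as a percentage.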

Limitations

Testing, investigation, and debugging were done on a laptop using a 1500-image dataset.

With a thumbnail size of 128x128:

Building the DB took five minutes,
Building the indexer took one second, and
Matching a query image took two seconds.

With a thumbnail size of 512x512:

Building the DB took five minutes,
Building the indexer took eight seconds, and
Matching a query image took two minutes.

Results will vary. If indexer building and/or matching takes too long, consider decreasing the thumbnail size.

tonellotto / SIFT-Image-Indexer