dmacvicar / iu

images indexer/searcher

iu

"iu" is an experiment that started with this tweet.

The goal is to do research around a tool to index and search your image collection.

"iu" is not intended for production use, and perhaps never will be.

The name comes from "mu", which is a mail indexer that inspired this project. "mu" means maildir utils, so I guess "iu" means "image utils".

What should "iu" be?

  • Just a command line tool
  • Targeted at the average person collecting lots of photos over the years
  • Basic integration with other tools, e.g. opening search results in an album viewer
  • Reasonably fast indexing when re-indexing from scratch
  • Very fast indexing when a couple of new photos are added to the collection
  • Some basic features when indexing:
    • Camera model
    • Date
    • Album (?)
  • Some fancy indexing features I expect to add at some point:
    • Offline reverse geo-location: Turn GPS data into place names
    • Offline automatic tagging: Recognize basic entities (food, guitars, animals, bikes, cars, colors) and index on the object word
    • Search for similar images, to detect duplicates while sorting my collection
    • Find images with low quality, to be used when curating my camera inbox
    • OCR, index on words in the image
    • Recognize people and index them
  • Ultra fast searching

Building from source

You need:

Once you satisfy those requirements

cmake -S . -B build
cmake --build build

or

$ mkdir -p build && cd build
$ cmake ..
$ make

Running

Getting data files

cmake --build build --target data

or

$ cd build
$ make data

Indexing images

$ cd build
$ src/iu index --root ~/Pictures
...
indexed: 15465 files

Search

$ cd build
$ src/iu find "camera:powershot"
8725 result found
0: docid /home/foo/1.jpg
...
real    0m0.013s
user    0m0.008s
sys     0m0.005s

Performance

  • Without many optimizations, I can index 15k files (50 GB) in 2.7s on an old X230 laptop with an SSD (libexif backend).
  • Adding offline geolocation over 121k places brings that up to 16s.

Implementation Notes

Technologies

  • Indexing is built on top of Xapian, a free and open-source probabilistic information retrieval library.

    Using SQLite was considered too.

  • Metadata from photos is retrieved using libexif.

    exiv2 was tested; while its API and format coverage were wider, it was much slower.

  • Examination of images is done with the help of OpenCV (Open Source Computer Vision Library).

Reverse geocoding index

Uses data from reverse_geocode, which in turn comes from geonames.org (CC-BY licence).

It is a dumb search by distance and it is not optimized yet.

Right now the technique is to convert the photo location into a label (place name) and add this name to the index as a term. The place is therefore searched for as part of the query text.
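A minimal sketch of such a brute-force distance search, using only the C++ standard library. The `Place` struct and the function names are hypothetical, and the haversine formula is one common choice of distance; the metric and data layout actually used in "iu" may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <string>
#include <vector>

// Hypothetical record type; the real place data in "iu" may look different.
struct Place {
    std::string name;
    double lat;  // degrees
    double lon;  // degrees
};

// Great-circle distance in kilometres (haversine formula, spherical Earth).
inline double haversine_km(double lat1, double lon1, double lat2, double lon2) {
    constexpr double kEarthRadiusKm = 6371.0;
    constexpr double kDegToRad = 3.14159265358979323846 / 180.0;
    const double dlat = (lat2 - lat1) * kDegToRad;
    const double dlon = (lon2 - lon1) * kDegToRad;
    const double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
                     std::cos(lat1 * kDegToRad) * std::cos(lat2 * kDegToRad) *
                         std::sin(dlon / 2) * std::sin(dlon / 2);
    // Clamp to guard against rounding pushing sqrt(a) slightly above 1.
    return 2.0 * kEarthRadiusKm * std::asin(std::min(1.0, std::sqrt(a)));
}

// Linear scan over all places: returns the name of the closest one,
// or an empty string when the list is empty.
inline std::string nearest_place(const std::vector<Place>& places,
                                 double lat, double lon) {
    double best = std::numeric_limits<double>::infinity();
    std::string label;
    for (const Place& p : places) {
        const double d = haversine_km(lat, lon, p.lat, p.lon);
        if (d < best) {
            best = d;
            label = p.name;
        }
    }
    return label;
}
```

The returned label is what would be added to the index as a term. The scan is linear in the number of places for every photo, which is consistent with the indexing time growing once geolocation is enabled.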

An alternative approach I am exploring is to pass the place on the command line, separate from the query, and use Xapian's geospatial support (i.e. LatLongDistancePostingSource), adding this posting source to the query object.

I will start this exploration by adding the location as a value to the document.

Automatic labeling

Uses the Berkeley Vision and Learning Center (BVLC) Caffe GoogLeNet model and the word list from ImageNet.

I would still like to make it possible to drop models and label lists into a directory and have the indexer pick them up automatically.

Quality classification

Uses BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator), a No-Reference Image Quality Assessment (NR-IQA) algorithm, as implemented in OpenCV contrib.

We use the trained model provided in the /samples/ directory, trained on the LIVE-R2 database as in the original implementation.

Right now we don't do anything with this except add the word "blurry" to the index. In theory I should add the score as a value.
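The difference between the two options can be sketched as follows. This assumes the BRISQUE score lies between 0 (best) and 100 (worst), as the OpenCV implementation documents; the threshold of 50 and the names `QualityIndexData` and `classify_quality` are hypothetical, not what "iu" uses.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical container for what the indexer could store per image.
struct QualityIndexData {
    double score;                    // clamped score, a candidate document value
    std::vector<std::string> terms;  // candidate index terms
};

// Map a BRISQUE score to index data. The default threshold is an assumption.
inline QualityIndexData classify_quality(double brisque_score,
                                         double threshold = 50.0) {
    QualityIndexData out;
    out.score = std::clamp(brisque_score, 0.0, 100.0);
    if (out.score > threshold) {
        out.terms.push_back("blurry");
    }
    return out;
}
```

Keeping the numeric score as a value would allow range filters or sorting by quality later, whereas a term only records which side of a fixed threshold the image fell on at indexing time.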

Browsing photos

Right now if you add "-b" (browse) to a search, it will pass the list of files in the result to eog. This does not work well, as there is a limit on the number of files, and if there are no results, eog will still show other files. I am looking for a good replacement.

Hopefully I don't need to write my own.
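Until a replacement is found, the two problems described above (too many files, and an empty result list) could be guarded against before the viewer is launched. The sketch below is an assumption about how that might look; `build_viewer_argv` and the `max_files` cap are hypothetical names, and the actual limit depends on the viewer and the system.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Build the argument vector for an external viewer.
// Returns std::nullopt when there is nothing to show, so the caller can
// skip launching the viewer instead of letting it open unrelated files.
inline std::optional<std::vector<std::string>> build_viewer_argv(
    const std::string& viewer,
    const std::vector<std::string>& files,
    std::size_t max_files) {
    if (files.empty() || max_files == 0) return std::nullopt;

    std::vector<std::string> argv{viewer};
    // Cap the number of files passed on the command line.
    const std::size_t n = std::min(files.size(), max_files);
    argv.insert(argv.end(), files.begin(),
                files.begin() + static_cast<std::ptrdiff_t>(n));
    return argv;
}
```

A caller would check the returned optional and only spawn the viewer (for example `eog`) when it holds a value.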

License

  • (C)2020 Duncan Mac-Vicar P.

  • "iu" is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

  • "iu" is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
