- Semantic + Meta-data Search: query data using natural language and/or meta-data attributes. Combine any attributes to enable complex queries.
- End-to-end interface: index any media just by providing a path to a local directory/folder and start querying. No complex configuration.
- Face recognition: face detection and recognition.
- Fast: start getting results in milliseconds. All indices are stored on the user's system.
- Minimal requirements: any consumer-grade CPU with AVX2 instructions enabled and minimal software dependencies. (No dependence on deep-learning frameworks like `pytorch`/`tensorflow`.)
- Private: fully self-hosted on the user's system, with no dependence on any outside network in any manner.
- Intel-64/AMD CPU with AVX2 instructions enabled.
- Python 3 with pip installed (tested with >= 3.8.x).
- Caddy: open-source web server. Download from here.
- Windows (tested on 10/11).
- Linux (glibc >= 2.27). [Run `ldd --version` to check the glibc version.]
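The CPU and glibc requirements above can also be checked programmatically. A minimal sketch using only the standard library (the helper names are illustrative, not part of Hachi; each check returns `None` on platforms where it does not apply, e.g. Windows):

```python
import platform

def glibc_at_least(minimum=(2, 27)):
    # platform.libc_ver() reports e.g. ("glibc", "2.31") on most Linux systems.
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None  # not a glibc system (e.g. Windows, musl-based Linux)
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= minimum

def cpu_has_avx2():
    # /proc/cpuinfo lists CPU feature flags on Linux; absent elsewhere.
    try:
        with open("/proc/cpuinfo") as f:
            return "avx2" in f.read()
    except OSError:
        return None
```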
- Install Caddy (open-source web server, for serving static files and out-of-the-box HTTPS configuration, if needed).
- Install Python 3 (tested with versions >= 3.8).
- Download `source.zip` from the latest release, or `git clone https://github.com/eagledot/hachi.git`.
- `cd` into the cloned/downloaded repository (i.e. change path to the root of the cloned repository).
- Collect the model weights by downloading `data.zip` from releases; extract the 2 `.bin` files from it into the path `./data`, such that the `./data` directory now has 3 `.bin` files in it.
- Run `pip install -r requirements.txt`. (This installs the `opencv-python`, `numpy`, `flask`, `regex`, `ftfy`, and `plum-py` Python packages, if not already present.)
- Run `conda install -c conda-forge onednn-cpu-omp=2.6.0`. (conda is the sanest way I could find to install the oneDNN shared library without getting frustrated by glibc mismatches.)
- Update `LD_LIBRARY_PATH` to make the dynamic linker search for shared objects in the conda path (if not already done!).
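For the `LD_LIBRARY_PATH` step, something like the following works, assuming an active conda environment (so `$CONDA_PREFIX` points at its prefix):

```shell
# Prepend conda's lib directory so the dynamic linker finds the oneDNN
# shared object; add this line to ~/.bashrc to make it permanent.
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```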
- Install `openblas` if it is not already included/installed with your OS: `sudo dnf install openblas-devel` (Fedora) or `sudo apt-get install libopenblas-dev` (Ubuntu/Debian).
- `cd` into the downloaded directory.
- Run command: `caddy run`
- Run command: `python semantic_search.py` or `python3 semantic_search.py`
- Visit http://localhost:5000
Front-end code of the app lies in the `./static/` directory and is generated automatically from the Svelte components in the `./hachi_frontend` directory.
Front-end development requires Node (tested with v18.13.0) to be installed on the user's machine.
Check out `readme.md` in `./hachi_frontend` for more details.
- Machine learning model powering this webapp is based on CLIP architecture.
- https://gitlab.com/TNThieding/exif/ (EXIF data extraction)
- https://github.com/scardine/image_size (extract image meta-data with no dependencies)
- https://www.geonames.org/ (geographical database that made a dependency-free reverse geocoder possible for this project)
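The geonames mention hints at how a dependency-free reverse geocoder can work: a nearest-neighbour lookup over a table of place coordinates. A minimal sketch (the three-city table and function names are illustrative, not Hachi's actual data or API):

```python
import math

# Tiny stand-in for a geonames-derived table of (name, latitude, longitude).
CITIES = [
    ("Paris", 48.8566, 2.3522),
    ("Berlin", 52.5200, 13.4050),
    ("Tokyo", 35.6762, 139.6503),
]

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points, in kilometres.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def reverse_geocode(lat, lon, cities=CITIES):
    # Return the name of the closest known place to (lat, lon).
    return min(cities, key=lambda c: haversine_km(lat, lon, c[1], c[2]))[0]
```

With the full geonames table, the same linear scan (or a spatial index for speed) maps EXIF GPS coordinates to place names without any external service.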
For Windows, the shared libraries `dnnl.dll` and `dnnl_v3.dll` are included in this repository and are based on the oneDNN project.
Specifically, `dnnl.dll` corresponds to a version >= 2.6.x but below 3.x.x, and `dnnl_v3.dll` corresponds to a version >= 3.x.x but below 4.x.x.
Developers can choose to build their own corresponding DLLs following the instructions on the project page, provided they name them `dnnl.dll` and `dnnl_v3.dll` after building.
The included `openblas.dll` is based on the project https://github.com/OpenMathLib/OpenBLAS/ and, as an alternative, can be built from scratch or downloaded directly from its releases.
The server hovers at around 1100 MB of RAM usage, which includes roughly 650 MB used by the CLIP machine-learning model. In the future, the idea is to use the `image-encoder` only during indexing, which should save about 350 MB of RAM.
What is Hachi?
Hachi is an end-to-end semantic and meta-data search engine for personal data.
End-to-end: it takes care of embedding generation, meta-data extraction, storage, and retrieval, without any intervention, for data in a directory pointed to by the user. It doesn't modify the original data in any form.
Semantic: it understands natural-language queries.
Meta-data: it extracts available meta-data such as `filename`, `directory`, and EXIF data for a resource like an image.
Search: it provides a unified interface to allow search using semantic and/or meta-data attributes, hence allowing complex queries.
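As an illustration of the meta-data side, the per-image attributes could be collected along these lines (a hypothetical sketch using only the standard library; the function name and dict layout are illustrative, not Hachi's actual API, and the real implementation also pulls EXIF fields via the `exif` package):

```python
from pathlib import Path

def basic_metadata(path):
    # File-system attributes that a meta-data query can filter on.
    p = Path(path)
    return {
        "filename": p.name,
        "directory": str(p.parent),
        "extension": p.suffix.lower(),
    }
```

Combined with the semantic embeddings, attributes like these are what let a natural-language query be restricted to, say, a single folder.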