Hachi

An end to end semantic and meta-data search engine for personal data.

Screenshots:

[Screenshot: query]

[Screenshot: image card]

More screenshots:

[Screenshot: indexing]

Features:

  • Semantic + meta-data search: Query data using natural language and/or meta-data attributes; combine any attributes to enable complex queries.
  • End-to-end interface: Index any media just by providing a path to a local directory/folder and start querying. No complex configuration.
  • Face recognition: Face detection and recognition.
  • Fast: Start getting results in milliseconds. All indices are stored on the user's system.
  • Minimal requirements: Any consumer-grade CPU with AVX2 instructions enabled, and minimal software dependencies (no dependence on deep-learning frameworks like PyTorch/TensorFlow).
  • Private: Fully self-hosted on the user's system, with no dependence on an outside network in any manner.

Hardware requirements:

An Intel/AMD x86-64 CPU with AVX2 instructions enabled.
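
On Linux, you can confirm AVX2 support by inspecting the CPU flags; a minimal Python sketch (Linux only; on Windows, check your CPU model's spec sheet instead):

    # check_avx2.py -- report whether the CPU advertises AVX2 (Linux only).
    with open("/proc/cpuinfo") as f:
        flags = next((line for line in f if line.startswith("flags")), "")
    print("AVX2 supported" if "avx2" in flags.split() else "AVX2 not reported")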

Software requirements:

  • Python 3 with pip installed (tested with >= 3.8.x)

  • Caddy: an open-source web server. Download it from https://caddyserver.com/download

Supported OS:

  • Windows (tested on 10/11).
  • Linux (glibc >= 2.27). [Run ldd --version to check your glibc version.]

Install (one-time process):

  1. Install Caddy (an open-source web server, used for serving static files and for out-of-the-box HTTPS configuration, if needed).

  2. Install Python 3 (tested with versions >= 3.8).

  3. Download source.zip from the latest release, or run git clone https://github.com/eagledot/hachi.git

  4. cd into the cloned/downloaded repository (i.e. change path to the root of the cloned repository).

  5. Collect the model weights by downloading data.zip from the releases and extracting the 2 .bin files in it into ./data, so that the ./data directory then contains 3 .bin files.

  6. Run pip install -r requirements.txt (this installs the opencv-python, numpy, flask, regex, ftfy and plum-py Python packages, if not already present).

    Extra steps (for Linux distributions only):

    1. Run conda install -c conda-forge onednn-cpu-omp=2.6.0. (conda is the sanest way I could find to install the oneDNN shared library without running into glibc mismatches.)

      • Update LD_LIBRARY_PATH so the dynamic linker searches conda's library directory for shared objects (if not already done), e.g. export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH".

    2. Install OpenBLAS if it is not already included/installed with your OS (a quick load-check for both libraries is sketched after this list):

      • sudo dnf install openblas-devel (Fedora)
      • sudo apt-get install libopenblas-dev (Ubuntu/Debian)
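
To confirm the dynamic linker can now resolve both shared libraries, here is a minimal Python check (the file names libdnnl.so and libopenblas.so are assumptions; on your system they may carry version suffixes such as libdnnl.so.2):

    # check_libs.py -- verify the oneDNN and OpenBLAS shared libraries load (Linux).
    import ctypes

    for lib in ("libdnnl.so", "libopenblas.so"):
        try:
            ctypes.CDLL(lib)  # dlopen() honours LD_LIBRARY_PATH
            print(f"{lib}: loaded OK")
        except OSError as exc:
            print(f"{lib}: failed to load ({exc})")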

Usage:

  1. cd into the downloaded directory.
  2. Run command: caddy run
  3. Run command: python semantic_search.py OR python3 semantic_search.py
  4. Visit http://localhost:5000
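
Once both processes are running, you can sanity-check the server from Python using only the standard library (this just confirms the page at http://localhost:5000 responds; it is not part of Hachi itself):

    # ping_server.py -- confirm the Hachi web interface is reachable.
    from urllib.request import urlopen

    with urlopen("http://localhost:5000", timeout=5) as resp:
        print(f"Server responded with HTTP status {resp.status}")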

Development:

Front-End:

The front-end code of the app lives in the ./static/ directory and is generated automatically from the Svelte components in the ./hachi_frontend directory. Front-end development requires Node (tested with v18.13.0) to be installed on the user's machine.

Check out the readme.md in ./hachi_frontend for more details.

Extra Details:

For Windows, the shared libraries dnnl.dll and dnnl_v3.dll are included in this repository and are based on the oneDNN project. Specifically, dnnl.dll corresponds to a version >= 2.6.x but < 3.x.x, and dnnl_v3.dll corresponds to a version >= 3.x.x but < 4.x.x. Developers can build their own corresponding DLLs following the instructions on the project page, provided they name them dnnl.dll and dnnl_v3.dll after building.

The bundled openblas.dll is based on the OpenBLAS project (https://github.com/OpenMathLib/OpenBLAS); as an alternative, it can be built from scratch or downloaded directly from that project's releases.
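
A minimal sketch to verify the bundled shared libraries load on Windows, using only the Python standard library (it assumes you run it from the repository root, where the DLLs live; adjust the paths otherwise):

    # check_dlls.py -- verify the bundled shared libraries load (Windows).
    import ctypes
    import os

    for name in ("dnnl.dll", "dnnl_v3.dll", "openblas.dll"):
        path = os.path.abspath(name)  # absolute path avoids DLL search-order issues
        try:
            ctypes.CDLL(path)
            print(f"{name}: loaded OK")
        except OSError as exc:
            print(f"{name}: failed to load ({exc})")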

RAM usage

The server hovers at around 1100 MB of RAM usage, which includes roughly 650 MB used by the CLIP machine-learning model. A future idea is to use the image encoder only during indexing, which should save about 350 MB of RAM.
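
A minimal sketch of that idea (all names here are illustrative stand-ins, not Hachi's actual API): load the encoder only for indexing, then drop the reference so the memory can be reclaimed before serving queries.

    # Hypothetical sketch: keep the image encoder resident only while indexing.
    import gc

    class ImageEncoder:
        """Stand-in for the large CLIP image encoder."""
        def encode(self, image_path):
            return [0.0] * 512  # placeholder embedding

    def index_directory(image_paths):
        encoder = ImageEncoder()  # loaded on demand, only for indexing
        embeddings = {p: encoder.encode(p) for p in image_paths}
        del encoder   # drop the only reference...
        gc.collect()  # ...so the memory can be reclaimed before query time
        return embeddings

    print(len(index_directory(["a.jpg", "b.jpg"])))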

FAQs:

What is Hachi?

Hachi is an end-to-end semantic and meta-data search engine for personal data.

End to end: It takes care of embedding generation, meta-data extraction, storage, and retrieval, without any user intervention, for data in a directory the user points to. It doesn't modify the original data in any form.

Semantic: It understands natural-language queries.

Meta-data: It extracts available meta-data, such as the filename, directory, and EXIF data, for each resource (e.g. an image).

Search: It provides a unified interface for searching with semantic and/or meta-data attributes, enabling complex queries. (A small sketch of the semantic ranking step follows.)
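
To make the semantic half concrete: CLIP-style retrieval ranks images by cosine similarity between a text-query embedding and the stored image embeddings. A minimal numpy sketch with random stand-in vectors (Hachi's actual pipeline and embedding size may differ):

    # Hypothetical sketch of CLIP-style semantic ranking via cosine similarity.
    import numpy as np

    rng = np.random.default_rng(0)
    image_embeddings = rng.normal(size=(1000, 512))  # stand-in index of 1000 images
    query_embedding = rng.normal(size=512)           # stand-in text embedding

    # Normalize, then rank by dot product (equivalent to cosine similarity).
    image_embeddings /= np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    query_embedding /= np.linalg.norm(query_embedding)
    scores = image_embeddings @ query_embedding
    top5 = np.argsort(scores)[::-1][:5]
    print("Top-5 image indices:", top5, "scores:", scores[top5])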

License:

GNU Affero General Public License v3.0

