Do you like this project? We would love to get a star ⭐ and a shout-out 🗣️ from you in return! 🤗

Community support: discord chatroom for discussions

AquilaDB

AquilaDB is a vector database that stores feature vectors along with JSON document metadata. Do k-NN retrieval from anywhere, even from the darkest rifts of Aquila (in progress). It is dead simple to set up, language agnostic, and a drop-in addition for your machine learning applications. With its current features, AquilaDB is a ready solution for machine learning engineers and data scientists to build Neural Information Retrieval applications out of the box with minimal dependencies (visit the wiki page for use case examples).
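To make the idea of "feature vectors plus JSON metadata, queried by k-NN" concrete, here is a minimal brute-force sketch in plain Python. It is not AquilaDB's API or implementation; the `knn` and `cosine_distance` helpers, the record layout, and the sample data are all hypothetical and only illustrate the concept.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn(records, query, k):
    """Brute-force k-NN: sort all records by distance to the query vector."""
    ranked = sorted(records, key=lambda r: cosine_distance(r["vector"], query))
    return ranked[:k]

# Hypothetical records: each pairs a feature vector with JSON-like metadata.
records = [
    {"vector": [1.0, 0.0, 0.0], "metadata": {"name": "cat.jpg"}},
    {"vector": [0.9, 0.1, 0.0], "metadata": {"name": "kitten.jpg"}},
    {"vector": [0.0, 1.0, 0.0], "metadata": {"name": "car.jpg"}},
]

nearest = knn(records, [1.0, 0.02, 0.0], k=2)
names = [r["metadata"]["name"] for r in nearest]
print(names)
```

A linear scan like this is fine for a handful of records; a dedicated vector database exists because it does not scale to large collections.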

The AquilaDB 1.0 release is still a distant goal. Visit the Contribute section below to see the detailed development plan and milestones. We make sure that each release and the AquilaDB master branch are stable and include all features planned up to that point. All new pull requests are made to the develop branch, so develop is the default, bleeding-edge branch with all the latest updates.

Github, Docker Hub, Documentation (dedicated Wiki page)

Who is this for

If you are working on a data science project and need to store a hell of a lot of data and retrieve similar data based on some feature vector, this will be a useful tool for you, with the extra benefits a real-world web application needs.
Are you dealing with a lot of images and related metadata? Want to find the similar ones? You are at the right place.
If you are looking for a document database, this is not the right place for you.

Technology

AquilaDB is not built from scratch. Thanks to the OSS community, it is based on a couple of cool open source projects out there. We took a couch and added some wheels and jetpacks to make it a super cool butt rest for Data Science Engineers. While CouchDB provides us with network and scalability benefits, FAISS and Annoy provide superfast similarity search. Along with our peer management service, AquilaDB provides a unique solution.

Prerequisites

You need Docker installed.

Usage

AquilaDB is quick to set up and run as a Docker container. All you need to do is either build it from source or pull it from Docker Hub.

Option 1: build from source

clone this repository
build image: docker build -t ammaorg/aquiladb:latest .

Option 2: pull from Docker Hub

pull image: docker pull ammaorg/aquiladb:latest

Finally, deploy

deploy: docker run -d -i -p 50051:50051 -v "<local data persist directory>:/data" -t ammaorg/aquiladb:latest
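After deploying, it can help to confirm that the published port is accepting connections before pointing a client at it. The sketch below uses only the Python standard library; the `is_port_open` helper is hypothetical, and `localhost` is an assumption that fits the `-p 50051:50051` mapping above when the container runs on the same machine. An open port only shows that something is listening, not that AquilaDB is fully ready to serve requests.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False

if __name__ == "__main__":
    # Assumed host; 50051 is the port published by the docker run command.
    if is_port_open("localhost", 50051):
        print("Port 50051 is accepting connections")
    else:
        print("Port 50051 is not reachable yet")
```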

Client SDKs

We currently have multiple client libraries in progress to abstract the communication between deployed AquilaDB and your applications.

Python

Node JS

AquilaDB exposes gRPC APIs for clients, which means you can communicate directly with AquilaDB from your favourite language (API reference). The clients above make use of these APIs to hide the communication details from the end user. If you are familiar with gRPC and would like to contribute a new client library in any other language, please let us know. Protocol buffers API reference. Example usage of APIs in Node JS.

Benchmark

For benchmark results, visit https://aquiladb.xyz/docs/adb-benchmarks

Progress

This project is still under active development (pre-release). It can already be used as a standalone database. The peer manager is a work in progress, so networking capabilities are not available yet. With release v1.0 we will release a pre-optimized version of AquilaDB.

Contribute

We have prepared a document to help anyone interested in contributing get started with AquilaDB right away.

Here is our high-level release roadmap.

Learn

We have started meeting developers and giving short talks on AquilaDB. Here are the slides that we use on those occasions: http://bit.ly/AquilaDB-slides

With the features in the current AquilaDB release, you can build Neural Information Retrieval applications out of the box without any external dependencies. Here are some useful links to learn more about it and start building:

These use case examples will give you an understanding of what is possible and what is not: https://github.com/a-mma/AquilaDB/wiki
Microsoft published a paper and a YouTube video on this to onboard anyone interested:
- paper: https://www.microsoft.com/en-us/research/uploads/prod/2017/06/INR-061-Mitra-neuralir-intro.pdf
- video: https://www.youtube.com/watch?v=g1Pgo5yTIKg
Embeddings for Everything: Search in the Neural Network Era: https://www.youtube.com/watch?v=JGHVJXP9NHw
Autoencoders are one such deep learning algorithm that will help you build semantic vectors, the foundation for Neural Information Retrieval. Here are some links to Autoencoders based IR:
Note that the idea of information retrieval applies not only to text data but to any data. All you need to do is encode any source data type into a dense vector with deep neural networks.

License

Apache License 2.0 (see the license file)

created with ❤️ a-mma.indic (a_മ്മ)
