Callaghan-Hattingh / haystackdb

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

HaystackDB

Minimal but performant Vector DB

Features

  • Binary embeddings by default (soon int8 reranking)
  • JSON filtering for queries
  • Scalable, distributed architecture for use with multi replica deployments
  • Durable (WAL), persistent data, mem mapped for fast access in the client

Benchmarks

On a MacBook with an M2, 1024 dimension, binary quantized FAISS is using a flat index, so brute force, but it's in memory. Haystack is storing the data on disk, and also brute forces. TLDR is Haystack is ~10x faster despite being stored on disk.

100,000 Vectors
Haystack — 3.44ms
FAISS    — 29.67ms

500,000 Vectors
Haystack — 11.98ms
FAISS    - 146.50ms

1,000,000 Vectors
Haystack — 22.65ms
FAISS    — 293.91ms

Roadmap

  • Int8 reranking
  • Better queries with more than simple equality
  • Full text search
  • Better insertion performance with batch B+Tree insertion
  • Point in time backups/rollback
  • Cursor based pagination
  • Schema migrations

About

License:Apache License 2.0


Languages

Language:Rust 99.6%Language:Dockerfile 0.4%