dventimiglia / PGNNS

PostgreSQL Nearest Neighbor Search

pgnns

Quick Start

Download Embeddings

Download the pre-computed embeddings file embeddings.csv.gz from the following location on Google Drive.

https://drive.google.com/file/d/1XQa74OCJlun2e3CVbyhOuaMu7Mq6Ww3l/view?usp=sharing
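Downloads from Google Drive are sometimes truncated, or saved as an HTML page instead of the file itself, so it may be worth checking the archive before importing it. The following is a minimal sketch; the helper name verify_gz is hypothetical and not part of this repository.

```shell
# verify_gz FILE: succeed only if FILE exists and is an intact gzip archive.
# (Hypothetical helper for illustration; not part of the PGNNS Makefile.)
verify_gz() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "not found: $file" >&2
        return 1
    fi
    # gzip -t checks the integrity of the compressed stream without extracting it.
    gzip -t "$file"
}
```

For example, `verify_gz embeddings.csv.gz && gzip -dc embeddings.csv.gz | head -n 3` would print the first few rows only if the archive is intact.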

Create Environment Script

Create the file setenv.sh along the following lines, choosing whatever values you like. Note the following.

  • SAMPLES is the number of (title, embedding) pairs to import into the database.
  • TRANSACTIONS is the number of benchmark operations to perform, that is, the number of queries to run.
  • APIKEY is the API key for AnnDB, which is also being evaluated here.
  • DATASET is the name of a dataset in AnnDB. See the AnnDB documentation for more information.
export PGHOST=localhost
export PGPORT=6432
export PGDATABASE=pgbench
export PGUSER=pgbench
export PGPASSWORD=pgbench
export SAMPLES=10000000
export TRANSACTIONS=10
export APIKEY=<redacted>
export DATASET='wikipedia-titles'
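Because both later steps depend on these variables, a quick check that none is missing can save a failed run. This is a minimal sketch; the function name check_env is hypothetical and not part of this repository.

```shell
# check_env NAME...: report every named variable that is unset or empty.
# (Hypothetical helper for illustration; not part of the PGNNS Makefile.)
check_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After `source setenv.sh`, it could be called as `check_env PGHOST PGPORT PGDATABASE PGUSER PGPASSWORD SAMPLES TRANSACTIONS APIKEY DATASET`.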

Start PostgreSQL Database

In one terminal window, run the following command to start an instance of PostgreSQL in a Docker container, using the environment variables established in setenv.sh.

source setenv.sh && docker run -e POSTGRES_DB=$PGDATABASE -e POSTGRES_USER=$PGUSER -e POSTGRES_PASSWORD=$PGPASSWORD -p $PGPORT:5432 postgres:13.2
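The container takes a few seconds before it accepts connections, so the benchmark may fail if started immediately. One way to wait is a small retry loop around a readiness probe. This is a sketch under assumptions: the retry function is hypothetical, and the usage below assumes the PostgreSQL client tool pg_isready is installed on the host.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
# (Hypothetical helper for illustration; not part of the PGNNS Makefile.)
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "gave up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

In the second terminal, after `source setenv.sh`, something like `retry 30 1 pg_isready -q -h "$PGHOST" -p "$PGPORT"` would block until the server answers or 30 attempts have failed.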

Run Benchmark

In another terminal window, run the benchmark with the following command, using the same environment variables established in setenv.sh.

source setenv.sh && time make clean report.txt 

Setup

Install Conda

Create Conda Environment

Install Conda Packages

Set up Kaggle CLI

Create Environment Script

Start PostgreSQL Database

Run Benchmark
