postgresml / postgresml

The GPU-powered AI application database. Get your app to market faster using the simplicity of SQL and the latest NLP, ML + LLM models.

Home Page: https://postgresml.org


pgvector missing in Docker image ghcr.io/postgresml/postgresml:2.8.2

remote4me opened this issue

I am trying to use the docker image. My environment: Ubuntu 22.04 with GPU, docker

I got these errors:

ERROR: access method "ivfflat" does not exist (when creating index)
ERROR: type "vector" does not exist (when using ::vector in a SELECT statement)

What I did:

docker run --rm -it \
-v postgresml_data:/var/lib/postgresml \
-v postgresml_postgresdata:/var/lib/postgresql \
--gpus all \
-p 5499:5432 -p 8000:8000 \
ghcr.io/postgresml/postgresml:2.8.2 \
sudo -u postgresml psql -d postgresml
  1. Connected with SQL client to port 5499

  2. I want to reproduce the steps described in "Vector database", see https://github.com/postgresml/postgresml/?tab=readme-ov-file#vector-database

  3. SELECT pgml.load_dataset('tweet_eval', 'sentiment');

  4. Created table with embeddings:

CREATE TABLE tweet_embeddings AS
SELECT text, pgml.embed('distilbert-base-uncased', text) AS embedding 
FROM pgml.tweet_eval;
  5. Creating index fails:
CREATE INDEX ON tweet_embeddings USING ivfflat (embedding vector_cosine_ops);
--
ERROR: access method "ivfflat" does not exist
1 statement failed.
  6. Using ::vector fails:
WITH query AS (
    SELECT pgml.embed('distilbert-base-uncased', 'Star Wars christmas special is on Disney')::vector AS embedding
)
SELECT * FROM items, query ORDER BY items.embedding <-> query.embedding LIMIT 5;
--
ERROR: type "vector" does not exist
  Position: 113

    SELECT pgml.embed('distilbert-base-uncased', 'Star Wars christmas special is on Disney')::vector AS embedding
                                                                                              ^
1 statement failed.
  7. Some additional info:
SELECT extname, extversion FROM pg_extension;
 extname | extversion 
---------+------------
 plpgsql | 1.0
 pgml    | 2.8.2
  8. More details:
SELECT pgml.version();
version
2.8.2 (dd7c74909bdf10cd5d39faf4429df8ba9748fd30)
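
The pg_extension output above only lists extensions that have been created in the current database. One way to tell whether pgvector is absent from the image or simply not enabled is to query the standard pg_available_extensions catalog view. This is a minimal sketch, not output from the reporter's system:

```sql
-- pg_available_extensions lists every extension whose files are present on
-- the server, whether or not it has been created in this database.
-- A row with installed_version = NULL would mean the extension ships with
-- the image but has not been enabled here; no row would mean it is missing.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'vector';
```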

The documentation (see https://postgresml.org/docs/product/vector-database) says literally this:

If you're using our Cloud or our Docker image, your database has pgvector installed already.

Well... I am using your latest Docker image, and...

CREATE EXTENSION vector;

This, however, leads to:

postgresml=# \d+ tweet_embeddings
                                      Table "public.tweet_embeddings"
  Column   |  Type  | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-----------+--------+-----------+----------+---------+----------+-------------+--------------+-------------
 text      | text   |           |          |         | extended |             |              |
 embedding | real[] |           |          |         | extended |             |              |
Access method: heap

postgresml=# select * from tweet_embeddings;
postgresml=# CREATE INDEX ON tweet_embeddings USING ivfflat (embedding vector_cosine_ops);
ERROR:  operator class "vector_cosine_ops" does not accept data type real[]

You'll need to alter the column type to a vector to use pgvector indexes.
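
A rough sketch of that column conversion follows. It assumes pgvector is enabled in the database and that the embeddings have 768 dimensions, which is the hidden size of distilbert-base-uncased. The dimension is an assumption about this table, so it is worth checking with array_length first.

```sql
-- Assumption: CREATE EXTENSION vector has already succeeded in this database.
-- Assumption: every embedding has 768 elements; confirm with:
--   SELECT array_length(embedding, 1) FROM tweet_embeddings LIMIT 1;

-- Convert the real[] column produced by pgml.embed() into a pgvector column.
ALTER TABLE tweet_embeddings
    ALTER COLUMN embedding TYPE vector(768)
    USING embedding::vector(768);

-- With a vector column, the operator class from the failing statement applies.
CREATE INDEX ON tweet_embeddings
    USING ivfflat (embedding vector_cosine_ops);

-- Nearest-neighbour query against the converted table. <=> is pgvector's
-- cosine distance operator, which matches the vector_cosine_ops index above.
WITH query AS (
    SELECT pgml.embed('distilbert-base-uncased',
                      'Star Wars christmas special is on Disney')::vector(768) AS embedding
)
SELECT t.text
FROM tweet_embeddings AS t, query
ORDER BY t.embedding <=> query.embedding
LIMIT 5;
```

The query in the original report orders by <->, which is Euclidean distance in pgvector. An index built with vector_cosine_ops serves <=> ordering, so the sketch uses <=> instead.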

https://github.com/pgvector/pgvector