Add support for printing SQL of SQLAlchemy query
sarimak opened this issue
pgvector.sqlalchemy.Vector doesn't provide a process_literal_param method, so any SQLAlchemy query containing literal values for such a column fails to render to final SQL (including the values) when literal_binds is enabled. Please implement the method.
Repro steps:
import sqlalchemy as sa
import sqlalchemy.dialects.postgresql
import pgvector.sqlalchemy as pgv
engine = sa.create_engine("postgresql+psycopg2://postgres:password@localhost:5432/neighbors-2")
conn = engine.connect()
table = sa.Table("foo", sa.MetaData(), sa.Column("col1", pgv.Vector(4)))
query = sa.select(table.c.col1).where((table.c.col1.max_inner_product([0.5, 0.5, 0.5, 0.5]) * -1) >= 0.1)
kwargs = {"literal_binds": True}
sql = str(query.compile(dialect=sqlalchemy.dialects.postgresql.dialect(), compile_kwargs=kwargs))
raises
sqlalchemy.exc.CompileError: No literal value renderer is available for literal value "[0.5, 0.5, 0.5, 0.5]" with datatype VECTOR(4)
(The query itself is OK, which can be verified by using kwargs = {} -- the compilation then returns 'SELECT foo.col1 \nFROM foo \nWHERE (foo.col1 <#> %(col1_1)s) * %(param_1)s >= %(param_2)s')
Workaround:
import pgvector.utils
class Vector(pgv.Vector, sa.TypeDecorator):
    impl = pgv.Vector
    cache_ok = True

    def process_literal_param(self, value, dialect):
        return repr(pgvector.utils.to_db(value, self.dim))
and use this wrapper class in the table definition instead of the library-provided pgv.Vector. The SQL rendering then works:
str(query.compile(dialect=sqlalchemy.dialects.postgresql.dialect(), compile_kwargs={"literal_binds": True}))
"SELECT foo.col1 \nFROM foo \nWHERE (foo.col1 <#> '[0.5,0.5,0.5,0.5]') * -1 >= 0.1"
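To illustrate what a literal renderer has to return for a vector value, here is a minimal standalone sketch that builds the quoted text form seen in the output above. The function name vector_literal is hypothetical and this is not pgvector's own implementation; the workaround above delegates to pgvector.utils.to_db instead.

```python
def vector_literal(value, dim=None):
    """Render a sequence of numbers as a quoted vector literal.

    Hypothetical helper for illustration only, not part of pgvector.
    """
    # Optionally enforce the declared dimension of the column.
    if dim is not None and len(value) != dim:
        raise ValueError(f"expected {dim} dimensions, not {len(value)}")
    # Text form is a bracketed, comma-separated list with no spaces.
    body = ",".join(str(float(v)) for v in value)
    # Wrap in single quotes so it can be inlined as a SQL string literal.
    return f"'[{body}]'"


print(vector_literal([0.5, 0.5, 0.5, 0.5], 4))
```

As with literal_binds in general, inlining values like this is only appropriate for trusted, hard-coded inputs.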
Rationale: I wanted a test that makes sure the approximate similarity search is indeed using an index. So I needed to run EXPLAIN over the search query, which requires the plain SQL text of the SQLAlchemy query including the values for the prepared statement. With that text in sql, I can run explain_result = conn.execute(sa.text(f"EXPLAIN {sql}")) and assert "Index Scan using foo" in "\n".join(row[0] for row in explain_result.fetchall()).
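The plan check described above could be factored into a small helper along these lines. This is only a sketch: plan_uses_index is a hypothetical name, and the sample rows are hand-written for illustration rather than real EXPLAIN output. In a real test the rows would come from executing the EXPLAIN statement on a connection.

```python
def plan_uses_index(plan_rows, index_prefix):
    """Return True if the EXPLAIN plan mentions an index scan.

    plan_rows: rows as returned by fetchall() on an EXPLAIN result, where
    the first element of each row is one line of the plan text.
    index_prefix: the index name, or a prefix of it, to look for.
    """
    plan = "\n".join(str(row[0]) for row in plan_rows)
    return f"Index Scan using {index_prefix}" in plan


# Hand-written, hypothetical plan rows (not captured from a real database).
index_rows = [("Index Scan using foo_col1_idx on foo  (cost=...)",)]
seq_rows = [("Seq Scan on foo  (cost=...)",)]

assert plan_uses_index(index_rows, "foo")
assert not plan_uses_index(seq_rows, "foo")
```

Keeping the check in one place makes the regression test fail with a clear signal when a query change causes the planner to fall back to a sequential scan.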
I am aware that literal_binds can't be used with untrusted inputs (SQL injection) -- but for tests with hard-coded/trusted inputs it is perfectly usable.
And given that it seems quite easy to stop using the approximate similarity index through a seemingly innocent change to the SQLAlchemy query, I find such a regression test valuable as a safety net for my colleagues/future self.
Hi @sarimak, thanks for the suggestion. Since it can't be used with untrusted input, I don't think it makes sense to include, but this issue should be helpful for users who want to add it manually.
Well, then for example Alembic's preview/SQL-only mode won't be able to leverage pgvector's syntax. literal_binds is an opt-in feature and it has legitimate use cases. But as you prefer -- I have this workaround, so my needs are already covered.