pgvector / pgvector

Open-source vector similarity search for Postgres

Similarity ordering for operators

ivoras opened this issue · comments

Reading the docs, I'd concluded that similarity searches should be done like this, for different operators:

SELECT field FROM table ORDER BY vec <-> '[1,2,3]' LIMIT 10; -- for L2 distance
SELECT field FROM table ORDER BY vec <#> '[1,2,3]' DESC LIMIT 10; -- for inner product
SELECT field FROM table ORDER BY 1-(vec <=> '[1,2,3]') LIMIT 10; -- for cos distance

But when I actually run the queries (with an HNSW index), the second and third forms yield bad results. They return good results when run with plain ascending ordering, i.e.:

SELECT field FROM table ORDER BY vec <#> '[1,2,3]' LIMIT 10; -- for inner product
SELECT field FROM table ORDER BY vec <=> '[1,2,3]' LIMIT 10; -- for cos distance

Am I misunderstanding how to use the operators?

The 2nd and 3rd queries, which both sort by "furthest distance from the query vector" (i.e. "furthest neighbors"), are currently not supported as indexable operations. ANN indexing methods currently focus on nearest neighbors, not furthest neighbors.

Is there a particular use case you're trying to solve with looking for furthest neighbors? How large is the dataset that you're using, and how big are the vectors?

Thank you for the quick answer. I was comparing the quality of results from different approaches, with the goal of implementing a similarity search. The vectors are 768-dimensional, and there are approximately half a million of them.

What's puzzling me is that just using ORDER BY vec <#> '[1,2,3]' (and similarly for <=>) appears to work: the queries produce results which look like "similar" vectors when converted back to text. Am I wrong to interpret the concept of "furthest neighbors" as just "inverse distance ordering" (i.e. to convert furthest neighbors to nearest neighbors, just add "ORDER BY...DESC")?

Hi @ivoras, to get the nearest neighbors, you should order by just the operator (in ascending order, which is the default).

ORDER BY vec <-> '[1,2,3]'
ORDER BY vec <#> '[1,2,3]'
ORDER BY vec <=> '[1,2,3]'
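
For anyone landing here later, here is a minimal sketch of why plain ascending order gives nearest neighbors for all three operators. It relies on two points from the pgvector README: `<#>` returns the negative inner product, and `<=>` returns cosine distance (1 minus cosine similarity). The code below is a pure-Python model of those semantics, not pgvector itself; the row names and vectors are made up for illustration.

```python
import math

def l2_distance(a, b):
    # Models `a <-> b`: Euclidean distance, smaller means closer.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_inner_product(a, b):
    # Models `a <#> b`: the NEGATIVE inner product, so smaller means more similar.
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # Models `a <=> b`: 1 - cosine similarity, smaller means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Hypothetical table contents and query vector.
query = [1.0, 2.0, 3.0]
rows = {
    "same": [1.0, 2.0, 3.0],
    "close": [1.0, 2.0, 2.5],
    "opposite": [-1.0, -2.0, -3.0],
}

def order_by(key, descending=False):
    # Mimics ORDER BY <expression> [DESC] over the rows above.
    return sorted(rows, key=lambda name: key(rows[name], query), reverse=descending)

# Plain ascending order: nearest neighbors first for every operator.
nearest_l2 = order_by(l2_distance)
nearest_ip = order_by(neg_inner_product)
nearest_cos = order_by(cosine_distance)

# The forms from the original question: both put the LEAST similar row first.
desc_ip = order_by(neg_inner_product, descending=True)            # ORDER BY vec <#> q DESC
one_minus_cos = order_by(lambda a, b: 1.0 - cosine_distance(a, b))  # ORDER BY 1 - (vec <=> q)

print(nearest_l2, nearest_ip, nearest_cos)
print(desc_ip, one_minus_cos)
```

In this model, `DESC` on `<#>` and ascending order on `1 - (vec <=> q)` both sort by increasing similarity, which matches the "bad results" described above. If the similarity value itself is needed, it can be computed in the SELECT list while the ORDER BY stays on the bare operator.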