pgvector / pgvector-php

pgvector support for PHP

Potential Discrepancy in Cosine Similarity Calculation in HasNeighbors.php

dolcedev opened this issue

I was reviewing the pgvector-php documentation and noticed that, to calculate cosine similarity, it suggests using the formula 1 - cosine distance, as in the following SQL snippet:

SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
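For reference, the relationship this query relies on is written out below as a worked equation. It assumes pgvector's definition of the `<=>` operator as cosine distance. The second vector, b = [1,2,3], is a hypothetical example chosen for easy arithmetic and does not come from the issue.

$$
d_{\cos}(\mathbf{a}, \mathbf{b}) = 1 - \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert},
\qquad
\text{similarity}(\mathbf{a}, \mathbf{b}) = 1 - d_{\cos}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
$$

$$
\mathbf{a} = [3,1,2],\ \mathbf{b} = [1,2,3]:\quad
\mathbf{a} \cdot \mathbf{b} = 3 + 2 + 6 = 11,\quad
\lVert \mathbf{a} \rVert = \lVert \mathbf{b} \rVert = \sqrt{14},\quad
d_{\cos} = 1 - \tfrac{11}{14} = \tfrac{3}{14} \approx 0.214,\quad
\text{similarity} = \tfrac{11}{14} \approx 0.786
$$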

However, when I examined HasNeighbors.php, I could not find this formula, or any other cosine similarity calculation, in the implementation. I'm not an expert in vector distances, but based on the documentation, I suspect this could produce incorrect results when using cosine similarity with the library. Is that correct?

Thank you in advance and great work!

Hi @dolcedev, nearestNeighbors returns distances, not similarities: it adds a neighbor_distance attribute to each result, so you'll need to compute 1 - neighbor_distance to get the similarity.
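To illustrate the answer, here is a minimal, language-neutral sketch in Python, not pgvector-php code. It assumes that cosine distance is defined as pgvector's `<=>` defines it (1 - cosine similarity). `nearest_neighbors` is a hypothetical stand-in that ranks rows by ascending distance and attaches a `neighbor_distance` field. `with_similarity` then applies the 1 - neighbor_distance conversion. The row data is made up for the example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance as pgvector's <=> operator defines it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query, rows, k=5):
    """Hypothetical stand-in for nearestNeighbors: rank rows by ascending
    cosine distance and attach a neighbor_distance field to each one."""
    scored = [
        dict(row, neighbor_distance=cosine_distance(query, row["embedding"]))
        for row in rows
    ]
    scored.sort(key=lambda r: r["neighbor_distance"])
    return scored[:k]


def with_similarity(neighbors):
    """Convert each distance to a similarity: cosine_similarity = 1 - neighbor_distance."""
    return [dict(n, cosine_similarity=1.0 - n["neighbor_distance"]) for n in neighbors]


# Made-up rows for illustration only.
items = [
    {"id": 1, "embedding": [1, 2, 3]},
    {"id": 2, "embedding": [3, 1, 2]},
    {"id": 3, "embedding": [-3, -1, -2]},
]

for n in with_similarity(nearest_neighbors([3, 1, 2], items)):
    print(n["id"], round(n["neighbor_distance"], 4), round(n["cosine_similarity"], 4))
```

The library keeps distances because nearest-neighbor ordering is by ascending distance. Converting to similarity is a one-line post-processing step on each result. In a Laravel model this would presumably read the attribute as `$neighbor->neighbor_distance`, but check the pgvector-php README for the exact call.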