Potential Discrepancy in Cosine Similarity Calculation in HasNeighbors.php
dolcedev opened this issue · comments
I was reviewing the pgvector-php documentation and noticed that, for calculating cosine similarity, it suggests using the formula 1 - cosine distance, as demonstrated in the following SQL snippet:
SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
However, upon examining the implementation within HasNeighbors.php, I could not find this formula being applied or any related implementation for cosine similarity. I am not an expert in vector distances, but based on the documentation, I believe this might lead to incorrect results when trying to utilize cosine similarity measures within the library. Is this correct?
Thank you in advance and great work!
Hi @dolcedev, nearestNeighbors adds a neighbor_distance attribute for the distance, so you'll need to do 1 - neighbor_distance to get the similarity.
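As a rough illustration of that conversion, here is a minimal sketch in plain PHP. It does not call the library: the $neighbors array is hypothetical stand-in data shaped like results carrying a neighbor_distance attribute, and cosineSimilarityFromDistance is a helper name invented for this example. It assumes the query was run with cosine distance.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of pgvector-php): turns a cosine distance,
// as produced by pgvector's <=> operator, into a cosine similarity.
function cosineSimilarityFromDistance(float $distance): float
{
    return 1.0 - $distance;
}

// Stand-in data: in a real application these would be the models returned
// by nearestNeighbors() when cosine distance is used. The ids and distances
// here are made up for illustration.
$neighbors = [
    (object) ['id' => 1, 'neighbor_distance' => 0.0],
    (object) ['id' => 2, 'neighbor_distance' => 0.25],
    (object) ['id' => 3, 'neighbor_distance' => 1.0],
];

// Map each neighbor id to its cosine similarity.
$similarities = [];
foreach ($neighbors as $neighbor) {
    $similarities[$neighbor->id] = cosineSimilarityFromDistance($neighbor->neighbor_distance);
}

print_r($similarities);
```

Note that this subtraction is only meaningful for cosine distance. With L2 or inner product distance, 1 - neighbor_distance is not a similarity score.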