spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.

Home Page: https://spotify.github.io/voyager/

Fewer than expected results were retrieved during querying the index

sametdumankaya opened this issue

Hi,

I'm trying to use the voyager library instead of annoy, but I ran into the following problem. Even though there are 25130 elements in the Voyager index (see the num_elements attribute of the index below), I'm unable to query it for all of them because the query somehow can't find every item.

[image: screenshot showing the index's num_elements attribute]

@sametdumankaya sorry about the delayed response here! Can you provide some more information on your use case? Are you attempting to query for N neighbors where N is the number of elements in the index?

Also can you check to ensure that there are no NaN's in your item set?
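A minimal sketch of the NaN check suggested above, in plain Python so it does not depend on how the vectors are stored. The names `find_nan_rows` and `items` are hypothetical and only for illustration; the vectors are assumed to be sequences of floats.

```python
import math


def find_nan_rows(vectors):
    """Return the positions of vectors that contain at least one NaN."""
    bad_rows = []
    for row_number, vector in enumerate(vectors):
        # math.isnan is applied to each component of the vector.
        if any(math.isnan(value) for value in vector):
            bad_rows.append(row_number)
    return bad_rows


# Made-up example data: the second vector contains a NaN.
items = [
    [0.1, 0.2, 0.3],
    [0.4, float("nan"), 0.6],
    [0.7, 0.8, 0.9],
]
print(find_nan_rows(items))  # [1]
```

An empty result means no vector in the set contains a NaN.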

Hello @markkohdev!
I am facing the exact same issue. Some calls that query for N neighbors in an index of length N result in this error.
My objective is to find the furthest neighbor of a specific vector in an index.
There are no NaN's in the set.

print(f"Len Index {len(cluster_index)}")
neighbors, _ = cluster_index.query(
    vectors=any_vector,
    k=len(cluster_index)
)

outputs

Len Index 828
RuntimeError: Fewer than expected results were retrieved; only found 825 of 828 requested neighbors.

Is this a parameter tuning problem, for example with one of the "ef" parameters?
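One way to test the "ef" hypothesis is to retry the query with a larger search width each time it fails. The sketch below is not a confirmed fix. It assumes that `query` accepts a `query_ef` keyword and raises `RuntimeError` when fewer than `k` neighbors are found, as in the error quoted above. The helper name `query_with_growing_ef` and its `start_ef` / `max_ef` parameters are hypothetical.

```python
def query_with_growing_ef(index, vector, k, start_ef=None, max_ef=None):
    """Retry index.query with a doubling query_ef until k results come back.

    Assumption: index.query takes a query_ef keyword and raises RuntimeError
    when it cannot find k neighbors.
    """
    # Start at k by default; never start below 1 so doubling always grows.
    ef = max(1, start_ef if start_ef is not None else k)
    limit = max_ef if max_ef is not None else 8 * max(1, k)
    last_error = None
    while ef <= limit:
        try:
            return index.query(vector, k=k, query_ef=ef)
        except RuntimeError as error:
            last_error = error
            ef *= 2  # widen the search and try again
    raise RuntimeError(
        f"no query_ef up to {limit} returned {k} neighbors"
    ) from last_error
```

If the query still fails at a large `query_ef`, the cause is probably not this parameter alone. Some items may simply be unreachable from the search entry point in the graph, which a wider search cannot fix.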

Please note that this index also does not contain any elements removed with mark_deleted().
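For the furthest-neighbor objective mentioned above, an exact linear scan may be a simpler alternative for an index of this size (828 items), since it avoids asking an approximate index for all N neighbors. This is a sketch under stated assumptions: the original vectors are still available outside the index, and the distance is Euclidean, which the thread does not confirm. `furthest_item` and `cluster_vectors` are hypothetical names.

```python
import math


def furthest_item(query, items):
    """Return (item_id, distance) of the stored vector furthest from query.

    items maps an item id to its vector. Euclidean distance is an
    assumption here; swap in the metric your index actually uses.
    """
    if not items:
        raise ValueError("items is empty")
    best_id = max(items, key=lambda item_id: math.dist(query, items[item_id]))
    return best_id, math.dist(query, items[best_id])


# Made-up example data.
cluster_vectors = {
    0: [0.0, 0.0],
    1: [1.0, 1.0],
    2: [3.0, 4.0],
}
print(furthest_item([0.0, 0.0], cluster_vectors))  # (2, 5.0)
```

The scan costs one distance computation per item, which is cheap for a few hundred or a few thousand vectors.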

Hey again,

There are 25310 elements in the set, and I'm trying to get similarity scores for all of them using a random embedding. I can confirm that there are no NaNs in the item set. Somehow, 4 of the items were not included in the index, so there are 25306 items in the index instead of 25310.
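To narrow down which items are missing, it may help to compare the IDs you expected against the IDs that actually come back. A minimal sketch, assuming you kept the IDs assigned when the items were added; `missing_ids`, `expected`, and `returned` are hypothetical names and the data is made up.

```python
def missing_ids(expected_ids, returned_ids):
    """Return the sorted ids that were expected but not returned."""
    return sorted(set(expected_ids) - set(returned_ids))


# Made-up example: ids 3 and 6 never come back from the query.
expected = range(10)
returned = [0, 1, 2, 4, 5, 7, 8, 9]
print(missing_ids(expected, returned))  # [3, 6]
```

Inspecting the vectors behind the missing IDs (for example, checking for duplicates or unusual values) could then show whether they share a common property.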