Enet4 / faiss-rs

Rust language bindings for Faiss



panicked at 'too large index value provided to Idx::new'

nemosupremo opened this issue

When you use a large integer (2^63 or greater) as an ID, faiss-rs panics. This is a bit unexpected, as it's not mentioned anywhere in the documentation, nor do I think faiss has this limitation. If the intent is to represent null IDs, wouldn't Option<faiss::idx_t> be a better choice?

For what it's worth, this behavior is documented. Idx::new was designed to ensure that the returned index intends to refer to an item in the index, rather than null/void.

Panic

Panics if the ID is too large (>= 2^63)

One would use Idx::none to create a void index (internally represented by -1).

What one can do to improve this is:

  • Extend the documentation of these functions to further clarify that Idx::new really expects a non-negative idx.
  • Provide a new_unchecked variant which admits any idx value.
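The contract described above can be modeled with a small, self-contained stand-in. This is a hypothetical sketch, not the faiss-rs source: the struct name mirrors the crate's `Idx`, the inner `i64` stands in for `faiss::idx_t`, and `new_unchecked` is only the variant proposed in the list above. It shows why `Idx::new` rejects values `>= 2^63`: a `u64` in that range has no non-negative `i64` representation.

```rust
/// Hypothetical stand-in for `faiss::Idx`: a signed 64-bit ID
/// where -1 means "no entry", as with Faiss's `idx_t`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct Idx(i64);

impl Idx {
    /// Creates an ID that refers to an actual item.
    /// Panics if `idx >= 2^63`, since such a value cannot be stored
    /// as a non-negative `i64`.
    pub fn new(idx: u64) -> Self {
        assert!(
            idx < (1u64 << 63),
            "too large index value provided to Idx::new"
        );
        Idx(idx as i64)
    }

    /// Creates the void ID, internally represented by -1.
    pub fn none() -> Self {
        Idx(-1)
    }

    /// Proposed (hypothetical) variant: stores any raw value as-is,
    /// including negative ones, without checking.
    pub fn new_unchecked(idx: i64) -> Self {
        Idx(idx)
    }

    /// True if this is the void ID (-1).
    pub fn is_none(self) -> bool {
        self.0 == -1
    }

    /// Returns the ID as `u64` if it is non-negative, otherwise `None`.
    pub fn get(self) -> Option<u64> {
        if self.0 < 0 { None } else { Some(self.0 as u64) }
    }
}

fn main() {
    assert_eq!(Idx::new(42).get(), Some(42));
    assert!(Idx::none().is_none());
    // A raw negative value is accepted, but has no u64 view.
    assert_eq!(Idx::new_unchecked(-5).get(), None);
    // 2^63 does not fit in a non-negative i64: this call panics.
    let result = std::panic::catch_unwind(|| Idx::new(1u64 << 63));
    assert!(result.is_err());
}
```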

For what it's worth, this behavior is documented

My mistake, I missed that.

Idx::new was designed to ensure that the returned index intends to refer to an item in the index, rather than null/void.

I'm new to faiss, so I could be misunderstanding something. Is it the case that faiss requires Idx to be non-negative? My use case is that I'm adding vectors with add_with_ids, and the ids I'm using are completely arbitrary and random, so when adding a vector with a given id I can generate a negative idx. The thing is, the "void index" concept seems to be a faiss-rs thing, as I can't find any indication that you can't use a negative idx with faiss. If faiss allows you to use negative indexes, isn't an Option better here?

Just catching up with older issues.

Is it the case that faiss requires Idx to be non-negative?

That is the assumption made in this API, but it does not stray far from what users of the native Faiss API are expected to assume. An index entry ID of -1 in a search result means an empty/missing entry, which can happen in some index implementations (for example, when fewer than k neighbors are found).
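To illustrate the search-side convention, here is a minimal sketch that turns raw result labels into `Option<u64>`, treating -1 as a missing entry. The `interpret_labels` helper and the sample labels are hypothetical; they stand in for the label buffer a k-NN search fills in, not for a faiss-rs function.

```rust
/// Hypothetical helper: maps raw Faiss-style labels to optional IDs,
/// treating -1 as "no result in this slot".
/// Assumes every other label is non-negative.
fn interpret_labels(labels: &[i64]) -> Vec<Option<u64>> {
    labels
        .iter()
        .map(|&l| if l == -1 { None } else { Some(l as u64) })
        .collect()
}

fn main() {
    // Example: k = 4, but only two neighbors could be returned.
    let labels = [3_i64, 7, -1, -1];
    for (rank, id) in interpret_labels(&labels).iter().enumerate() {
        match id {
            Some(id) => println!("rank {rank}: vector {id}"),
            None => println!("rank {rank}: empty"),
        }
    }
}
```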

My use case is that I'm adding vectors with add_with_ids, and the ids I'm using are completely arbitrary and random, so when adding a vector with a given id I can generate a negative idx. The thing is, the "void index" concept seems to be a faiss-rs thing, as I can't find any indication that you can't use a negative idx with faiss.

As explained above, this concept of an empty ID does exist in the native library. Even if this API were relaxed to allow negative numbers, it should not allow an ID value of -1, because that collides with the meaning of a missing vector in the search result list. You would not be able to tell whether a search result item is empty or is the vector with ID -1. That sounds like a very nasty source of bugs to me.
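If raw `i64` IDs come from outside your control, one defensive option is to reject the reserved value before calling add_with_ids. The `check_ids` function below is a hypothetical sketch for that raw-`i64` path: it only looks for -1 (other negative values pass, matching the relaxed scenario described above) and returns the position of the first offending ID.

```rust
/// Hypothetical pre-check: -1 is reserved for "missing" in search
/// results, so it must never be used as a real vector ID.
/// Returns the position of the first -1, if any.
fn check_ids(ids: &[i64]) -> Result<(), usize> {
    match ids.iter().position(|&id| id == -1) {
        Some(pos) => Err(pos),
        None => Ok(()),
    }
}

fn main() {
    let ids = [10_i64, -7, -1, 42];
    match check_ids(&ids) {
        Ok(()) => println!("all IDs usable"),
        Err(pos) => println!("ID at position {pos} is the reserved value -1"),
    }
}
```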

All of this can be circumvented by converting an i64 into an Idx via From<i64>, but for this to be sound, you would still need to adjust your ID generation logic to avoid generating -1.
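One way to adjust ID generation, assuming the random IDs start out as arbitrary 64-bit values, is to clear the top bit. The result always lies in [0, 2^63), so it is accepted by a checked constructor like `Idx::new` and can never be -1, at the cost of one bit of randomness. The `random_raw_id` helper is hypothetical and uses the standard library's randomly seeded hasher only to keep the sketch dependency-free; it is not a cryptographic RNG, and a real application would more likely use a dedicated RNG crate.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Clears the most significant bit so the ID fits in [0, 2^63).
/// As an i64 such a value is non-negative, so it is never -1.
fn to_valid_id(raw: u64) -> u64 {
    raw & (u64::MAX >> 1)
}

/// Hypothetical dependency-free source of arbitrary 64-bit values,
/// built on std's randomly seeded hasher (not a cryptographic RNG).
fn random_raw_id(counter: u64) -> u64 {
    let mut hasher = RandomState::new().build_hasher();
    hasher.write_u64(counter);
    hasher.finish()
}

fn main() {
    for i in 0..3 {
        let id = to_valid_id(random_raw_id(i));
        // Always below 2^63, so a checked constructor would accept it.
        assert!(id < (1u64 << 63));
        println!("generated id {id} (as i64: {})", id as i64);
    }
}
```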

If faiss allows you to use negative indexes then isn't an Option better here?

It would be an interesting design option, but without an integer type that is not allowed to be -1, that would bloat the size of the index type, making it terribly inefficient. Right now, idx_t values are mapped 1:1 to Idx without representation changes. Forcing us to output a vector of Option<Idx> as the result would require a copy into a representation that can hold the 64-bit integer plus the Option discriminant.
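The size argument can be checked with `std::mem::size_of`. In this sketch, `Idx` is a hypothetical `#[repr(transparent)]` wrapper standing in for the real type: it has the same size as `i64`, which is what allows a 1:1 mapping of native labels. `Option<i64>` has no spare bit pattern for `None`, so it needs extra space for the discriminant, while `Option<NonZeroU64>` shows that the compiler can avoid this when the type itself rules out a value (0 here, not -1).

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

/// Hypothetical transparent wrapper, standing in for faiss-rs's `Idx`.
#[allow(dead_code)]
#[repr(transparent)]
#[derive(Debug, Clone, Copy)]
struct Idx(i64);

fn main() {
    // Same layout as the native 64-bit ID.
    assert_eq!(size_of::<Idx>(), size_of::<i64>());

    // Every i64 bit pattern is a valid value, so None needs extra space.
    assert!(size_of::<Option<i64>>() > size_of::<i64>());

    // A type that forbids one value (0) lets Option reuse it for None.
    assert_eq!(size_of::<Option<NonZeroU64>>(), size_of::<u64>());

    // Producing Option values from raw labels means building a new buffer.
    let labels: Vec<i64> = vec![3, 7, -1];
    let opts: Vec<Option<i64>> = labels
        .iter()
        .map(|&l| if l == -1 { None } else { Some(l) })
        .collect();
    println!(
        "labels: {} bytes, options: {} bytes",
        labels.len() * size_of::<i64>(),
        opts.len() * size_of::<Option<i64>>()
    );
}
```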

An index entry ID of -1 in a search result means an empty/missing entry, which can happen in some index implementations.

I missed this; that would complicate things.