spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.

Home Page: https://spotify.github.io/voyager/

Corrupted or unsupported index after saving.

janfait opened this issue

Hello, stuck with the below. Would appreciate any tips.

My vectors look like this:

[[7.91172300e-01 6.69090297e-01 2.91000000e+02]
 [6.11795087e-01 3.69995315e-01 8.11000000e+02]
 [6.12826115e-01 3.79121037e-01 6.68000000e+02]
 [4.94505465e-01 3.66105550e-01 1.79000000e+02]
 [8.57812207e-01 3.69706741e-01 2.87000000e+02]
 [4.87957676e-01 3.83922704e-01 1.90000000e+02]
 [5.79707092e-01 5.88521933e-01 8.22000000e+02]
 [8.77284651e-01 3.60034340e-01 3.27000000e+02]
 [6.96175913e-01 4.77069307e-01 2.67000000e+02]
 [8.37530029e-01 6.95131995e-01 7.31000000e+02]]

Building and saving my index with this process works nicely.

    import pandas as pd
    from voyager import Index, Space

    # input_csv and index_path are file paths defined elsewhere
    df = pd.read_csv(input_csv)
    vectors = df[['Size', 'Gps', 'CategoryCluster']].values
    ids = df['Id'].tolist()
    index = Index(Space.Euclidean, num_dimensions=vectors.shape[1])

    index.add_items(vectors, ids)

    # Test that the index works
    queries = index.get_vectors([884])
    neighbors, distances = index.query(queries, k=5)
    print(neighbors)
    print(distances)

    index.save(index_path)

The prints return the data below. All good.

[[ 884 556793 524883 662437 529508]]
[[0. 0.0011078 0.00121032 0.00268939 0.00401055]]

When trying to read the index for later use with:

index = Index.load(index_path)

I get:
RuntimeError: Index seems to be corrupted or unsupported. Advancing to the next linked list requires 13312 additional bytes (from position 129997), but index data only has 130147 bytes in total.
It is not clear to me where to start with debugging. Do you have any tips on what could be wrong here?

I am on Windows 10 Pro
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz, 2301 MHz
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
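
The error message above can be read as a simple byte count: the loader was at position 129997, needed 13312 more bytes, and found only 130147 bytes in total. The sketch below does that arithmetic. The function name `shortfall_from_error` is hypothetical, and the regular expression assumes the message keeps the exact wording quoted in this issue. A positive shortfall suggests, but does not prove, that the data was truncated or misread on the way in, rather than that the vectors themselves were bad.

```python
import re

# Error text copied from the traceback in this issue.
message = (
    "Index seems to be corrupted or unsupported. Advancing to the next "
    "linked list requires 13312 additional bytes (from position 129997), "
    "but index data only has 130147 bytes in total."
)


def shortfall_from_error(text):
    """Return how many bytes the loader was missing, or None if no match.

    Assumes the message wording shown above; this is not part of voyager.
    """
    match = re.search(
        r"requires (\d+) additional bytes \(from position (\d+)\), "
        r"but index data only has (\d+) bytes",
        text,
    )
    if match is None:
        return None
    needed, position, total = (int(group) for group in match.groups())
    # Bytes the loader expected to exist, minus bytes it actually had.
    return position + needed - total


print(shortfall_from_error(message))  # 13162
```

Here the loader expected at least 143309 bytes and came up 13162 short, which is a useful number to compare against the size of the file on disk.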

I was able to get it running in Docker, so I assume it was related to my operating system. Closing.

For anyone hitting the same problem as @janfait:
try opening the file yourself with open() in 'rb' mode and passing the file object to Index.load. That works fine for me on Windows 10 Pro without Docker (on Python 3.9, at least).

with open('my_index.voy', 'rb') as f:
    index = Index.load(f) 
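
A minimal sketch for checking the file before loading it, using only the standard library: it reads the saved index in binary mode and confirms that the number of bytes read matches the size on disk. The helper name `read_index_bytes` is hypothetical and not part of voyager.

```python
import io
import os


def read_index_bytes(path):
    """Read a saved index file in binary mode and confirm nothing was lost.

    Hypothetical helper for debugging; not part of voyager.
    """
    expected = os.path.getsize(path)  # size on disk, in bytes
    with open(path, "rb") as f:  # 'rb' means no newline translation
        data = f.read()
    if len(data) != expected:
        raise IOError(f"read {len(data)} bytes, expected {expected}")
    # Wrap the bytes in an in-memory binary file object.
    return io.BytesIO(data)
```

If the sizes match, the returned buffer could then be tried with Index.load in place of the open file handle. That Index.load accepts an in-memory buffer the same way it accepts a file opened with 'rb' is an assumption here, so fall back to the open() form above if it does not.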

Thanks very much.