Corrupted or unsupported index after saving.
janfait opened this issue
Hello, I'm stuck on the error below and would appreciate any tips.
My vectors look like this:
[[7.91172300e-01 6.69090297e-01 2.91000000e+02]
[6.11795087e-01 3.69995315e-01 8.11000000e+02]
[6.12826115e-01 3.79121037e-01 6.68000000e+02]
[4.94505465e-01 3.66105550e-01 1.79000000e+02]
[8.57812207e-01 3.69706741e-01 2.87000000e+02]
[4.87957676e-01 3.83922704e-01 1.90000000e+02]
[5.79707092e-01 5.88521933e-01 8.22000000e+02]
[8.77284651e-01 3.60034340e-01 3.27000000e+02]
[6.96175913e-01 4.77069307e-01 2.67000000e+02]
[8.37530029e-01 6.95131995e-01 7.31000000e+02]]
Building and saving my index with this process works nicely.
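import pandas as pd
from voyager import Index, Space  # assumption: the calls below match Spotify's voyager library

# input_csv and index_path are paths defined elsewhere
df = pd.read_csv(input_csv)
vectors = df[['Size', 'Gps', 'CategoryCluster']].values
ids = df['Id'].tolist()
index = Index(Space.Euclidean, num_dimensions=vectors.shape[1])
index.add_items(vectors, ids)
# Test that the index works: look up the vector stored under ID 884 and query its 5 nearest neighbors
queries = index.get_vectors([884])
neighbors, distances = index.query(queries, k=5)
print(neighbors)
print(distances)
index.save(index_path)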
The print statements output the data below. All good.
[[ 884 556793 524883 662437 529508]]
[[0. 0.0011078 0.00121032 0.00268939 0.00401055]]
When trying to read the index for later use with:
index = Index.load(index_path)
I get:
RuntimeError: Index seems to be corrupted or unsupported. Advancing to the next linked list requires 13312 additional bytes (from position 129997), but index data only has 130147 bytes in total.
It is not clear to me where to start with debugging. Do you have any tips on what could be wrong here?
I am on Windows 10 Pro
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz, 2301 MHz
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
I was able to get it running in Docker, so I assume the problem was related to my operating system. Closing.
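For anyone hitting the same error, one possible first check is whether the file on disk is actually shorter than the loader expects, and whether it is byte-identical between the machine where it loads and the one where it does not. The sketch below uses only the Python standard library. The helper names `file_fingerprint` and `missing_bytes` are hypothetical, and the numbers are taken from the RuntimeError above. It does not diagnose the root cause. It only quantifies the shortfall and gives a size and checksum to compare across environments.

```python
import hashlib
import os


def file_fingerprint(path):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    digest = hashlib.sha256()
    # Read in binary mode so the bytes are hashed exactly as stored on disk.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def missing_bytes(position, required, total):
    """Bytes lacking for a read of `required` bytes starting at `position`."""
    return max(0, position + required - total)


# Numbers from the error message: 13312 bytes needed from position 129997,
# but only 130147 bytes of index data in total.
print(missing_bytes(129997, 13312, 130147))  # 13162
```

Calling `file_fingerprint(index_path)` right after `index.save(index_path)` and again just before `Index.load(index_path)`, on both Windows and in Docker, would show whether the saved file itself differs or only the way it is read back.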