Low index performance after `clear()`
mz1979 opened this issue · comments
Mehdi Zhiri commented
Describe the bug
Inserting vectors is extremely slow when using non-contiguous keys (Python SDK).
Steps to reproduce
Run the following script, which times index insertion first with contiguous keys and then with non-contiguous keys:
from usearch.index import Index
from random import random
import numpy as np

vectors = np.random.rand(600000, 256)
keys = np.arange(len(vectors))
offset = 1_000_000
keys_non_contiguous = []
for u in range(0, len(vectors), 50000):
    fileIndex = int(random()*10)
    batch = int(random()*256)
    batchIndex = int('0b' + bin(batch).removeprefix('0b').zfill(8) + '0'*32, 2)
    keys_non_contiguous.extend([batchIndex + fileIndex * offset + u for u in range(50000)])
keys_non_contiguous = np.array(keys_non_contiguous)

index = Index(
    ndim=256, # Define the number of dimensions in input vectors
    metric='cos', # Choose 'l2sq', 'haversine' or other metric, default = 'ip'
    dtype='f32', # Quantize to 'f16' or 'i8' if needed, default = 'f32'
    connectivity=16, # How frequent should the connections in the graph be, optional
    expansion_add=128, # Control the recall of indexing, optional
    expansion_search=64 # Control the quality of search, optional
)

# This takes about 20 sec on a 32 vCPU machine
index.add(keys, vectors, log=True, copy=False)
index.clear()

# This takes about 1min15sec on a 32 vCPU machine
index.add(keys_non_contiguous, vectors, log=True, copy=False)
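For readers puzzling over the `batchIndex` line: it builds the key prefix by string manipulation, which, assuming `batch` stays in the 0..255 range used above, should be equivalent to shifting `batch` left by 32 bits. The helper names below are hypothetical and exist only for this check; it needs Python 3.9+ for `str.removeprefix`.

```python
def batch_index_via_string(batch: int) -> int:
    # Same expression as in the reproduction script:
    # 8 binary digits of `batch` followed by 32 zero bits.
    return int('0b' + bin(batch).removeprefix('0b').zfill(8) + '0' * 32, 2)


def batch_index_via_shift(batch: int) -> int:
    # Assumed-equivalent form: place `batch` above the low 32 bits.
    return batch << 32


# Check every value that int(random() * 256) can produce.
assert all(
    batch_index_via_string(b) == batch_index_via_shift(b) for b in range(256)
)
```

Since `fileIndex * offset + u` stays below 2**32 here, the low part of each key never spills into the batch bits, so the keys form widely separated clusters rather than one contiguous range.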
Expected behavior
Insertion performance should be the same for contiguous and non-contiguous keys.
USearch version
Built from source, branch `main-dev`
Operating System
Ubuntu 24.04 LTS
Hardware architecture
x86
Which interface are you using?
Python bindings
Contact Details
No response
Are you open to being tagged as a contributor?
- I am open to being mentioned in the project `.git` history as a contributor
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct
Ash Vardanian commented
The problem is in `clear()`! If you reinitialize the `index` variable with a new constructor, it works just as fast. Neat finding! Will investigate.
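A minimal sketch of how the two paths could be timed side by side: reusing an index after `clear()` versus constructing a fresh one. The function names `time_add` and `compare_clear_vs_rebuild` are hypothetical, and the index is supplied through a factory so the sketch only assumes an object with `add(keys, vectors)` and `clear()` methods, as used in the script above.

```python
import time
from typing import Any, Callable, Tuple


def time_add(index: Any, keys: Any, vectors: Any) -> float:
    """Return the wall-clock seconds spent in a single index.add() call."""
    start = time.perf_counter()
    index.add(keys, vectors)
    return time.perf_counter() - start


def compare_clear_vs_rebuild(
    make_index: Callable[[], Any],
    first_keys: Any,
    second_keys: Any,
    vectors: Any,
) -> Tuple[float, float]:
    """Time the second add() on a cleared index and on a fresh index."""
    # Path 1: fill the index, clear it, then add again (the slow case reported).
    reused = make_index()
    reused.add(first_keys, vectors)
    reused.clear()
    after_clear = time_add(reused, second_keys, vectors)

    # Path 2: add the same data to a newly constructed index.
    fresh = make_index()
    on_fresh = time_add(fresh, second_keys, vectors)

    return after_clear, on_fresh
```

With the script above, this could be called as `compare_clear_vs_rebuild(lambda: Index(ndim=256, metric='cos', dtype='f32', connectivity=16, expansion_add=128, expansion_search=64), keys, keys_non_contiguous, vectors)`. Passing `keys` for both key arguments would help separate the effect of `clear()` from the effect of the key layout.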
Mehdi Zhiri commented