yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data


Optimizer: "Cannot optimize the number of edges"

fonspa opened this issue · comments

Hi,
I'm trying to use the Optimizer class from the Python binding with a version of NGT that I built from source (v1.14.1), but I'm hitting a bug that looks like an off-by-one error.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_28461/550346330.py in <module>
      1 optimizer = ngtpy.Optimizer(log_disabled = True)
----> 2 optimizer.optimize_number_of_edges_for_anng(os.path.join(export_path, "ngt-opt"))

RuntimeError: /usr/local/include/NGT/GraphOptimizer.h:505: Optimizer::optimizeNumberOfEdgesForANNG: Cannot optimize the number of edges. 0:0.937933 # of objects=5676310

The dataset I use has 5,676,309 data points, yet the GraphOptimizer seems to want to work on point 5,676,310.

Minimal code to reproduce:

import os
import ngtpy

# xb: the dataset of 5,676,309 vectors, 128 dimensions each
ngtpy.create(os.path.join(export_path, "opt"), 128, distance_type='L2', object_type='Float16')
index = ngtpy.Index(path=os.path.join(export_path, "opt"), read_only=False, zero_based_numbering=True, log_disabled=False)

for vec in xb:
    index.insert(vec)
index.save()
index.close()

# Optimizing the edge count raises the RuntimeError above
optimizer = ngtpy.Optimizer(log_disabled=True)
optimizer.optimize_number_of_edges_for_anng(os.path.join(export_path, "opt"))

Am I doing something wrong here?
Thank you!

Since the message mistakenly displays the number of inserted objects + 1, I don't think this is an off-by-one error. When you specify 'Float' as the object type, do you get the same error?

Good call: with 'Float' there's no error, and the optimization process seems to work fine.

I also had a segmentation fault during optimize_search_parameters on the same index; maybe it's related? I'm rebuilding the index right now and will keep you updated on whether the search-parameter optimization goes fine in fp32.
Update: the search-parameter optimization is OK and much faster in fp32 (and doesn't end in a segfault).

So there could be a problem with fp16 and the Optimizer?
Concerning fp16: in your experience, is there a tangible performance benefit to working in fp16 instead of fp32?
I haven't yet encountered a dataset where fp16 was faster than fp32.

Thank you very much for your help and your time!

Thanks to your help, I found bugs in the optimizer related to fp16. I am going to release a new version that fixes them soon.

The current fp16 version mainly reduces the memory footprint. If you build NGT on a computer with AVX, the search time is also shortened a little. However, when you use the ngt Python package from PyPI, the search time might increase, because AVX is disabled for the Python packages.
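As a rough back-of-the-envelope illustration of that footprint difference (not NGT code, just a sketch): Python's struct module exposes IEEE 754 half precision via the 'e' format, so the per-value storage of fp16 versus fp32 can be compared directly for the vector counts mentioned in this issue. Index overhead (graph edges, tree) is not included.

```python
import struct

# IEEE 754 storage per value: 'e' = half precision, 'f' = single precision.
fp16_bytes = struct.calcsize('e')  # 2 bytes per value
fp32_bytes = struct.calcsize('f')  # 4 bytes per value

# Raw vector storage for the dataset in this issue:
# 5,676,309 vectors x 128 dimensions.
n_vectors, dim = 5_676_309, 128
print(f"fp16: {n_vectors * dim * fp16_bytes / 2**30:.1f} GiB")  # ~1.4 GiB
print(f"fp32: {n_vectors * dim * fp32_bytes / 2**30:.1f} GiB")  # ~2.7 GiB
```

So for raw vector data alone, fp16 halves the storage, which is the main benefit the maintainer describes.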

Hi, that's great, thank you for working on a fix! I will test again when it's released.

I built the NGT libraries from source (on a CPU with AVX2) and built the Python bindings from the python/ directory by following the wiki, so I believe I should get the benefit of the AVX2 extensions even when building an index from Python code.

I have just made a new release, v1.14.2, with bug fixes for the fp16-related optimizer issues.

I'm closing this issue for now. Feel free to reopen it whenever you find any issue.