yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data


Optimizer: "Cannot optimize the number of edges"

fonspa opened this issue · comments

Hi,
I'm trying to use the Optimizer class from the Python binding with a version of NGT that I built from source (v1.14.1), but I'm hitting a bug that looks like an off-by-one error.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_28461/550346330.py in <module>
      1 optimizer = ngtpy.Optimizer(log_disabled = True)
----> 2 optimizer.optimize_number_of_edges_for_anng(os.path.join(export_path, "ngt-opt"))

RuntimeError: /usr/local/include/NGT/GraphOptimizer.h:505: Optimizer::optimizeNumberOfEdgesForANNG: Cannot optimize the number of edges. 0:0.937933 # of objects=5676310

The dataset I use has 5,676,309 data points, yet the GraphOptimizer seems to want to work on point 5,676,310.

Minimal code to reproduce:

import os
import ngtpy

# xb: the dataset of 5,676,309 vectors, 128 dimensions each
ngtpy.create(os.path.join(export_path, "opt"), 128, distance_type='L2', object_type='Float16')
index = ngtpy.Index(path=os.path.join(export_path, "opt"), read_only=False, zero_based_numbering=True, log_disabled=False)

for vec in xb:
    index.insert(vec)
index.save()
index.close()

# Optimizing the edge count raises the RuntimeError above
optimizer = ngtpy.Optimizer(log_disabled=True)
optimizer.optimize_number_of_edges_for_anng(os.path.join(export_path, "opt"))

Am I doing something wrong here?
Thank you!

Since the message mistakenly displays the number of inserted objects + 1, I don't think this is an off-by-one error. When you specify 'Float' as the object type, do you get the same error?

Good call: with 'Float' there's no error, and the optimization process seems to work fine.

I also had a segmentation fault during optimize_search_parameters on the same index; maybe it's related? I'm rebuilding the index right now and will keep you updated on whether the search-parameter optimization goes fine in fp32.
Update: the search-parameter optimization is OK and much faster in fp32 (and doesn't end in a segfault).

So there could be a problem with fp16 and the Optimizer?
Concerning fp16: in your experience, is there a tangible performance benefit to working in fp16 instead of fp32?
I haven't yet encountered a dataset where fp16 was faster than fp32.

Thank you very much for your help and your time!

Thanks to your help, I found bugs in the optimizer related to fp16. I am going to release a new version that fixes them soon.

The current fp16 version mainly reduces the memory footprint. If you build NGT on a computer with AVX, the search time is also shortened a little. However, when you use the ngt Python package from PyPI, the search time might increase, because AVX is disabled for the Python packages.
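As a rough back-of-the-envelope illustration of that footprint difference (not NGT code, just a sketch): Python's struct module exposes IEEE 754 half precision via the 'e' format, so the per-value storage of fp16 versus fp32 can be compared directly for the vector counts mentioned in this issue. Index overhead (graph edges, tree) is not included.

```python
import struct

# IEEE 754 storage per value: 'e' = half precision, 'f' = single precision.
fp16_bytes = struct.calcsize('e')  # 2 bytes per value
fp32_bytes = struct.calcsize('f')  # 4 bytes per value

# Raw vector storage for the dataset in this issue:
# 5,676,309 vectors x 128 dimensions.
n_vectors, dim = 5_676_309, 128
print(f"fp16: {n_vectors * dim * fp16_bytes / 2**30:.1f} GiB")  # ~1.4 GiB
print(f"fp32: {n_vectors * dim * fp32_bytes / 2**30:.1f} GiB")  # ~2.7 GiB
```

So for raw vector data alone, fp16 halves the storage, which is the main benefit the maintainer describes.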

Hi, that's great, thank you for working on a fix! I will test again when it's released.

I built the NGT libraries from source (on a CPU with AVX2) and built the Python bindings from the python/ directory by following the wiki, so I believe I should get the benefit of the AVX2 extensions even when building an index from Python code.

I have just made a new release, v1.14.2, with bug fixes for the fp16-related optimizer issues.

I'm closing this issue for now. Feel free to reopen it whenever you find any issue.