[BUG] raft-ann-bench.run stuck after sweep in search mode

Question

mikepcw opened this issue 2 months ago

Describe the bug
The benchmark appears to hang after the sweep in search mode, with very low CPU utilization and zero GPU utilization. The output from the bench script is shown in the attached image.
I killed the run after more than 24 hours. Attempting to run the .data_export stage afterwards fails, presumably because the results are incomplete.

Steps/Code to reproduce bug
python -m raft-ann-bench.run --dataset wiki_all_1M --dataset-path /ai-dataset/wiki-all/ --algorithms raft_cagra --batch-size 10000 -k 10
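
A minimal sketch for anyone reproducing this: wrapping the command above in a timeout makes the hang show up as an explicit result instead of a process that has to be killed by hand. `run_with_timeout`, `BENCH_CMD`, and the 2-hour limit are illustrative names and values, not part of raft-ann-bench; the limit in particular is a guess, since the expected runtime is the open question in this report.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return ("finished", returncode) or ("timed_out", None)."""
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising on timeout.
        # Any processes that child spawned may need separate cleanup.
        return ("timed_out", None)
    return ("finished", proc.returncode)


# Same arguments as the reproduction command above.
BENCH_CMD = [
    sys.executable, "-m", "raft-ann-bench.run",
    "--dataset", "wiki_all_1M",
    "--dataset-path", "/ai-dataset/wiki-all/",
    "--algorithms", "raft_cagra",
    "--batch-size", "10000",
    "-k", "10",
]

# Example call (the 2-hour limit is an arbitrary placeholder):
# status, returncode = run_with_timeout(BENCH_CMD, timeout_s=2 * 60 * 60)
```

A "timed_out" result only says the run exceeded the chosen limit; it does not by itself distinguish a hang from a legitimately long search.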

Expected behavior
The benchmark should return, and the .data_export step should then complete.

Environment details

Environment location: bare metal H100 SXM4, Debian 6.1.76-1 (2024-02-01) x86_64
Method of RAFT install: conda, Python 3.10

Additional context
How long is the raft-ann-bench.run script expected to run with the base search set on wiki_all_1M?