[BUG] raft-ann-bench.run stuck after sweep in search mode
mikepcw opened this issue · comments
Describe the bug
The bench seems to get stuck with very low CPU util, and zero GPU util. The output from the bench script is shown in attached image.
Killed after more than 24 hrs. Attempting to run the .data_export stage fails, presumably because the results are incomplete.
Steps/Code to reproduce bug
python -m raft-ann-bench.run --dataset wiki_all_1M --dataset-path /ai-dataset/wiki-all/ --algorithms raft_cagra --batch-size 10000 -k 10
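For anyone trying to reproduce this without waiting a day, here is a minimal sketch of wrapping the same command in a timeout so a hang shows up as a clear failure. The helper name `run_with_timeout` and the one-hour limit are assumptions for illustration and are not part of raft-ann-bench. Note that `subprocess.run` only kills the direct child on timeout, so any benchmark executables it spawned may need to be cleaned up separately.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, timed_out).

    returncode is None when the command did not finish within timeout_s.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child and waits for it
        # before re-raising TimeoutExpired.
        return None, True
    return completed.returncode, False


# Hypothetical usage with the command from this report (limit is a guess):
# bench_cmd = [
#     sys.executable, "-m", "raft-ann-bench.run",
#     "--dataset", "wiki_all_1M",
#     "--dataset-path", "/ai-dataset/wiki-all/",
#     "--algorithms", "raft_cagra",
#     "--batch-size", "10000",
#     "-k", "10",
# ]
# returncode, timed_out = run_with_timeout(bench_cmd, timeout_s=3600)
```

If `timed_out` comes back true, the run hit the limit instead of returning, which matches the stuck behavior described above.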
Expected behavior
The benchmark should return, and the .data_export step should then be able to complete.
Environment details (please complete the following information):
- Environment location: bare metal H100 SXM4, Debian 6.1.76-1 (2024-02-01) x86_64
- Method of RAFT install: conda, Python 3.10
Additional context
How long is the raft-ann-bench.run script with base search set on wiki_all_1M expected to run for?