rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

Home Page: https://docs.rapids.ai/api/raft/stable/


[BUG] raft-ann-bench.run stuck after sweep in search mode

mikepcw opened this issue

Describe the bug
The benchmark appears to hang after the sweep in search mode, with very low CPU utilization and zero GPU utilization. The output from the bench script is shown in the attached image.
I killed the process after more than 24 hours. Running the .data_export stage afterwards fails, presumably because the results are incomplete.

Steps/Code to reproduce bug
python -m raft-ann-bench.run --dataset wiki_all_1M --dataset-path /ai-dataset/wiki-all/ --algorithms raft_cagra --batch-size 10000 -k 10
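
To reproduce this without waiting a full day, the same command can be wrapped with a wall-clock limit. The following is only a minimal sketch: `run_with_timeout` is a hypothetical helper, not part of raft-ann-bench, and the six-hour limit is an arbitrary assumption, since the expected runtime is unknown.

```python
import subprocess
import sys

# The exact command from this report, split into arguments.
BENCH_CMD = [
    sys.executable, "-m", "raft-ann-bench.run",
    "--dataset", "wiki_all_1M",
    "--dataset-path", "/ai-dataset/wiki-all/",
    "--algorithms", "raft_cagra",
    "--batch-size", "10000",
    "-k", "10",
]

# Arbitrary limit chosen for illustration only (assumption).
TIMEOUT_S = 6 * 3600


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, timed_out).

    If the limit expires, subprocess.run kills the direct child and
    returncode is None. Processes spawned by that child (for example
    a benchmark executable) may survive and need separate cleanup.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        return None, True


# Usage (not executed here):
# returncode, timed_out = run_with_timeout(BENCH_CMD, TIMEOUT_S)
```

A `timed_out` value of `True` would distinguish a hang like the one described above from a run that fails with a nonzero exit code.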

Expected behavior
The benchmark should finish and return, so that the .data_export step can then be completed.

Environment details (please complete the following information):

  • Environment location: bare metal H100 SXM4, Debian 6.1.76-1 (2024-02-01) x86_64
  • Method of RAFT install: conda, Python 3.10

Additional context
How long is the raft-ann-bench.run script expected to take with the base search set on wiki_all_1M?
[Attached image: Screenshot from 2024-04-09 17-50-45]