rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

Home Page: https://docs.rapids.ai/api/raft/stable/

[BUG] Running wiki_all_88M on an NVIDIA A100 with raft-ann-bench crashes

ftian1 opened this issue

Describe the bug
Building the raft_cagra index for wiki_all_88M on an NVIDIA A100 GPU fails with the following error:

raft_cagra.graph_degree32.intermediate_graph_degree32.graph_build_algoNN_DESCENT/process_time/real_time ERROR OCCURRED: 'Failed to create an algo: std::bad_alloc: out_of_memory: RMM failure at:/sparse/miniconda3/envs/py310/include/rmm/mr/device/pool_memory_resource.hpp:313: Maximum pool size exceeded'
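For context on the "Maximum pool size exceeded" message, here is a rough back-of-the-envelope sketch of how much memory the raw vectors alone might need. The vector count is read off the dataset name; the dimension (768) and float32 storage are assumptions, not something confirmed in this report, and the helper names are hypothetical. Whether the benchmark actually tries to place the whole dataset in device memory is also an assumption, so treat this as a sanity check rather than a diagnosis.

```python
# Rough estimate of the raw dataset size versus GPU memory.
# ASSUMPTIONS (not confirmed in this issue): 768-dimensional float32 vectors,
# and 88 million vectors as suggested by the dataset name "wiki_all_88M".

GIB = 1024 ** 3


def dataset_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold num_vectors dense vectors of the given dimension."""
    return num_vectors * dim * bytes_per_value


def fits_in_pool(required_bytes: int, pool_bytes: int) -> bool:
    """True if the requested allocation fits inside a pool of pool_bytes."""
    return required_bytes <= pool_bytes


num_vectors = 88_000_000  # assumed from the dataset name
dim = 768                 # assumed embedding dimension
required = dataset_bytes(num_vectors, dim)

# The A100 ships in 40 GB and 80 GB variants; the issue does not say which.
for gpu_gib in (40, 80):
    ok = fits_in_pool(required, gpu_gib * GIB)
    print(f"{required / GIB:.1f} GiB of vectors vs {gpu_gib} GiB GPU: fits={ok}")
```

If these assumptions hold, the raw vectors would be several times larger than either A100 variant, which would be consistent with an out-of-memory failure during the build. It would not by itself explain why the Docker-based run succeeds.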

Steps/Code to reproduce bug

python -m raft-ann-bench.run --dataset wiki_all_88M --dataset-path ./ --algorithms raft_cagra --build

Expected behavior
The benchmark build should complete without errors.

Environment details (please complete the following information):
Bare-metal installation on Ubuntu
RAFT was installed with: conda install -c rapidsai -c conda-forge raft-ann-bench-gpu

I only see this error with the conda installation.
When I switch to the Docker container (https://docs.rapids.ai/api/raft/stable/raft_ann_benchmarks/#docker), the issue disappears.