ROCm / rccl-tests

RCCL Performance Benchmark Tests

Test HIP failure common.cu:1129 'hipErrorInvalidDevice' while running test

sheetalarkadam opened this issue

While testing my RCCL installation with rccl-tests, I get the error: Test HIP failure common.cu:1129 'hipErrorInvalidDevice'

My configurations:
Device: [AMD] Starship/Matisse Reserved SPP
Distribution: "Ubuntu 20.04.4 LTS"

Repro steps:
export HCC_AMDGPU_TARGET=gfx90a
cd home
git clone https://github.com/ROCmSoftwarePlatform/rccl-tests.git
cd rccl-tests && make
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8

Any idea what could be wrong with my environment? I was able to run the same tests last week.

I really appreciate any help!

Adding --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --cap-add=SYS_PTRACE to the docker run command solved the issue.
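
For anyone hitting the same error inside a container, here is a minimal sketch of how those flags fit into a full command. The flags are the ones quoted above. The image name (rocm/dev-ubuntu-20.04) and the -it option are assumptions for illustration, not taken from this thread, so substitute whatever image you actually use. The script only prints the command so it can be reviewed before running.

```shell
# Flags reported above as resolving hipErrorInvalidDevice inside a container.
ROCM_DOCKER_FLAGS="--device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --cap-add=SYS_PTRACE"

# Assumption: placeholder image name; override with IMAGE=<your image>.
IMAGE="${IMAGE:-rocm/dev-ubuntu-20.04}"

# Assumption: -it is added here only to get an interactive shell.
CMD="docker run -it $ROCM_DOCKER_FLAGS $IMAGE"

# Print the assembled command instead of executing it.
echo "$CMD"
```

After starting the container with the printed command, listing /dev/kfd and /dev/dri inside it is a quick way to confirm the GPU device nodes are visible before rerunning all_reduce_perf.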