ROCm / rccl-tests

RCCL Performance Benchmark Tests

Multi-GPU Support with External Pinning

frobnitzem opened this issue

In my HPC environment, srun pins MPI ranks to specific cores and GPUs (by setting ROCR_VISIBLE_DEVICES). However, this conflicts with rccl-tests, which tries to select GPUs manually based on the MPI rank.

I have fixed this in my own build (frobnitzem@5b347ee) by always running the step gpuid = gpuid % args->localNumDevices, regardless of whether args->enable_multiranks is true or not.
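To illustrate the idea, here is a minimal sketch of the wraparound step in isolation. The function name wrap_gpu_id and its parameters are hypothetical and are not taken from the rccl-tests source; the sketch only assumes that the rank-derived index is non-negative and that at least one device is visible.

```cpp
#include <cassert>

// Hypothetical helper (not the actual rccl-tests code): maps a rank-derived
// GPU index onto the range of devices this process can actually see.
// gpu_id            - index computed from the MPI rank (assumed >= 0)
// local_num_devices - number of devices visible to this process (assumed > 0)
int wrap_gpu_id(int gpu_id, int local_num_devices) {
    assert(gpu_id >= 0);
    assert(local_num_devices > 0);
    // Always apply the modulo, so an index beyond the visible range
    // wraps around instead of selecting a device that does not exist.
    return gpu_id % local_num_devices;
}
```

If srun exposes a single device to each rank, local_num_devices is 1 and every rank resolves to index 0, which is whichever device was made visible to it.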

I suggest adopting this change and reverting the update d16d1fb, which instead throws an error in this case.