In-Place Reduction for NCCL

Question

cat-state opened this issue 9 days ago

NCCL supports in-place all-reduce. However, Comm::all_reduce takes a &CudaSlice to read from and a &mut CudaSlice to write into, so the same buffer can't be passed as both input and output, which rules out in-place reduction.

Corey Lowman · Answer 1 · Mon Jun 17 2024 22:04:28 GMT+0800 (China Standard Time)

Ah yeah, I see that (NCCL docs: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/colls.html#c.ncclReduce)

I think in this case, due to Rust's borrow rules, it'd probably be easiest to just add Comm::all_reduce_in_place that takes a &mut CudaSlice. It's a fairly easy addition if anyone wants to contribute a PR for this! Otherwise I can add it later this week.
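
For anyone picking this up, here is a rough, CPU-only sketch of the borrow problem and the shape an in-place variant could take. Everything in it is a stand-in: mock_all_reduce_sum is a hypothetical placeholder for the underlying NCCL call (it just adds each element to itself, as if two ranks held identical data), plain f32 slices stand in for CudaSlice, and the free functions are not the crate's actual API. The only point is that one &mut borrow can supply both the send and the receive pointer.

```rust
/// Hypothetical stand-in for a C-style collective that takes raw send/recv
/// pointers, which are allowed to alias. It pretends two ranks contributed
/// identical data, so every element is added to itself.
unsafe fn mock_all_reduce_sum(send: *const f32, recv: *mut f32, count: usize) {
    for i in 0..count {
        unsafe {
            // Read the input element before writing, so aliasing is harmless.
            let v = *send.add(i);
            *recv.add(i) = v + v;
        }
    }
}

/// Out-of-place shape, mirroring the signature described in the issue:
/// a shared borrow to read from and a mutable borrow to write into.
fn all_reduce(send: &[f32], recv: &mut [f32]) {
    assert_eq!(send.len(), recv.len());
    unsafe { mock_all_reduce_sum(send.as_ptr(), recv.as_mut_ptr(), send.len()) }
}

/// In-place shape (assumed, not the real method): a single mutable borrow
/// provides both the send and the receive pointer.
fn all_reduce_in_place(buf: &mut [f32]) {
    let len = buf.len();
    let ptr = buf.as_mut_ptr();
    unsafe { mock_all_reduce_sum(ptr as *const f32, ptr, len) }
}

fn main() {
    // Out-of-place works with two distinct buffers.
    let src = vec![1.0_f32, 2.0, 3.0];
    let mut dst = vec![0.0_f32; 3];
    all_reduce(&src, &mut dst);
    println!("{:?}", dst); // prints [2.0, 4.0, 6.0]

    let mut data = vec![1.0_f32, 2.0, 3.0];
    // all_reduce(&data, &mut data);
    // ^ rejected by the borrow checker: `data` cannot be borrowed as mutable
    //   while it is also borrowed as immutable.
    all_reduce_in_place(&mut data);
    println!("{:?}", data); // prints [2.0, 4.0, 6.0]
}
```

A real implementation would presumably pass the same device pointer as both the send and receive buffer of the NCCL call, which NCCL documents as the in-place case.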