coreylowman / cudarc

Safe rust wrapper around CUDA toolkit

In-Place Reduction for NCCL

cat-state opened this issue · comments

NCCL supports in-place all-reduce. However, Comm::all_reduce takes a &CudaSlice to read from and a &mut CudaSlice to write into, so the same buffer can't be passed as both, which rules out in-place reduction.

Ah yeah, I see that (NCCL docs: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/colls.html#c.ncclReduce)

I think in this case, because of Rust's borrow rules, it'd probably be easiest to just add Comm::all_reduce_in_place that takes a &mut CudaSlice. It's a fairly easy addition if anyone wants to contribute a PR for this! Otherwise I can add it later this week.
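To make the borrow-rule point concrete, here is a minimal host-only sketch. It does not use cudarc or NCCL at all: MockSlice, MockComm, the peers field, and demo are hypothetical stand-ins invented for illustration, and the reduction is a plain sum over vectors. It only shows why a (&T, &mut T) signature can't take the same buffer twice, and what the shape of an in-place method taking a single &mut might look like.

```rust
/// Host-only stand-in for a device buffer. This is NOT cudarc's CudaSlice.
#[derive(Debug, Clone, PartialEq)]
pub struct MockSlice(pub Vec<f32>);

/// Hypothetical stand-in for a communicator. `peers` holds the values the
/// other ranks would contribute; a real communicator has no such field.
pub struct MockComm {
    pub peers: Vec<Vec<f32>>,
}

impl MockComm {
    /// Out-of-place shape described in the issue: read `send`, write `recv`.
    pub fn all_reduce(&self, send: &MockSlice, recv: &mut MockSlice) {
        assert_eq!(send.0.len(), recv.0.len());
        for (i, out) in recv.0.iter_mut().enumerate() {
            *out = send.0[i] + self.peers.iter().map(|p| p[i]).sum::<f32>();
        }
    }

    /// In-place shape: one exclusive borrow is both the input and the output.
    pub fn all_reduce_in_place(&self, buf: &mut MockSlice) {
        for (i, out) in buf.0.iter_mut().enumerate() {
            *out += self.peers.iter().map(|p| p[i]).sum::<f32>();
        }
    }
}

/// Runs both variants on the same input and returns (out-of-place, in-place).
pub fn demo() -> (MockSlice, MockSlice) {
    let comm = MockComm {
        peers: vec![vec![10.0, 20.0], vec![100.0, 200.0]],
    };

    // Out-of-place: two distinct buffers are required.
    let send = MockSlice(vec![1.0, 2.0]);
    let mut recv = MockSlice(vec![0.0, 0.0]);
    comm.all_reduce(&send, &mut recv);

    // Passing one buffer as both arguments does not compile, because a
    // shared and an exclusive borrow of `recv` would be live at once:
    // comm.all_reduce(&recv, &mut recv);

    // In-place: a single `&mut` borrow is enough.
    let mut buf = MockSlice(vec![1.0, 2.0]);
    comm.all_reduce_in_place(&mut buf);

    (recv, buf)
}
```

Both variants yield the same values in this toy setup; the in-place one just avoids the second buffer. A real implementation would presumably hand the same device pointer to NCCL as both send and receive buffer, but that detail is an assumption and not something shown here.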