aws-ofi-nccl makes unnecessary calls to ofi_iflush() when using the PSM3 transport.
mwheinz opened this issue.
On some hardware, even a simple TensorFlow test can end up calling ofi_iflush() tens of thousands of times per rank. This serves no purpose, since PSM3 ensures that GPU buffers are kept in sync after each I/O. In addition, ofi_iflush() calls ofi_nccl_gdr_flush_disable() on every invocation, and ofi_nccl_gdr_flush_disable() acquires a mutex each time it is called, so these redundant calls add further overhead.
Could you please rebase your patch on current master and verify it on your end before we do another round of review?
I'm dropping this for now. We have no evidence that it's causing an actual problem, and I've been redirected to other tasks.