aws / aws-ofi-nccl

This is a plugin that lets EC2 developers use libfabric as a network provider while running NCCL applications.

Is it safe to use OFI_NCCL_GDR_FLUSH_DISABLE?

kwen2501 opened this issue

Hi team,

The README provides this environment variable OFI_NCCL_GDR_FLUSH_DISABLE.

Is it safe to set it to 1 (i.e., disable GDR flush for this plugin)? Would setting it to 1 improve performance?

Thank you!

Cc @rohan-varma @zhaojuanmao @pbelevich

Hey Ke,

We do not recommend disabling the flush with the environment variable OFI_NCCL_GDR_FLUSH_DISABLE if you are using the EFA libfabric provider (so there is no point in comparing performance). The option is provided for libfabric networks that can guarantee the data has arrived at the GPU by the time the network completion is reported.
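To make that guidance concrete, here is a minimal sketch of a hypothetical pre-launch check (not part of aws-ofi-nccl) that warns when OFI_NCCL_GDR_FLUSH_DISABLE=1 is combined with the EFA provider. It assumes the provider is pinned through libfabric's FI_PROVIDER variable. If FI_PROVIDER is unset, the plugin may still end up on EFA, and this check cannot detect that case.

```python
import os


def check_gdr_flush_setting(env=None):
    """Hypothetical helper: return a warning if GDR flush is disabled on EFA, else None.

    Assumption: the libfabric provider is selected via FI_PROVIDER. An unset
    FI_PROVIDER is treated as "unknown" and produces no warning.
    """
    env = os.environ if env is None else env
    # The README documents OFI_NCCL_GDR_FLUSH_DISABLE; "1" disables the flush.
    flush_disabled = env.get("OFI_NCCL_GDR_FLUSH_DISABLE", "0") == "1"
    provider = env.get("FI_PROVIDER", "").lower()
    if flush_disabled and provider == "efa":
        return ("OFI_NCCL_GDR_FLUSH_DISABLE=1 is not recommended with the EFA "
                "provider; unset it to keep GDR flush enabled.")
    return None


if __name__ == "__main__":
    warning = check_gdr_flush_setting()
    if warning:
        print(warning)
```

Keeping the check outside the plugin means it can run in a job launcher before NCCL initializes, without relying on any plugin internals.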

Hope that helps.

Thanks @rashikakheria for the clarification!