aws / aws-ofi-nccl

This is a plugin that lets EC2 developers use libfabric as a network provider while running NCCL applications.

Is it possible to use this over EFA?

VoVAllen opened this issue

Hi,

I'm wondering whether this plugin can make use of the Elastic Fabric Adapter (EFA).

Thanks

Yes, it is possible. The README mentions that the plugin has been tested with the EFA provider. Here are two getting-started docs on EFA over P3dn that use the plugin: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html and https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-efa-using.html

Hi,

Is there any code I can refer to that uses the libfabric RDM endpoint primitives (e.g. fi_send)? I also found code in the libfabric repo related to FI_HMEM. Does that mean I can receive data over EFA directly into registered CUDA memory, using something like fi_mr and fi_msg?

This plugin uses fi_tsend (tagged send) on libfabric RDM endpoints, so its source can serve as a reference. And yes, with FI_HMEM you can use CUDA buffers directly. The plugin has code for that as well.