ofiwg / libfabric

Open Fabric Interfaces

Home Page: http://libfabric.org/

prov/efa: unable to register more than 95GB of memory

jhh67 opened this issue

Describe the bug
While developing the Chapel runtime for the EFA provider, we encountered an error in which a single process cannot register more than 95GB of memory. Registering 95GB succeeds; 96GB fails with the following error:

OFI error: fi_mr_reg(ofi_domain, memTab[i].addr, memTab[i].size, bufAcc, 0, (prov_key ? 0 : i), 0, &ofiMrTab[i], ((void*)0)): Cannot allocate memory

To Reproduce
We do not have a simple reproducer; we currently test using the full Chapel runtime. We observed the error on an AWS c7i.48xlarge, which has one EFA NIC and 384GB of memory.

Expected behavior
I expect to be able to register more than 25% of the physical memory of the machine.

Output
The output with FI_LOG_LEVEL=Debug contained:

libfabric:15409:1705531788::efa:mr:efa_mr_reg_impl():850<warn> Unable to register MR: Cannot allocate memory
libfabric:15409:1705531788::efa:mr:efa_mr_regattr():982<warn> Unable to register MR: Cannot allocate memory

Environment:
This is on an AWS c7i.48xlarge instance using libfabric 1.19, the efa provider, and export FI_EFA_USE_DEVICE_RDMA=1.

Additional context

What is the output of ulimit -l?

It's not a bug. The EFA device has a limit on the number of host pages that you can register. If you are currently allocating your memory with regular pages (4K), switching to huge pages (2M on some platforms) reduces the page count and allows you to register a larger amount of memory.
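To make the page-count argument concrete, here is a minimal sketch that computes how many host pages back a region for a given page size. The helper name `pages_needed` is hypothetical, the sizes are treated as binary units (GiB, KiB, MiB), and the actual EFA page limit is not stated in this thread, so nothing here checks against it.

```c
#include <stdint.h>

/* Hypothetical helper: number of host pages needed to back `bytes` of
 * memory with pages of `page_size` bytes, rounding up.
 * Returns 0 if page_size is 0. */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    if (page_size == 0)
        return 0;
    return bytes / page_size + (bytes % page_size != 0);
}
```

Under these assumptions, a 96 GiB region needs 25,165,824 pages of 4 KiB but only 49,152 pages of 2 MiB, a 512-fold reduction in the number of pages the device has to track.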

Thank you for your suggestions. We will try them and get back to you with the results.

We haven't had any luck registering more than 95GB of memory using hugepages. Can you provide some guidance on how to make this work? ulimit -l is unlimited, so that isn't the issue. We tried explicit hugepages via libhugetlbfs but encountered errors when registering the memory:

internal error: 0: comm-ofi.c:2875: OFI error: fi_mr_reg(ofi_domain, memTab[i].addr, memTab[i].size, bufAcc, 0, (prov_key ? 0 : i), 0, &ofiMrTab[i], ((void*)0)): Bad address
internal error: 1: comm-ofi.c:2875: OFI error: fi_mr_reg(ofi_domain, memTab[i].addr, memTab[i].size, bufAcc, 0, (prov_key ? 0 : i), 0, &ofiMrTab[i], ((void*)0)): Bad address

We also tried using transparent 2MB hugepages and mmap with MAP_HUGETLB. Using this method we are sometimes able to register up to 155GB of memory, but not always. Is there documentation on getting the efa provider working using hugepages?

> We also tried using transparent 2MB hugepages and mmap with MAP_HUGETLB.

I don't think EFA supports transparent huge pages. If you have the EFA installer installed on your instance, you should be able to see that huge pages are reserved:

(env) [ec2-user@ip-172-31-51-162 ~]$ cat /sys/kernel/mm/hugepages/**/nr_hugepages
0
14081

You can increase this count to allow larger huge page allocations.
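Before sizing an allocation, an application could read the reserved count itself. The following is a rough sketch, not part of libfabric: `read_hugepage_count` is a hypothetical helper, and the path a caller would normally pass (`/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages` for 2 MiB pages) is an assumption about the instance's kernel configuration.

```c
#include <stdio.h>

/* Hypothetical helper: read the huge page count from a sysfs-style file
 * such as .../hugepages-2048kB/nr_hugepages.
 * Returns the count, or -1 if the file cannot be opened or parsed. */
static long read_hugepage_count(const char *path)
{
    FILE *f = fopen(path, "r");
    long count = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &count) != 1)
        count = -1;
    fclose(f);
    return count;
}
```

Multiplying the returned count by the huge page size gives an upper bound on what can be mapped from the pool. Raising the count means writing a larger value to the same file, which normally requires root.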

Libfabric uses this call to allocate buffers from the huge page pool:

*memptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

The EFA provider also allocates its internal buffer pool from the huge page pool by default. Did you use the same mmap call in your application to allocate huge page memory?
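For comparison, an application-side allocation using the same mmap flags might look like the sketch below. It assumes 2 MiB huge pages and rounds the length up to that size; `round_up_huge` and `alloc_hugepages` are hypothetical names, and registering the result with fi_mr_reg is left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and MAP_HUGETLB */
#endif
#include <stddef.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024) /* assumption: 2 MiB huge pages */

/* Round `size` up to a multiple of the assumed huge page size. */
static size_t round_up_huge(size_t size)
{
    return (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Hypothetical helper: map `size` bytes from the reserved huge page pool.
 * On success, returns the mapping and stores its length in *mapped_len.
 * Returns NULL if the pool cannot satisfy the request (errno is set by mmap). */
static void *alloc_hugepages(size_t size, size_t *mapped_len)
{
    size_t len = round_up_huge(size);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    *mapped_len = len;
    return p;
}
```

A NULL return usually means that too few huge pages are reserved for the request. The mapping should later be released with munmap using the stored length rather than the originally requested size.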