ofiwg / libfabric

Open Fabric Interfaces

Home Page: http://libfabric.org/

Issues while configuring EFA on Ubuntu 22.04

coderodyhpc opened this issue

Hello,
The configuration of EFA fabrics on Ubuntu 22.04 is failing with the following message:

sudo ./configure --prefix=/opt/libfabric --enable-efa
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking how to create a pax tar archive... gnutar
checking whether make supports nested variables... (cached) yes
checking build system type... aarch64-unknown-linux-gnu
checking host system type... aarch64-unknown-linux-gnu
checking whether make supports the include directive... yes (GNU style)
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C... yes
checking whether gcc accepts -g... yes
checking for gcc option to enable C11 features... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking dependency style of gcc... gcc3
checking for ar... ar
checking the archiver (ar) interface... ar
checking for gcc... (cached) gcc
checking whether the compiler supports GNU C... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to enable C11 features... (cached) none needed
checking whether gcc understands -c and -o together... (cached) yes
checking dependency style of gcc... (cached) gcc3
checking for C compiler conformance level... c11
checking how to run the C preprocessor... gcc -E
checking for typeof syntax and keyword spelling... typeof
checking how to print strings... printf
checking for a sed that does not truncate output... /usr/bin/sed
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert aarch64-unknown-linux-gnu file names to aarch64-unknown-linux-gnu format... func_convert_file_noop
checking how to convert aarch64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /usr/bin/dd
checking how to truncate binary pipes... /usr/bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking for stdio.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for strings.h... yes
checking for sys/stat.h... yes
checking for sys/types.h... yes
checking for unistd.h... yes
checking for elf.h... yes
checking for sys/auxv.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
configure: creating ./config.lt
config.lt: creating libtool
checking for dlopen in -ldl... yes
checking for pthread_mutex_init in -lpthread... yes
checking for pthread_spin_init... yes
checking for shm_open... yes
checking for epoll_create... yes
checking for gcc options needed to detect all undeclared functions... none needed
checking for linux/perf_event.h... yes
checking whether __builtin_ia32_rdpmc is declared... no
checking compiler support for c11 atomics... yes
checking compiler support for c11 atomic `least` types... yes
checking compiler support for built-in atomics... yes
checking for library containing __atomic_load_8... -latomic
checking compiler support for built-in memory model aware atomics... yes
checking for __int128... yes
checking compiler support for built-in memory model aware 128-bit atomics... yes
checking compiler support for cpuid... no
checking whether ld accepts --version-script... yes
checking for .symver assembler support... yes
checking for __alias__ attribute support... yes
checking for getifaddrs... yes
checking ethtool support... yes
checking whether ethtool_cmd_speed is declared... yes
checking whether SPEED_UNKNOWN is declared... yes
checking for linux/userfaultfd.h... yes
checking whether __NR_userfaultfd is declared... yes
checking for userfaultfd unmap support... yes
checking for library containing clock_gettime... none required
checking for cuda_runtime.h... no
checking for level_zero/ze_api.h... no
checking for nrt/nrt.h... no
checking for habanalabs/synapse_api.h... no
checking for __curbrk... yes
checking for __clear_cache... yes
checking for linux/mman.h... yes
checking for sys/syscall.h... yes
checking whether __syscall is declared... no
checking for __syscall... no
checking for hsa/hsa_ext_amd.h... no
checking size of void *... 8
configure: *** Configuring psm provider
checking for psm.h... no
configure: psm provider: disabled
configure: *** Configuring psm2 provider
checking for psm2.h... no
configure: configure: recheck psm2 without psm2_info_query.
checking for psm2.h... no
configure: recheck psm2 without psm2_mq_ipeek_dequeue_multi.
checking for psm2.h... no
configure: recheck psm2 without psm2_mq_fp_msg.
checking for psm2.h... no
configure: recheck psm2 without psm2_am_register_handlers_2.
checking for psm2.h... no
configure: psm2 provider: disabled
configure: *** Configuring psm3 provider
checking for sys/mman.h... yes
configure: looking for library without search path
checking for shm_open in -lrt... yes
checking for numa.h... no
checking for infiniband/verbs.h... no
checking for uuid/uuid.h... yes
configure: looking for library without search path
checking for uuid_parse in -luuid... yes
checking for -msse4.2 support... no
configure: psm3 requires minimum of avx instruction set to build
checking for -mavx support... no
configure: psm3 requires minimum of avx instruction set to build
checking for -mavx2 support... no
checking for grep that handles long lines and -e... (cached) /usr/bin/grep
checking for -Wno-address-of-packed-member support... yes
checking for rdma/rv_user_ioctls.h... no
configure: psm3 provider: disabled
configure: *** Configuring sockets provider
checking for sys/socket.h... yes
checking for shm_open... (cached) yes
checking for getifaddrs... (cached) yes
configure: sockets provider: include in libfabric
configure: *** Configuring verbs provider
checking for infiniband/verbs.h... no
checking for rdma/rdma_cma.h... no
checking for rdma/rdma_cma.h... no
configure: verbs provider: disabled
configure: *** Configuring efa provider
checking for infiniband/verbs.h... no
checking for GCC... yes
checking for infiniband/efadv.h... no
configure: WARNING: The EFA provider requires rdma-core v31 or newer.
checking for infiniband/efadv.h... no
checking for struct efadv_device_attr.max_rdma_size... no
checking whether EFADV_DEVICE_ATTR_CAPS_RNR_RETRY is declared... no
checking for infiniband/efadv.h... no
configure: efa provider: disabled
configure: WARNING: efa provider was requested, but cannot be compiled
configure: error: Cannot continue

As shown, configure reports several problems, including a warning about the rdma-core version. However, I had previously installed rdma-core from GitHub, and /usr/lib/modules/5.15.0-1019-aws/kernel/drivers/infiniband/hw/efa/efa.ko is available. I also tried adding --enable-tcp --enable-verbs to the configure command, but it didn't make any difference.
Thanks.

Hi,

To compile the EFA provider, you need to install rdma-core, which does not appear to be installed on your machine.

If rdma-core is installed, you should have a file named /usr/include/infiniband/efadv.h

I recommend that you download the EFA installer, following this tutorial: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html

which will install rdma-core on your machine.
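The header check described above can be sketched as a small shell function. This is only an illustration: `check_efa_headers` is a hypothetical helper, not part of libfabric or rdma-core, and it assumes the two headers that the configure log probes for (infiniband/verbs.h and infiniband/efadv.h) live under a single include root, /usr/include by default.

```shell
#!/bin/sh
# check_efa_headers is a hypothetical helper (not shipped with libfabric
# or rdma-core). It reports whether the headers that configure looks for
# exist under an include root, which defaults to /usr/include.
check_efa_headers() {
    root="${1:-/usr/include}"
    missing=0
    for h in infiniband/verbs.h infiniband/efadv.h; do
        if [ -f "$root/$h" ]; then
            echo "found:   $root/$h"
        else
            echo "missing: $root/$h"
            missing=1
        fi
    done
    # Exit status 0 only when every header was found.
    return "$missing"
}
```

Calling `check_efa_headers` with no argument inspects /usr/include; passing a directory inspects that include root instead. If either header is reported missing, the configure run shown earlier would be expected to disable the efa provider in the same way.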

Hi @wzamazon,
As mentioned, I installed rdma-core from Github. It seems that it installed efa.ko in the right subdirectory (I think). However, /usr/include/infiniband wasn't created and efadv.h hasn't been copied or moved to this subdirectory (it's still in the build subdirectory). I guess that the problem is with the rdma-core installer (build.sh).

I had forgotten that there is an installer so maybe there is no need to install component by component after all.

As mentioned, I installed rdma-core from Github. It seems that it installed efa.ko in the right subdirectory (I think).

I think there is some confusion here. efa.ko is a kernel module and is not part of rdma-core. If you build rdma-core correctly, you will have the file libefa.so.

I had forgotten that there is an installer so maybe there is no need to install component by component after all.

Yes. That is the recommended approach.

From rdma-core:

The build is configured to run all the programs 'in-place' and cannot be installed.

You cannot install the rdma-core headers, libraries, and tools with the build script.

Hi @wzamazon,
I used the installer, but the command fi_info -p efa -t FI_EP_RDM returned fi_getinfo: -61 instead of information about the fabrics, so I'm not sure that everything went OK.
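For context, errno 61 on Linux is ENODATA, and libfabric reports negative errno-style codes, so fi_getinfo: -61 is consistent with no provider matching the query. One possible cause, stated here as an assumption and not as a diagnosis of this system, is that no EFA device is visible to the RDMA stack. The sketch below lists whatever is registered under /sys/class/infiniband, the standard sysfs location for RDMA devices. `list_rdma_devices` is a hypothetical helper name.

```shell
#!/bin/sh
# list_rdma_devices is a hypothetical helper. It prints the RDMA devices
# registered under a sysfs directory (default: /sys/class/infiniband)
# and returns non-zero when none are present.
list_rdma_devices() {
    sysroot="${1:-/sys/class/infiniband}"
    if [ ! -d "$sysroot" ]; then
        echo "no RDMA devices: $sysroot does not exist"
        return 1
    fi
    found=0
    for d in "$sysroot"/*; do
        # An unmatched glob stays literal, so skip entries that do not exist.
        [ -e "$d" ] || continue
        echo "rdma device: $(basename "$d")"
        found=1
    done
    if [ "$found" -ne 1 ]; then
        echo "no RDMA devices registered under $sysroot"
        return 1
    fi
    return 0
}
```

If the function reports no devices on an instance that should have EFA, the kernel driver or the network interface attachment is worth checking before looking further at libfabric itself.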

Hi @tschuett,
I saw the sentence that you mention in the README but couldn't understand what it meant at the time. I guess that before installing rdma-core, it'd be necessary to install RDMA as in other systems.

I read that as: the rdma-core build script is for testing and for building RPM and Debian packages. You should not, and cannot, use it to install headers on your system. You will need distribution packages.
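As a rough sketch of what "distribution packages" could mean on Ubuntu, the helper below prints an apt-get command and only runs it when RUN=1 is set. The package names rdma-core and libibverbs-dev are an assumption based on how Debian and Ubuntu usually split the rdma-core project; confirm that the development package really ships infiniband/efadv.h on your release before relying on it. `install_rdma_dev_packages` is a hypothetical name, and this route is an alternative to the EFA installer recommended earlier in the thread, not something to combine with it.

```shell
#!/bin/sh
# install_rdma_dev_packages is a hypothetical helper. The package names
# are assumptions for Ubuntu, not something confirmed in this thread.
# By default it only prints the command; set RUN=1 to execute it.
install_rdma_dev_packages() {
    pkgs="rdma-core libibverbs-dev"
    cmd="sudo apt-get install -y $pkgs"
    if [ "${RUN:-0}" = "1" ]; then
        # Intentionally unquoted so the string splits into command words.
        $cmd
    else
        echo "$cmd"
    fi
}
```

Running `install_rdma_dev_packages` without RUN set is a dry run that shows the command for review. Once the headers are in place, re-running ./configure with --enable-efa should get past the efadv.h checks.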