ofiwg / libfabric

Open Fabric Interfaces

Home Page: http://libfabric.org/

prov/psm3: psm3_rbtree.c missing from tarball

opoplawski opened this issue

Describe the bug

+ /usr/bin/make -O -j4 V=1 VERBOSE=1
make: *** No rule to make target 'prov/psm3/psm3/include/psm3_rbtree.c', needed by 'prov/psm3/src/psm3_src_chksum.h'.  Stop.

Environment:
Fedora rawhide

Additional context
That file appears to be missing from the 1.15.0 tarball.

The following files are missing from the tarball:

libfabric-1.15.0/prov/psm3/psm3/hal_sockets: sockets_spio.c
libfabric-1.15.0/prov/psm3/psm3/hal_verbs: verbs_spio.c
libfabric-1.15.0/prov/psm3/psm3/include: psm3_rbtree.c
libfabric-1.15.0/prov/psm3/psm3/utils: utils_dwordcpy-x86_64-fast.S
libfabric-1.15.0/prov/psm3: VERSION

Strangely enough, when I ran the following commands to create the tarball from the v1.15.x branch, those files were included.

$ ./autogen.sh
$ ./configure
......
*** Built-in providers: opx dmabuf_peer_mem hook_hmem hook_debug perf rstream shm rxd mrail rxm tcp udp usnic verbs sockets psm3
*** DSO providers:
***
$ make dist

@shefty What was done differently when the release tarball was created?

I think I have found the reason. These files are in the EXTRA_DIST list, while the other source files are in the libpsm3_la_SOURCES list. Both are guarded by the HAVE_PSM3 and HAVE_PSM3_SRC conditions. However, when running make dist the conditions are ignored for libpsm3_la_SOURCES but not for EXTRA_DIST. As a result, if the psm3 provider is not enabled when the tarball is created, these files will not be included in it.

The tarfile should have been created using make distcheck; otherwise the procedure was the same.

make distcheck calls make dist first and then runs the check, so they are equivalent in terms of tarball creation. If you check the config.log (if it is still there), I bet the psm3 provider was not enabled.

Yeah, my config.log has been overwritten. So, the work-around for now might be to use configure --disable-psm3. And it sounds like I can regenerate a new tarball from the same git tag as an updated release, but I need to determine what's the best option here.

No, you want to use the --enable-psm3 option. That way you will know why psm3 failed the check. With --disable-psm3 you will get the same tarball as the released one.

I meant a work-around for someone trying to use the v1.15.0 release tarball.

I just checked my system building the v1.15.0 tarfile:

configure: *** Configuring psm3 provider
checking sys/mman.h usability... yes
checking sys/mman.h presence... yes
checking for sys/mman.h... yes
configure: looking for library without search path
checking for shm_open in -lrt... yes
checking numa.h usability... yes
checking numa.h presence... yes
checking for numa.h... yes
configure: looking for library without search path
checking for numa_node_of_cpu in -lnuma... yes
checking infiniband/verbs.h usability... no
checking infiniband/verbs.h presence... no
checking for infiniband/verbs.h... no
checking uuid/uuid.h usability... yes
checking uuid/uuid.h presence... yes
checking for uuid/uuid.h... yes
configure: looking for library without search path
checking for uuid_parse in -luuid... yes
checking for -msse4.2 support... yes
checking for -mavx support... yes
checking for -mavx2 support... yes
checking for grep that handles long lines and -e... (cached) /bin/grep
checking for -Wno-address-of-packed-member support... no
checking rdma/rv_user_ioctls.h usability... no
checking rdma/rv_user_ioctls.h presence... no
checking for rdma/rv_user_ioctls.h... no
configure: psm3 provider: disabled

I don't know what caused psm3 to disable itself, but that explains why the problem wasn't caught.

I think what I want to do is update the psm3 makefile, commit those changes, and generate a v1.15.1 release, so that we don't have 2 different tarballs posing as the v1.15.0 release.

This is probably the reason:

checking infiniband/verbs.h usability... no
checking infiniband/verbs.h presence... no
checking for infiniband/verbs.h... no

Sorry, wrong button. Yes, psm3 needs verbs/rdma-core to build.

When I run make distcheck I am not seeing any issues with psm3. Is the issue that the EXTRA_DIST is within an if block?

@acgoldma - why? I thought it was supposed to work over standard sockets.

If you configure with --disable-psm3, followed by make dist, that might show the issue. But only if you try to build after creating the tarfile, this time with configure --enable-psm3.

@acgoldma The makefile generated by autotools always includes files from the xxx_la_SOURCES lists in the dist tarball, even if they are inside an if block. Files in EXTRA_DIST, however, are included only when the condition is true.

I think the ultimate fix is to move the EXTRA_DIST definition out from the if block.
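
A minimal sketch of that change, assuming the provider appends its list to the global EXTRA_DIST from prov/psm3/Makefile.include. The exact layout of that file is an assumption here; the file names are the ones reported missing in this issue.

```make
# Sketch only, not the actual patch. Assumes EXTRA_DIST is already
# defined with "=" in the top-level Makefile.am before this file is included.

# Defined outside any "if HAVE_PSM3" block, so "make dist" picks these
# files up even when configure disabled the psm3 provider.
_psm3_extra_dist = \
	prov/psm3/psm3/include/psm3_rbtree.c \
	prov/psm3/psm3/hal_verbs/verbs_spio.c \
	prov/psm3/psm3/hal_sockets/sockets_spio.c \
	prov/psm3/psm3/utils/utils_dwordcpy-x86_64-fast.S \
	prov/psm3/VERSION

EXTRA_DIST += $(_psm3_extra_dist)

# Everything that only matters for building (libpsm3_la_SOURCES, flags)
# stays inside the existing "if HAVE_PSM3" block.
```

One way to check the result would be to configure with --disable-psm3, run make dist, and then build from the resulting tarball with --enable-psm3.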

psm3 supports multiple HALs: Verbs (UD and RC QPs) as well as Sockets (TCP/UDP).

Odd, the Makefile in the tarball shows the files in the expected locations (in chksum_srcs and EXTRA_DIST).

Thanks, I will try this now.

Honestly, I don't know that a provider should be messing with the global EXTRA_DIST at all.

Looking at the Makefile.include:

_psm3_extra_dist = \
	prov/psm3/psm3/include/psm3_rbtree.c \
	prov/psm3/psm3/hal_verbs/verbs_spio.c \
	prov/psm3/psm3/hal_sockets/sockets_spio.c \
	prov/psm3/psm3/utils/utils_dwordcpy-x86_64-fast.S \
	prov/psm3/VERSION

First, why are 3 .c files extra dist and not part of src? I don't understand that at all. The .S is some assembly code. Again, why is that not part of the src files?

VERSION is part of the psmX legacy of weirdness. For that we should have a new provider variable that can be updated, similar to prov_dist_man_pages. E.g. prov_extra_dist, so that providers aren't modifying EXTRA_DIST directly.
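
A rough sketch of what such a variable could look like. prov_extra_dist is only the name proposed in this comment, it does not exist yet, and the placement of each fragment is an assumption.

```make
# Hypothetical sketch: prov_extra_dist is a proposed name, not an existing variable.

# Top-level Makefile.am, before the provider Makefile.include files are included:
prov_extra_dist =

# prov/psm3/Makefile.include, outside any conditional block:
prov_extra_dist += prov/psm3/VERSION

# Top-level Makefile.am, after all provider includes
# (assumes EXTRA_DIST was already defined with "=" earlier in the file):
EXTRA_DIST += $(prov_extra_dist)
```

With this shape, only the top-level Makefile.am touches EXTRA_DIST, and each provider contributes files through its own append.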

Those .c files are used as header files, which is not a recommended practice. In the short term, they can be renamed to use a different suffix and added to the src file list. The .S file seems to be unused.

I would like to solve the main issue here. Can you move this discussion of the extra files to a new issue/discussion?

I have a patch that should work for this; testing now.

Those files are part of the main issue here. We need to fix the underlying problem. They shouldn't be extra dist files, and an unused assembly file shouldn't be there at all. That leaves the VERSION file as the only baggage. Why is that file needed?

Including a C file is a valid and useful way to optimize performance-critical subsystems.
Using EXTRA_DIST is the autotools 'recommended/only' way of doing this.

The VERSION file is used to generate the unique provider version. We theoretically could just pull this into the prov/psm3/configure.* files.

The asm .S file is an optimized copy function that can be enabled through defines. We saw some unique situations where this optimization gave a noticeable performance improvement, so we kept it around, though it is disabled by default.