ofiwg / libfabric

Open Fabric Interfaces

Home Page: http://libfabric.org/

Mismatch credit handling

nmorey opened this issue

While compiling libfabric for i686 systems, I noticed that there are a lot of warnings.
I'm working on fixing them (most are trivial issues).
But one is slightly trickier, and I'm not familiar enough with ofi to know the best way around it.

make CFLAGS="-m32  -Werror" V=1 
make  all-am
make[1]: Entering directory '/work1/nmorey/workspace/alternates/master/libfabric'
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.  -I./include -D_GNU_SOURCE -D__USE_XOPEN2K8 -DSYSCONFDIR=\"/usr/local/etc\" -DRDMADIR=\"@rdmadir@\" -DPROVDLDIR=\"/usr/local/lib/libfabric\" -I./prov/sockets/include -I./prov/sockets        -I./prov/hook/include -I./prov/hook/perf/include -I./prov/hook/hook_debug/include -I./prov/hook/hook_hmem/include -I./prov/hook/dmabuf_peer_mem/include  -Wall -m32  -Werror -MT prov/hook/src/src_libfabric_la-hook_domain.lo -MD -MP -MF prov/hook/src/.deps/src_libfabric_la-hook_domain.Tpo -c -o prov/hook/src/src_libfabric_la-hook_domain.lo `test -f 'prov/hook/src/hook_domain.c' || echo './'`prov/hook/src/hook_domain.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I./include -D_GNU_SOURCE -D__USE_XOPEN2K8 -DSYSCONFDIR=\"/usr/local/etc\" -DRDMADIR=\"@rdmadir@\" -DPROVDLDIR=\"/usr/local/lib/libfabric\" -I./prov/sockets/include -I./prov/sockets -I./prov/hook/include -I./prov/hook/perf/include -I./prov/hook/hook_debug/include -I./prov/hook/hook_hmem/include -I./prov/hook/dmabuf_peer_mem/include -Wall -m32 -Werror -MT prov/hook/src/src_libfabric_la-hook_domain.lo -MD -MP -MF prov/hook/src/.deps/src_libfabric_la-hook_domain.Tpo -c prov/hook/src/hook_domain.c  -fPIC -DPIC -o prov/hook/src/.libs/src_libfabric_la-hook_domain.o
prov/hook/src/hook_domain.c: In function ‘hook_set_send_handler’:
prov/hook/src/hook_domain.c:124:12: error: passing argument 2 of ‘domain->base_ops_flow_ctrl->set_send_handler’ from incompatible pointer type [-Werror=incompatible-pointer-types]
            hook_credit_handler);
            ^~~~~~~~~~~~~~~~~~~
prov/hook/src/hook_domain.c:124:12: note: expected ‘ssize_t (*)(struct fid_ep *, uint64_t) {aka int (*)(struct fid_ep *, long long unsigned int)}’ but argument is of type ‘ssize_t (*)(struct fid_ep *, size_t) {aka int (*)(struct fid_ep *, unsigned int)}’
prov/hook/src/hook_domain.c: At top level:
prov/hook/src/hook_domain.c:150:17: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
  .add_credits = hook_add_credits,
                 ^~~~~~~~~~~~~~~~
prov/hook/src/hook_domain.c:150:17: note: (near initialization for ‘hook_ops_flow_ctrl.add_credits’)
prov/hook/src/hook_domain.c:152:22: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
  .set_send_handler = hook_set_send_handler,
                      ^~~~~~~~~~~~~~~~~~~~~
prov/hook/src/hook_domain.c:152:22: note: (near initialization for ‘hook_ops_flow_ctrl.set_send_handler’)

The issue is that set_send_handler in ofi_util.h expects the handler to take a uint64_t for credits, while base_credit_handler in ofi_hook.h is declared with a size_t.
This works on x86_64 systems, where both types are 64 bits wide, but on i686 size_t is only 32 bits, so the signatures no longer match.
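To make the width difference concrete, here is a minimal sketch that reports the sizes of the two types on the build target. The function names are hypothetical helpers made up for this illustration; they are not part of libfabric.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 when size_t and uint64_t have the
 * same width on the current target, 0 otherwise. */
int credit_types_same_width(void)
{
	return sizeof(size_t) == sizeof(uint64_t);
}

/* Hypothetical helper: prints both widths so the difference between
 * a 32-bit and a 64-bit build is visible. */
void print_credit_type_widths(void)
{
	printf("sizeof(size_t)   = %zu\n", sizeof(size_t));
	printf("sizeof(uint64_t) = %zu\n", sizeof(uint64_t));
}
```

Calling print_credit_type_widths() from a small main() built with -m32 should show the two sizes diverging. Note that equal width alone is not enough for the function pointer types to be compatible: the clang log later in this thread shows size_t as unsigned long and uint64_t as unsigned long long, which are distinct types even when both are 64 bits.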

Should base_credit_handler be switched to uint64_t as well?

The function definitions in hook_domain.c are incorrect. The correct function prototypes are in ofi_util.h.

struct ofi_ops_flow_ctrl {
	size_t	size;
	bool	(*available)(struct fid_ep *ep);
	int	(*enable)(struct fid_ep *ep, uint64_t threshold);
	void	(*add_credits)(struct fid_ep *ep, uint64_t credits);
	void	(*set_send_handler)(struct fid_domain *domain,
			ssize_t (*send_handler)(struct fid_ep *ep, uint64_t credits));
};

These two definitions:

static void hook_add_credits(struct fid_ep *ep_fid, size_t credits)
static void hook_set_send_handler(struct fid_domain *domain_fid,
		ssize_t (*credit_handler)(struct fid_ep *ep, size_t credits))

should use uint64_t for credits, not size_t. The credit values are expected to be part of some wire protocol message, so we use a fixed-size type (uint64_t) rather than a variable-sized one. struct ofi_ops_flow_ctrl defines an API contract between two providers, so changing it would require updating OFI_OPS_FLOW_CTRL and deciding whether we wanted to support both versions of the structure (we wouldn't) or just the new one.
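As a rough illustration of the suggested fix, the sketch below uses uint64_t for credits in every signature so that the functions match the pointer types in the ops table. Everything here is a stand-in: the struct bodies, the flow_ctrl_sketch table, and all sketch_* names are assumptions for this example, not the real definitions from ofi_util.h or hook_domain.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>	/* ssize_t */

/* Stand-in types; the real ones come from the libfabric headers. */
struct fid_ep { int unused; };
struct fid_domain { int unused; };

/* Handler type using a fixed-width credit count. */
typedef ssize_t (*credit_handler_t)(struct fid_ep *ep, uint64_t credits);

/* Reduced, hypothetical version of the flow control ops table. */
struct flow_ctrl_sketch {
	void (*add_credits)(struct fid_ep *ep, uint64_t credits);
	void (*set_send_handler)(struct fid_domain *domain,
				 credit_handler_t send_handler);
};

/* State kept only so the example has observable behavior. */
uint64_t sketch_total_credits;
credit_handler_t sketch_handler;

/* Takes uint64_t, so it matches .add_credits on 32-bit and 64-bit targets. */
void sketch_add_credits(struct fid_ep *ep, uint64_t credits)
{
	(void) ep;
	sketch_total_credits += credits;
}

/* The handler parameter also uses uint64_t, matching .set_send_handler. */
void sketch_set_send_handler(struct fid_domain *domain,
			     credit_handler_t handler)
{
	(void) domain;
	sketch_handler = handler;
}

/* Example handler with the fixed-width signature; always reports success. */
ssize_t sketch_credit_handler(struct fid_ep *ep, uint64_t credits)
{
	(void) ep;
	(void) credits;
	return 0;
}

/* These initializers compile without pointer type warnings because
 * every signature agrees on uint64_t. */
struct flow_ctrl_sketch sketch_ops = {
	.add_credits = sketch_add_credits,
	.set_send_handler = sketch_set_send_handler,
};
```

With size_t in place of uint64_t in the three sketch_* functions, the same initializers would be expected to trigger the incompatible pointer type diagnostics on targets where the two types differ.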

Thanks @shefty
I'll fix that. ofi_hook also needs a fix in that case:

+++ b/include/ofi_hook.h
@@ -163,7 +163,7 @@ struct hook_domain {
    struct fid_domain *hdomain;
    struct hook_fabric *fabric;
    struct ofi_ops_flow_ctrl *base_ops_flow_ctrl;
-   ssize_t (*base_credit_handler)(struct fid_ep *ep_fid, size_t credits);
+   ssize_t (*base_credit_handler)(struct fid_ep *ep_fid, uint64_t credits);
 };

There are a bunch of small issues when building with -m32 that I'll fix as well.

The ones I'm not sure about are these:

--- a/prov/mrail/src/mrail_domain.c
+++ b/prov/mrail/src/mrail_domain.c
@@ -68,7 +68,7 @@ static int mrail_domain_map_raw(struct mrail_domain *mrail_domain,
 
    memcpy(mr_map, map->raw_key, map->key_size);
 
-   *(map->key) = (uint64_t)mr_map;
+   *(map->key) = (unsigned long)mr_map;
 
    return 0;
 }

and

@@ -365,7 +365,7 @@ static int rxd_av_close(struct fid *fid)
        return ret;
 
    while ((node = ofi_rbmap_get_root(&av->rbmap))) {
-       rxd_addr = (fi_addr_t) node->data;
+       rxd_addr = (fi_addr_t)(unsigned long) node->data;
        dg_addr = (intptr_t)ofi_idx_lookup(&av->rxdaddr_dg_idx,
                           (int) rxd_addr);

Do you care about these?
As everything is unsigned, there is no real risk IMHO. It's just cleaner not to get flooded with messages during compilation.
Macros might be a slightly cleaner fix. Something simple like
#define OFI_UINT64_TO_PTR(x) ((void *)(unsigned long)(x))
Would you prefer that? Where would be the best place to define such a macro?

uintptr_t is the proper type to use for integer-pointer conversion.
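A minimal sketch of that suggestion, assuming the goal is to store a pointer in a 64-bit field and recover it later. The helper names are hypothetical and not part of libfabric; going through uintptr_t keeps the conversion warning-free when pointers are 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical helper: pointer to 64-bit integer. The intermediate
 * uintptr_t cast avoids a pointer-to-integer size mismatch on targets
 * where pointers are narrower than 64 bits. */
uint64_t sketch_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t) ptr;
}

/* Hypothetical helper: 64-bit integer back to pointer. This is only
 * meaningful for values that were produced by sketch_ptr_to_u64(). */
void *sketch_u64_to_ptr(uint64_t value)
{
	return (void *)(uintptr_t) value;
}
```

The round trip assumes pointers are no wider than 64 bits, which holds for the i686 and x86_64 targets discussed in this thread.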

Hi all,

Has this issue been solved, and is the fix in main?
I think I am hitting a similar problem when building on an M2 (with both 1.17.1 and 1.18.0):

prov/hook/src/hook_domain.c:124:12: error: incompatible function pointer types passing 'ssize_t (struct fid_ep *, size_t)' (aka 'long (struct fid_ep *, unsigned long)') to parameter of type 'ssize_t (*)(struct fid_ep *, uint64_t)' (aka 'long (*)(struct fid_ep *, unsigned long long)') [-Wincompatible-function-pointer-types]
                                                     hook_credit_handler);
                                                     ^~~~~~~~~~~~~~~~~~~
prov/hook/src/hook_domain.c:150:17: error: incompatible function pointer types initializing 'void (*)(struct fid_ep *, uint64_t)' (aka 'void (*)(struct fid_ep *, unsigned long long)') with an expression of type 'void (struct fid_ep *, size_t)' (aka 'void (struct fid_ep *, unsigned long)') [-Wincompatible-function-pointer-types]
        .add_credits = hook_add_credits,
                       ^~~~~~~~~~~~~~~~
prov/hook/src/hook_domain.c:152:22: error: incompatible function pointer types initializing 'void (*)(struct fid_domain *, ssize_t (*)(struct fid_ep *, uint64_t))' (aka 'void (*)(struct fid_domain *, long (*)(struct fid_ep *, unsigned long long))') with an expression of type 'void (struct fid_domain *, ssize_t (*)(struct fid_ep *, size_t))' (aka 'void (struct fid_domain *, long (*)(struct fid_ep *, unsigned long))') [-Wincompatible-function-pointer-types]
        .set_send_handler = hook_set_send_handler,
                            ^~~~~~~~~~~~~~~~~~~~~
3 errors generated.

(I am using clang 16.0.1)

Thanks for your help :-)