ofiwg / libfabric

Open Fabric Interfaces

Home page: http://libfabric.org/

prov/ofi_rxm Not working, Need core provider, skipping ofi_rxm

jordialcaraz opened this issue

Describe the bug
I'm trying to run an application with FI_PROVIDER=ofi_rxm, but it fails to find ofi_rxm. I tried the same with fi_pingpong to check whether the problem was in the application, but fi_pingpong also fails.
I then ran fi_info with FI_PROVIDER=ofi_rxm: fi_getinfo returns -61 (No data available), even though ofi_rxm appears in the output of a plain fi_info run.

To Reproduce
Steps to reproduce the behavior:
FI_PROVIDER=ofi_rxm FI_LOG_LEVEL=Debug fi_info

Expected behavior
It should print the same ofi_rxm entries that a plain fi_info run shows.

Output
$ FI_PROVIDER=ofi_rxm FI_LOG_LEVEL=Debug fi_info
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable perf_cntr=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable hook=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable hmem=
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable mr_cache_max_size=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable mr_cache_max_count=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable mr_cache_monitor=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable mr_cuda_cache_monitor_enabled=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable mr_rocr_cache_monitor_enabled=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable mr_ze_cache_monitor_enabled=
libfabric:4082300:1708029534::core:mr:ofi_default_cache_size():79 default cache size=526983472
libfabric:4082300:1708029534::core:core:fi_param_get_():382 read string var provider=ofi_rxm
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable universe_size=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable av_remove_cleanup=
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable offload_coll_provider=
libfabric:4082300:1708029534::core:core:fi_param_get_():382 read string var provider_path=/storage/usersb/jalcaraz/spack/opt/spack/linux-rhel8-zen/gcc-8.5.0/libfabric-1.20.1-oqyfclnlyosaabt3jzrbkrrzj6q4cirf/lib/
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable enable_passthru=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable buffer_size=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable tx_size=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable rx_size=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable msg_tx_size=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable msg_rx_size=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable cm_progress_interval=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable cq_eq_fairness=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable data_auto_progress=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable use_rndv_write=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable def_wait_obj=
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable def_tcp_wait_obj=
libfabric:4082300:1708029534::core:core:ofi_register_provider():506 registering provider: ofi_rxm (120.10)
libfabric:4082300:1708029534::core:core:ofi_register_provider():506 registering provider: verbs (120.10)
libfabric:4082300:1708029534::core:core:ofi_register_provider():533 "verbs" filtered by provider include/exclude list, skipping
libfabric:4082300:1708029534::core:core:ofi_register_provider():506 registering provider: ofi_hook_perf (120.10)
libfabric:4082300:1708029534::core:core:ofi_register_provider():506 registering provider: ofi_hook_trace (120.10)
libfabric:4082300:1708029534::core:core:ofi_register_provider():506 registering provider: ofi_hook_debug (120.10)
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable hmem=
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4082300:1708029534::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4082300:1708029534::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4082300:1708029534::core:core:ofi_register_provider():506 registering provider: ofi_hook_hmem (120.10)
libfabric:4082300:1708029534::core:core:ofi_register_provider():506 registering provider: ofi_hook_dmabuf_peer_mem (120.10)
libfabric:4082300:1708029534::core:core:ofi_register_provider():506 registering provider: ofi_hook_noop (120.10)
libfabric:4082300:1708029534::core:core:ofi_register_provider():506 registering provider: off_coll (120.10)
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4082300:1708029534:ofi_rxm:core:core:ofi_layering_ok():1194 Need core provider, skipping ofi_rxm
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4082300:1708029534:ofi_rxm:core:core:ofi_layering_ok():1194 Need core provider, skipping ofi_rxm
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4082300:1708029534:ofi_rxm:core:core:ofi_layering_ok():1194 Need core provider, skipping ofi_rxm
libfabric:4082300:1708029534::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4082300:1708029534:ofi_rxm:core:core:ofi_layering_ok():1194 Need core provider, skipping ofi_rxm
libfabric:4082300:1708029534::core:core:fi_getinfo_():1304 fi_getinfo: provider ofi_rxm returned -61 (No data available)

$ fi_info
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_RC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx5_0-xrc
version: 120.10
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx5_0-xrc
version: 120.10
type: FI_EP_MSG
protocol: FI_PROTO_RDMA_CM_IB_XRC
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx5_0-dgram
version: 120.10
type: FI_EP_DGRAM
protocol: FI_PROTO_IB_UD
provider: verbs
fabric: IB-0xfe80000000000000
domain: mlx5_0-dgram
version: 120.10
type: FI_EP_DGRAM
protocol: FI_PROTO_IB_UD
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0-xrc
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0-xrc
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM
provider: verbs;ofi_rxm
fabric: IB-0xfe80000000000000
domain: mlx5_0
version: 120.10
type: FI_EP_RDM
protocol: FI_PROTO_RXM

Environment:
Red Hat Enterprise Linux 8.8 (Ootpa)

This is not a bug. You need to use verbs;ofi_rxm, as shown in your fi_info output.

Thanks, Chien.

I had also tried verbs;ofi_rxm, but although fi_info works, fi_pingpong fails (the final error names ofi_rxm rather than verbs;ofi_rxm):

$ FI_PROVIDER="verbs;ofi_rxm" FI_LOG_LEVEL=Debug fi_pingpong
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable perf_cntr=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hook=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hmem=
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_cache_max_size=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_cache_max_count=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_cache_monitor=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_cuda_cache_monitor_enabled=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_rocr_cache_monitor_enabled=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable mr_ze_cache_monitor_enabled=
libfabric:4113849:1708031595::core:mr:ofi_default_cache_size():79 default cache size=526983472
libfabric:4113849:1708031595::core:core:fi_param_get_():382 read string var provider=verbs;ofi_rxm
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable universe_size=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable av_remove_cleanup=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable offload_coll_provider=
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable provider_path=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable enable_passthru=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable buffer_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable tx_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable rx_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable msg_tx_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable msg_rx_size=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable cm_progress_interval=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable cq_eq_fairness=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable data_auto_progress=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable use_rndv_write=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable def_wait_obj=
libfabric:4113849:1708031595::ofi_rxm:core:fi_param_get_():373 variable def_tcp_wait_obj=
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_rxm (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: verbs (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_perf (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_trace (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_debug (120.10)
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hmem=
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4113849:1708031595::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4113849:1708031595::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_hmem (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_dmabuf_peer_mem (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: ofi_hook_noop (120.10)
libfabric:4113849:1708031595::core:core:ofi_register_provider():506 registering provider: off_coll (120.10)
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable tx_size=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable rx_size=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable tx_iov_limit=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable rx_iov_limit=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable inline_size=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable min_rnr_timer=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable use_odp=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable prefer_xrc=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable xrcd_filename=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable cqread_bunch_size=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable gid_idx=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable device_name=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable use_dmabuf=
libfabric:4113849:1708031595::verbs:core:vrb_read_params():720 dmabuf support is enabled
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable iface=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable dgram_use_name_server=
libfabric:4113849:1708031595::verbs:core:fi_param_get_():373 variable dgram_name_server_port=
libfabric:4113849:1708031595::verbs:fabric:verbs_devs_print():889 list of verbs devices found for FI_EP_MSG:
libfabric:4113849:1708031596::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4113849:1708031596::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4113849:1708031596::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4113849:1708031596::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0-dgram
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4113849:1708031596::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0-dgram
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Supported: FI_EP_RDM
libfabric:4113849:1708031596::ofi_rxm:core:ofi_check_ep_type():691 Requested: FI_EP_DGRAM
libfabric:4113849:1708031596::core:core:fi_getinfo_():1304 fi_getinfo: provider ofi_rxm returned -61 (No data available)
fi_getinfo(): util/pingpong.c:1489, ret=-61 (No data available)

Thank you.

By default, fi_pingpong uses FI_EP_DGRAM. Try fi_pingpong -e rdm.

With both fi_pingpong -e rdm and fi_pingpong -e rdm -p verbs, the output is:

libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable perf_cntr=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hook=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hmem=
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_cache_max_size=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_cache_max_count=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_cache_monitor=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_cuda_cache_monitor_enabled=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_rocr_cache_monitor_enabled=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable mr_ze_cache_monitor_enabled=
libfabric:4117195:1708032872::core:mr:ofi_default_cache_size():79 default cache size=526983472
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable provider=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable universe_size=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable av_remove_cleanup=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable offload_coll_provider=
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable provider_path=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable enable_passthru=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable buffer_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable tx_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable rx_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable msg_tx_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable msg_rx_size=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable cm_progress_interval=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable cq_eq_fairness=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable data_auto_progress=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable use_rndv_write=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable def_wait_obj=
libfabric:4117195:1708032872::ofi_rxm:core:fi_param_get_():373 variable def_tcp_wait_obj=
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_rxm (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: verbs (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_perf (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_trace (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_debug (120.10)
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hmem=
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_CUDA not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ROCR not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_ZE not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_NEURON not supported
libfabric:4117195:1708032872::core:core:ofi_hmem_init():607 Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:4117195:1708032872::core:core:fi_param_get_():373 variable hmem_disable_p2p=
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_hmem (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_dmabuf_peer_mem (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: ofi_hook_noop (120.10)
libfabric:4117195:1708032872::core:core:ofi_register_provider():506 registering provider: off_coll (120.10)
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable tx_size=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable rx_size=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable tx_iov_limit=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable rx_iov_limit=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable inline_size=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable min_rnr_timer=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable use_odp=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable prefer_xrc=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable xrcd_filename=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable cqread_bunch_size=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable gid_idx=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable device_name=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable use_dmabuf=
libfabric:4117195:1708032872::verbs:core:vrb_read_params():720 dmabuf support is enabled
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable iface=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable dgram_use_name_server=
libfabric:4117195:1708032872::verbs:core:fi_param_get_():373 variable dgram_name_server_port=
libfabric:4117195:1708032872::verbs:fabric:verbs_devs_print():889 list of verbs devices found for FI_EP_MSG:
libfabric:4117195:1708032873::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4117195:1708032873::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4117195:1708032874::verbs:fabric:vrb_get_device_attrs():620 device mlx5_0: first found active port is 1
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_MSG
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874::verbs:core:ofi_check_ep_type():691 Requested: FI_EP_RDM
libfabric:4117195:1708032874::core:core:fi_getinfo_():1304 fi_getinfo: provider verbs returned -61 (No data available)
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider verbs, skipping tcp
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider verbs, skipping tcp
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_attr():775 Provider requires use of shared rx context
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1578 hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1578 hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0-dgram
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874::verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0-dgram
libfabric:4117195:1708032874::core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874::core:core:fi_fabric_():1504 Opened fabric: IB-0xfe80000000000000
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider off_coll, skipping verbs
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider off_coll, skipping tcp
libfabric:4117195:1708032874::ofi_rxm:core:ofi_check_fabric_attr():412 Requesting provider off_coll, skipping tcp
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1194 Need core provider, skipping off_coll
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1194 Need core provider, skipping off_coll
libfabric:4117195:1708032874::core:core:fi_getinfo_():1304 fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:4117195:1708032874::core:core:fi_fabric_():1504 Opened fabric: UTIL-COLL
libfabric:4117195:1708032874::core:core:fi_fabric_():1504 Opened fabric: IB-0xfe80000000000000
libfabric:4117195:1708032874::ofi_rxm:core:fi_param_get_():373 variable use_srx=
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #1 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #2 mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1601 adding fi_info for domain: mlx5_0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #3 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:vrb_check_hints():268 skipping device mlx5_0-xrc (want mlx5_0)
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #4 mlx5_0-xrc
libfabric:4117195:1708032874:ofi_rxm:verbs:core:vrb_check_hints():268 skipping device mlx5_0-xrc (want mlx5_0)
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #5 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_matching_info():1556 checking domain: #6 mlx5_0-dgram
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():690 unsupported endpoint type
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Supported: FI_EP_DGRAM
libfabric:4117195:1708032874:ofi_rxm:verbs:core:ofi_check_ep_type():691 Requested: FI_EP_MSG
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_rai_id():301 rdma_resolve_addr: Invalid argument (22)
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_rai_id():303 src addr: fi_sockaddr_ib://[fe80::b83f:d203:2b:b478]:0xffff:0x13f:0x0
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_rai_id():305 dst addr: (null)
libfabric:4117195:1708032874:ofi_rxm:verbs:fabric:vrb_get_match_infos():1825 handling of the socket address fails - -22
libfabric:4117195:1708032874:ofi_rxm:verbs:core:vrb_get_match_infos():1845 Handling of the addresses fails, the getting infos is unsuccessful
libfabric:4117195:1708032874:ofi_rxm:core:core:fi_getinfo_():1304 fi_getinfo: provider verbs returned -61 (No data available)
libfabric:4117195:1708032874:ofi_rxm:core:core:ofi_layering_ok():1183 Provider ofi_rxm is excluded
fi_domain(): util/pingpong.c:1415, ret=-61 (No data available)

verbs supports msg endpoints (you would need the -e msg argument)
verbs;ofi_rxm supports rdm endpoints (you would need the -e rdm argument)

You can run fi_info -v -p verbs to view the full set of supported capabilities and endpoint types.
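The provider/endpoint pairing above can be captured in a small shell helper. This is a sketch under assumptions: pingpong_provider and run_pingpong are hypothetical names, not part of libfabric. The only real tool invoked is fi_pingpong, using its -p (provider) and -e (endpoint type) options. Run it with no address on the server side and with the server's address on the client side.

```shell
#!/bin/sh
# Hypothetical helper (not part of libfabric): map an fi_pingpong endpoint
# type to the provider string that supports it, per the pairing above:
#   msg -> verbs            (core provider, FI_EP_MSG)
#   rdm -> verbs;ofi_rxm    (ofi_rxm layered over verbs, FI_EP_RDM)
pingpong_provider() {
    case "$1" in
        msg) echo "verbs" ;;
        rdm) echo "verbs;ofi_rxm" ;;
        *)   echo "unsupported endpoint type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: run fi_pingpong as the server (no address given)
# or as the client (server address passed as the second argument).
run_pingpong() {
    ep_type=$1
    prov=$(pingpong_provider "$ep_type") || return 1
    if ! command -v fi_pingpong >/dev/null 2>&1; then
        echo "fi_pingpong not found in PATH" >&2
        return 1
    fi
    if [ -n "$2" ]; then
        fi_pingpong -p "$prov" -e "$ep_type" "$2"
    else
        fi_pingpong -p "$prov" -e "$ep_type"
    fi
}
```

Sourcing the file only defines the functions. On the server you would call run_pingpong rdm, and on the client run_pingpong rdm followed by the server's IP address.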

From your fi_info output and the log, I'm guessing you do not have IPoIB set up. fi_pingpong requires an IPv4 or IPv6 address. Once that is configured, use verbs;ofi_rxm with -e rdm; that should work for you.
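To check the IPoIB guess, a sketch like the following lists IPoIB interfaces and any global IP addresses assigned to them. It makes three assumptions. First, it assumes Linux, where sysfs reports link type 32 (ARPHRD_INFINIBAND) in /sys/class/net/<if>/type for IPoIB interfaces. Second, it assumes the iproute2 ip tool is available. Third, the function name check_ipoib is hypothetical.

```shell
#!/bin/sh
# Sketch: report IPoIB interfaces and their global IP addresses.
# Assumes Linux sysfs and the iproute2 "ip" tool; check_ipoib is a
# hypothetical helper name, not part of libfabric.

ARPHRD_INFINIBAND=32   # value of /sys/class/net/<if>/type for IPoIB links

check_ipoib() {
    found=0
    for dev in /sys/class/net/*; do
        # Skip anything that is not a readable network device entry.
        [ -r "$dev/type" ] || continue
        # Keep only InfiniBand (IPoIB) link types.
        [ "$(cat "$dev/type")" = "$ARPHRD_INFINIBAND" ] || continue
        found=1
        ifname=$(basename "$dev")
        # Field 4 of "ip -o addr" output is the address with its prefix length.
        addrs=$(ip -o addr show dev "$ifname" scope global | awk '{print $4}')
        if [ -z "$addrs" ]; then
            echo "$ifname (IPoIB): no global IP address configured"
            continue
        fi
        for a in $addrs; do
            echo "$ifname (IPoIB): $a"
            # Suggested client command, with the prefix length stripped.
            echo "  fi_pingpong -p \"verbs;ofi_rxm\" -e rdm ${a%/*}"
        done
    done
    if [ "$found" -eq 0 ]; then
        echo "No IPoIB interfaces found; fi_pingpong over verbs needs an IPv4 or IPv6 address"
    fi
}

check_ipoib
```

If it reports an IPoIB interface without an address, the address must be assigned before fi_pingpong can reach the peer.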