ofiwg / libfabric

Open Fabric Interfaces

Home Page: http://libfabric.org/


Libfabric over NAT not working

ziegenbalg opened this issue · comments

This is a continuation of a bug I filed for the DAOS filesystem. It appears that message passing does not work over NAT. Is this a known issue?

To Reproduce
Try to pass a message where both endpoints are behind a NAT.

Expected behavior
The connection works, with each endpoint addressed by its NAT'ed address rather than the IP from behind the NAT.

Output
See the Wireshark output. Notice how the server eventually tries to reach out to the client's IP address from behind the NAT (i.e. 172.31.44.136, not its NAT'ed address of 18.219.144.132).
[screenshot: Wireshark capture]

Environment:
Linux

Additional context
Libfabric across the internet without a VPN, i.e. not a local-network setup.

Somewhere in the process of a client connecting to the fabric to send/receive data with a server, the server records the client's IP address. The client's address must be carried somewhere in the connection context/payload, since the IP packet headers themselves carry the correct NAT'ed address; I don't know how else the server would learn the client's internal IP address.
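
To illustrate the distinction (a hypothetical server-side check in plain sockets, not actual libfabric code): the kernel reports the NAT-rewritten peer address on an accepted connection, while any address the client serializes into its payload crosses the NAT unmodified.

    /* Hypothetical check, not libfabric code: the accepted socket
     * reports the NAT'ed peer address, but an address the client
     * copied into its payload bytes crosses the NAT unmodified. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>

    static void print_wire_peer(int connfd)
    {
        struct sockaddr_in peer;
        socklen_t len = sizeof(peer);
        char buf[INET_ADDRSTRLEN];

        /* Address from the IP header: NAT'ed, e.g. 18.219.144.132 */
        if (getpeername(connfd, (struct sockaddr *) &peer, &len) == 0)
            printf("wire peer: %s\n",
                   inet_ntop(AF_INET, &peer.sin_addr, buf, sizeof(buf)));

        /* By contrast, a bind address the client serialized into the
         * connection payload (e.g. 172.31.44.136) is opaque data to
         * the NAT and arrives unchanged. */
    }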

Any info on where this may happen would be greatly appreciated.

FYI:
I'm using the TCP provider in /prov/tcp/.

This is makeshift output from my own debugging statements. I'm unsure how to enable the debug flag for this library, since it's invoked by the DAOS instance.

Starting ofi_fabric_init
Starting tcpx_create_fabric
Starting ofi_fabric_init
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting tcpx_cq_open
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting tcpx_cq_open
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting tcpx_cq_open
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting tcpx_cq_open
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting tcpx_cq_open
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting tcpx_cq_open
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
starting tcpx_cq_open
starting ofi_cq_init
starting fi_cq_init
inside ofi_cq_init
Starting process_cm_ctx:
	-> TCPX_CM_LISTENING
Starting tcpx_accept
Starting process_cm_ctx:
	-> TCPX_CM_CONNECTING
Starting tcpx_cm_send_req
Starting tx_cm_data
Starting process_cm_ctx:
	-> TCPX_CM_WAIT_REQ
Starting tcpx_cm_recv_req
Starting rx_cm_data
Starting process_cm_ctx:
	-> TCPX_CM_RESP_READY
Starting tcpx_cm_send_resp
Starting tx_cm_data
Starting process_cm_ctx:
	-> TCPX_CM_REQ_SENT
Starting tcpx_cm_recv_resp
Starting rx_cm_data
Starting tcpx_sendv: 0 0.0.0.0
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting tcpx_tsend
Starting tcpx_alloc_tsend
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting tcpx_sendv: 0 0.0.0.0
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting tcpx_tsend
Starting tcpx_alloc_tsend
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting tcpx_sendv: 0 0.0.0.0
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting tcpx_tsend
Starting tcpx_alloc_tsend
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting tcpx_sendv: 0 0.0.0.0
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting tcpx_tsend
Starting tcpx_alloc_tsend
Starting tcpx_process_tx
Starting tcpx_send_msg
Starting process_cm_ctx:
	-> TCPX_CM_LISTENING
Starting tcpx_accept
Starting process_cm_ctx:
	-> TCPX_CM_WAIT_REQ
Starting tcpx_cm_recv_req
Starting rx_cm_data
Starting process_cm_ctx:
	-> TCPX_CM_RESP_READY
Starting tcpx_cm_send_resp
Starting tx_cm_data
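
As an aside, libfabric's own logging can usually be enabled through environment variables instead of hand-inserted printfs; a sketch, assuming a standard build (the logging variables are documented in fabric(7)):

    FI_LOG_LEVEL=debug FI_LOG_PROV=tcp <however the DAOS instance is launched>

The variables only need to be present in the environment of the process that loads libfabric.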

Here are some fi_info outputs:

One from the server:

    src_addr: fi_sockaddr_in://10.35.0.110:0
    src_addr: fi_sockaddr_in6://[fe80::b62e:99ff:fef8:9022]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0
    src_addr: fi_sockaddr_in://10.35.0.110:0
    src_addr: fi_sockaddr_in6://[fe80::b62e:99ff:fef8:9022]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0
    src_addr: fi_sockaddr_in://10.35.0.110:0
    src_addr: fi_sockaddr_in6://[fe80::b62e:99ff:fef8:9022]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0
    src_addr: fi_sockaddr_in://10.35.0.110:0
    src_addr: fi_sockaddr_in6://[fe80::b62e:99ff:fef8:9022]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0
    src_addr: fi_sockaddr_in://10.35.0.110:0
    src_addr: fi_sockaddr_in6://[fe80::b62e:99ff:fef8:9022]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0

and from the client:

-bash-5.2# fi_info -p tcp -a FI_SOCKADDR_IN -vvv | grep sock
    src_addr: fi_sockaddr_in://172.31.44.136:0
    src_addr: fi_sockaddr_in6://[fe80::854:6fff:fe72:a7e1]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0
    src_addr: fi_sockaddr_in://172.31.44.136:0
    src_addr: fi_sockaddr_in6://[fe80::854:6fff:fe72:a7e1]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0
    src_addr: fi_sockaddr_in://172.31.44.136:0
    src_addr: fi_sockaddr_in6://[fe80::854:6fff:fe72:a7e1]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0
    src_addr: fi_sockaddr_in://172.31.44.136:0
    src_addr: fi_sockaddr_in6://[fe80::854:6fff:fe72:a7e1]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0
    src_addr: fi_sockaddr_in://172.31.44.136:0
    src_addr: fi_sockaddr_in6://[fe80::854:6fff:fe72:a7e1]:0
    src_addr: fi_sockaddr_in://127.0.0.1:0
    src_addr: fi_sockaddr_in6://[::1]:0
-bash-5.2# 

One can clearly see why the fabric is trying to send to 172.31.44.136. My question is: is there a way to change the 172.31.44.136 address on the fly, ideally on the server side, to the public IP address?
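
For reference, fi_getinfo() hints accept an explicit src_addr, which in principle controls the address the provider binds to and advertises. A sketch of setting it (hedged: whether the tcp provider carries this address into its CM exchange is exactly the open question here, and on EC2 the public IP is not configured on any local interface, so the lookup may simply find no match):

    /* Sketch: request a specific source address via fi_getinfo() hints. */
    #include <rdma/fabric.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdlib.h>
    #include <string.h>

    static struct fi_info *get_info_with_src(const char *ip)
    {
        struct fi_info *hints = fi_allocinfo(), *info = NULL;
        struct sockaddr_in *sin = calloc(1, sizeof(*sin));

        sin->sin_family = AF_INET;
        inet_pton(AF_INET, ip, &sin->sin_addr);

        hints->addr_format = FI_SOCKADDR_IN;
        hints->src_addr = sin;              /* freed by fi_freeinfo() */
        hints->src_addrlen = sizeof(*sin);
        hints->fabric_attr->prov_name = strdup("tcp");

        if (fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info))
            info = NULL;
        fi_freeinfo(hints);
        return info;
    }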

NOTE: There will only be one client per public NAT'ed address.

It seems you are only capturing one side of the traffic; it would be more helpful to have both sides. Anyhow, it looks like the two sides were not in sync, and 18.219.144.132 closed the connection while 192.168.50.10 continued to send packets.

  • Check your setup for compatible software versions.
  • Make sure you are using the same provider on the DAOS clients and servers.
  • Check your NAT setup; make sure the external connection can support all the connections behind the NAT. The TCP retransmissions are a bit concerning without knowing your configuration (a capture sketch follows this list).
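
To capture both sides, something along these lines on each host would help (ports taken from this thread; the filter is only a sketch, adjust to your setup):

    tcpdump -n -i any -w side.pcap 'tcp port 9998 or tcp port 31416'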

Packet Capture:
packets.zip

@chien-intel, thank you for your reply. Here is some more information about the topology.

(DAOS Client)                                             (NAT: open ports 9998, 31416)
AWS EC2 Instance             AWS Public IP            Home Router (DMZ)    Vyatta Router       DAOS Server (libfabric)
      |                            |                          |                  |                     |
      v                            v                          v                  v                     v
172.31.44.136 <------------> 18.219.144.132 <----------> 73.93.84.167 <--> 192.168.50.10 <------> 10.35.0.110

The default port for DAOS is technically 10001, but it was changed to 9998. The libfabric port is 31416. Upon successful registration with the fabric, the DAOS server seems to try to negotiate a higher-range port, and that is the part that fails: in server.pcap it can be seen trying to reach out to the 172.31.44.136 address. This is the issue I'm trying to solve.

From the libfabric documentation:

A fabric represents a collection of hardware and software resources that access a single physical or virtual network. For example, a fabric may be a single network subnet or cluster.

It may be that libfabric, by design, won't work across a NAT? However, I'm only trying to achieve a single use case where there will be one client/server behind each NAT, so no libfabric endpoint vector/port translation needs to be done in the NAT.

Long story short: where 10.35.0.110 tries to send a TCP SYN to 172.31.44.136, I would like that IP to be 18.219.144.132.
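
If nothing inside libfabric pans out, a host-level workaround sketch on the server (outside libfabric entirely, using the addresses from this thread, and assuming the nat table's OUTPUT chain is available on that host) would be to rewrite the destination of locally generated packets:

    iptables -t nat -A OUTPUT -p tcp -d 172.31.44.136 -j DNAT --to-destination 18.219.144.132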

I noticed that there are various libfabric config options for adjusting the port. Maybe there is something that I can change here?
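
Among those options, the tcp provider documents a port range for the ports it listens on; a sketch, assuming a build that honors these variables (see fi_tcp(7)):

    FI_TCP_PORT_LOW_RANGE=31416 FI_TCP_PORT_HIGH_RANGE=31436 <however the DAOS server is launched>

Pinning the range would at least make the negotiated higher-range port predictable enough to forward through the NAT.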

When I print the address from the socket in tcpx_cm_recv_req, I get the correct 18.219.144.132 address. Somewhere along the way to tcpx_cm_send_resp, the IP address gets switched to 172.31.44.136.

Starting process_cm_ctx:
	-> TCPX_CM_LISTENING
Starting tcpx_accept
Starting process_cm_ctx:
	-> TCPX_CM_WAIT_REQ
Starting tcpx_cm_recv_req
Starting rx_cm_data
+++++++++++++++++++++++
IPv4 address: 18.219.144.132
+++++++++++++++++++++++
Starting process_cm_ctx:
	-> TCPX_CM_RESP_READY
Starting tcpx_cm_send_resp
Unknown address family: 8265
Starting tx_cm_data
<Hangs here as it tries to SYN connect to 172.31.44.136>

Would that solve your problem? Do you have any DAOS code running on 18.219.144.132, or any network config that would route traffic from the internet to 172.31.44.136, where the DAOS client is running? You have opened up ports on the DAOS server side; you need to do the same on the DAOS client side. In server.pcap, frames 19-23, 10.35.0.110 tried to initiate a connection to 172.31.44.136 (this is the DAOS server connecting directly to the DAOS client) and it couldn't get through.
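
On the client side, since the EC2 public IP is a 1:1 NAT, opening ports there means allowing inbound traffic in the instance's security group. A sketch with the AWS CLI (the group ID is a placeholder; the CIDR is the server side's public address from this thread):

    aws ec2 authorize-security-group-ingress --group-id sg-XXXXXXXX --protocol tcp --port 31416 --cidr 73.93.84.167/32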