Establishing a session with nodes behind NAT/Firewall

Question

Establishing a session with nodes behind NAT/Firewall

acolytec3 opened this issue 2 years ago · comments

In my work on Ultralight, I've observed that our nodes running behind firewall or NAT (e.g. either on an Android device or else running on my local laptop) often times are not able to establish connections with our public Ultralight bootnodes with sessions getting dropped during the handshake process. As best I can tell, the root cause is often that the node behind the firewall is reporting a local IP address in its ENR so the other node is dropping the session during the handshake process at this point due to this verifyEnr call.

I've interacted with @fjl on this some and it seems that this may be against the discv5 spec so wanted to understand if I'm misconfiguring something in Ultralight when instantiating the discv5 client or else what the thought process behind this is as it is something we're having to hack around in order to allow Ultralight nodes to join a public network. At a minimum, would you be open to a PR that allows for something like a session with an "untrusted" status? This would allow Ultralight nodes who don't know their public UDP endpoint address to connect to bootnodes, exchange discv5 PING/PONG, and then be able to addrVotes to update their ENR to the correct public UDP endpoint reported by the bootnode.

Felix Lange · Answer 1 · Sat Apr 30 2022 03:05:11 GMT+0800 (China Standard Time)

I think the verifyEnr check should be removed.

acolytec3 · Answer 2 · Wed May 04 2022 21:58:13 GMT+0800 (China Standard Time)

As a follow on, my fork now removes this check and things seem to work okay. It's not perfect yet and I've not dealt with the probably separate issue of whether or not an ENR with an inconsistent UDP endpoint should be added to the routing table or not (seems like it shouldn't if we follow what other implementations have done -- i.e. establish sessions only and then only respond to inbound requests on the session and not send outbound requests). Then, if/when the remote node updates its ENR to reflect the observed UDP endpoint, a new session can be established and the node added to the routing table.

Lion - dapplion · Answer 3 · Thu May 05 2022 00:29:51 GMT+0800 (China Standard Time)

@acolytec3 @fjl would you mind checking what other discv5 implementations do regarding this ENR verification (i.e. Go, Rust, Nim)?

acolytec3 · Answer 4 · Thu May 05 2022 00:38:31 GMT+0800 (China Standard Time)

I've been interacting with the Nim implementers on it and my understanding is that nim-discv5 approaches as I outlined above (will establish sessions with nodes with inconsistent ENRs but not add to the routing table and then will only respond to inbound requests from that node but not initiate outbound requests) though I believe Rust is a little stricter. I'll check with the Trin team and find out.

@fjl can you comment on the Go implementation?

Felix Lange · Answer 5 · Thu May 05 2022 03:07:15 GMT+0800 (China Standard Time)

Yes, the go-ethereum implementation of discv5 does permit nodes with incorrect IP/port listed in ENR to establish a session. In go-ethereum such nodes may even be added to the routing table. However, since table entries must be reachable at the address listed in ENR for liveness verification, the nodes will be removed soon after.

acolytec3 · Answer 6 · Fri May 06 2022 02:56:37 GMT+0800 (China Standard Time)

It looks like rust-discv5 will only establish sessions with nodes that have either an ENR that matches the observed IP or an ENR with no location keys set, so it's somewhere in the middle of what I proposed. I've made a couple of additional changes in my fork and verified that it works well in local testing for ultralight nodes. The basic approach I'm using (which follows nim-discv5 is:
When an inbound session is being negotiated, if the UDP endpoint in the ENR provided by the remote node doesn't match the observed IP, a session will be established but the established event is not emitted which ensures that this node does not get added to the discv5 routing table. As such, the client still can respond to inbound requests but does not propagate an ENR with invalid location details to the network and if/when the remote node updates its external IP, a new session can be negotiated based on the updated contact info.

I think it might be worth further discussion about what is allowed in the sendRPCRequest field (currently either ENR or multiaddr). Within my portalnetwork code, in order to properly respond to an inbound request from a node with no entry in the routing table, I'm having to create a bogus ENR and then override the private _nodeId field in order to pass a valid input to the sendTalkResp method (which passes on to the sendRPCRequest). In the circumstances where we're responding to a node with either no ENR location fields or something invalid, we only have the nodeId propagated from the TalkReq message header and don't have the pubkey so I can't construct a multiaddr with the correct node ID encoding. It might be nice to add a third option that corresponds to the session key (observed IP address + nodeId) for this scenario, (provided you want to move forward with a PR along these lines).

Lion - dapplion · Answer 7 · Fri May 06 2022 18:36:33 GMT+0800 (China Standard Time)

@acolytec3 I'm fine with an approach that allows Portal Network usecases as long as it's safe enough against DOS attacks, i.e. the happy medium. If you could upstream your fork's code that would be great! We have limited bandwidth right now

acolytec3 · Answer 8 · Fri May 06 2022 20:39:04 GMT+0800 (China Standard Time)

@acolytec3 I'm fine with an approach that allows Portal Network usecases as long as it's safe enough against DOS attacks, i.e. the happy medium. If you could upstream your fork's code that would be great! We have limited bandwidth right now

Okay! I'm going to make a few more tweaks and then we'll test some with Ultralight nodes on the fork to see if we uncover any issues before I submit a PR.

acolytec3 · Answer 9 · Wed Jun 01 2022 03:54:34 GMT+0800 (China Standard Time)

I think we can safely close this issue at this point with #180 and #182 merged. Thanks for the support on this!

Cayman · Answer 10 · Wed Jun 01 2022 11:04:46 GMT+0800 (China Standard Time)

No probem, thanks for the PRs ❤️