ChainSafe / discv5

A Typescript implementation of the Discv5 protocol

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Establishing a session with nodes behind NAT/Firewall

acolytec3 opened this issue · comments

In my work on Ultralight, I've observed that our nodes running behind firewall or NAT (e.g. either on an Android device or else running on my local laptop) often times are not able to establish connections with our public Ultralight bootnodes with sessions getting dropped during the handshake process. As best I can tell, the root cause is often that the node behind the firewall is reporting a local IP address in its ENR so the other node is dropping the session during the handshake process at this point due to this verifyEnr call.

I've interacted with @fjl on this some and it seems that this may be against the discv5 spec so wanted to understand if I'm misconfiguring something in Ultralight when instantiating the discv5 client or else what the thought process behind this is as it is something we're having to hack around in order to allow Ultralight nodes to join a public network. At a minimum, would you be open to a PR that allows for something like a session with an "untrusted" status? This would allow Ultralight nodes who don't know their public UDP endpoint address to connect to bootnodes, exchange discv5 PING/PONG, and then be able to addrVotes to update their ENR to the correct public UDP endpoint reported by the bootnode.

I think the verifyEnr check should be removed.

As a follow on, my fork now removes this check and things seem to work okay. It's not perfect yet and I've not dealt with the probably separate issue of whether or not an ENR with an inconsistent UDP endpoint should be added to the routing table or not (seems like it shouldn't if we follow what other implementations have done -- i.e. establish sessions only and then only respond to inbound requests on the session and not send outbound requests). Then, if/when the remote node updates its ENR to reflect the observed UDP endpoint, a new session can be established and the node added to the routing table.

@acolytec3 @fjl would you mind checking what other discv5 implementations do regarding this ENR verification (i.e. Go, Rust, Nim)?

I've been interacting with the Nim implementers on it and my understanding is that nim-discv5 approaches as I outlined above (will establish sessions with nodes with inconsistent ENRs but not add to the routing table and then will only respond to inbound requests from that node but not initiate outbound requests) though I believe Rust is a little stricter. I'll check with the Trin team and find out.

@fjl can you comment on the Go implementation?

Yes, the go-ethereum implementation of discv5 does permit nodes with incorrect IP/port listed in ENR to establish a session. In go-ethereum such nodes may even be added to the routing table. However, since table entries must be reachable at the address listed in ENR for liveness verification, the nodes will be removed soon after.

It looks like rust-discv5 will only establish sessions with nodes that have either an ENR that matches the observed IP or an ENR with no location keys set, so it's somewhere in the middle of what I proposed. I've made a couple of additional changes in my fork and verified that it works well in local testing for ultralight nodes. The basic approach I'm using (which follows nim-discv5 is:
When an inbound session is being negotiated, if the UDP endpoint in the ENR provided by the remote node doesn't match the observed IP, a session will be established but the established event is not emitted which ensures that this node does not get added to the discv5 routing table. As such, the client still can respond to inbound requests but does not propagate an ENR with invalid location details to the network and if/when the remote node updates its external IP, a new session can be negotiated based on the updated contact info.

I think it might be worth further discussion about what is allowed in the sendRPCRequest field (currently either ENR or multiaddr). Within my portalnetwork code, in order to properly respond to an inbound request from a node with no entry in the routing table, I'm having to create a bogus ENR and then override the private _nodeId field in order to pass a valid input to the sendTalkResp method (which passes on to the sendRPCRequest). In the circumstances where we're responding to a node with either no ENR location fields or something invalid, we only have the nodeId propagated from the TalkReq message header and don't have the pubkey so I can't construct a multiaddr with the correct node ID encoding. It might be nice to add a third option that corresponds to the session key (observed IP address + nodeId) for this scenario, (provided you want to move forward with a PR along these lines).

@acolytec3 I'm fine with an approach that allows Portal Network usecases as long as it's safe enough against DOS attacks, i.e. the happy medium. If you could upstream your fork's code that would be great! We have limited bandwidth right now

@acolytec3 I'm fine with an approach that allows Portal Network usecases as long as it's safe enough against DOS attacks, i.e. the happy medium. If you could upstream your fork's code that would be great! We have limited bandwidth right now

Okay! I'm going to make a few more tweaks and then we'll test some with Ultralight nodes on the fork to see if we uncover any issues before I submit a PR.

I think we can safely close this issue at this point with #180 and #182 merged. Thanks for the support on this!

No probem, thanks for the PRs ❤️