Propagate large sample batches with uTP
nulltea opened this issue · comments
Summary
- Currently, samples sent over the Discv5 transport protocol are split into sub-batches of 28 keys because of the UDP packet size limit of 1280 bytes (see Discv5 spec). There is no waiting mechanism to prevent work from starting before all sub-batches arrive
- The aim of this issue is to overcome this by introducing uTP
- See https://github.com/ethereum/portal-network-specs/blob/master/discv5-utp.md
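The sub-batching described above can be sketched roughly as follows. This is only an illustration, not the project's code: `Key`, `MAX_KEYS_PER_PACKET`, `split_into_sub_batches`, and `all_sub_batches_arrived` are hypothetical names, and the 32-byte key size is an assumption made for the example.

```rust
/// Hypothetical key type; the 32-byte size is an assumption for illustration.
type Key = [u8; 32];

/// Sub-batch size quoted in this issue (28 keys per Discv5 message).
const MAX_KEYS_PER_PACKET: usize = 28;

/// Splits a batch of keys into sub-batches that each fit into one message.
fn split_into_sub_batches(keys: &[Key]) -> Vec<Vec<Key>> {
    keys.chunks(MAX_KEYS_PER_PACKET)
        .map(|chunk| chunk.to_vec())
        .collect()
}

/// Returns true only once every expected sub-batch has arrived.
/// This is the kind of waiting step that is currently missing.
fn all_sub_batches_arrived(received: usize, total_keys: usize) -> bool {
    // Ceiling division: number of sub-batches needed for `total_keys`.
    let expected = (total_keys + MAX_KEYS_PER_PACKET - 1) / MAX_KEYS_PER_PACKET;
    received >= expected
}
```

For the 256 keys used in the benchmark, this splitting yields 10 sub-batches: nine with 28 keys and one with 4.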
Issues with uTP:
- The Rust implementation of Discv5 can only handle 1 talk_req per remote node at a time, which prevents using uTP as part of the request communication because of the handshake uTP performs. We can work around this by finishing the request immediately and then seeing the acknowledgment once the work is finished
- Multiple packet recv polls run at the same time: one in handle Data, the other in overlay_service
- There is a bug with using `wrapping_sub` for sequence number compression here that breaks duplicate handling; it should be `saturating_sub` instead
- Syn requests time out too quickly, resulting in a resend that deadlocks the data packet receiving flow for other connections
  - However, timeouts are not the root problem; the lack of concurrency in request handling is. This problem will only grow with an increase in the number of validators, samples, and redundancy
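To illustrate the `wrapping_sub` versus `saturating_sub` point, here is a minimal sketch. It assumes 16-bit sequence numbers (as in uTP) and a duplicate check based on how far an incoming packet is ahead of the last acknowledged one. The helper names are hypothetical and the affected code is not reproduced here.

```rust
/// Distance of an incoming packet from the last acknowledged sequence
/// number, using wrapping arithmetic (hypothetical helper).
fn distance_wrapping(seq_nr: u16, ack_nr: u16) -> u16 {
    seq_nr.wrapping_sub(ack_nr)
}

/// The same distance using saturating arithmetic: anything at or behind
/// `ack_nr` clamps to 0 instead of wrapping around (hypothetical helper).
fn distance_saturating(seq_nr: u16, ack_nr: u16) -> u16 {
    seq_nr.saturating_sub(ack_nr)
}

/// Assumed duplicate check: a packet that is not ahead of what was
/// already acknowledged counts as a duplicate.
fn is_duplicate(distance: u16) -> bool {
    distance == 0
}
```

With `ack_nr = 10` and a retransmitted packet with `seq_nr = 9`, the wrapping version returns 65535, so the packet looks like one far ahead of the window, while the saturating version returns 0 and the packet is recognized as a duplicate.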
Our use of uTP for sending large amounts of keys:
sequenceDiagram
Originator->>Forwarder: ConnectionId
Forwarder->>Originator: Promise
Note right of Originator: Start listening for specific uTP connection from Originator
Forwarder->>Originator: uTP ST_SYN
Originator->>Forwarder: uTP ST_STATE
Originator->>Forwarder: uTP ST_DATA
Originator->>Forwarder: ...
Forwarder->>Originator: ...
Note left of Forwarder: Once DATA sent & acknowledged
Originator->>Forwarder: uTP ST_FIN
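The exchange in the diagram can also be written down as data, which may help when testing the ordering of messages. This is a sketch only: `Peer`, `Message`, and `exchange` are hypothetical names, the numeric packet type values are the ones defined in BEP 29, and the elided `...` steps of the diagram are left out.

```rust
/// uTP packet types with their numeric values from BEP 29.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum UtpType {
    Data = 0,  // ST_DATA
    Fin = 1,   // ST_FIN
    State = 2, // ST_STATE
    Reset = 3, // ST_RESET
    Syn = 4,   // ST_SYN
}

/// The two roles from the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Peer {
    Originator,
    Forwarder,
}

/// Messages from the diagram: two request/response messages, then uTP packets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Message {
    ConnectionId,
    Promise,
    Utp(UtpType),
}

/// Sender and message for each step of the diagram, in order.
/// The intermediate data/ack steps shown as "..." are omitted.
fn exchange() -> Vec<(Peer, Message)> {
    vec![
        (Peer::Originator, Message::ConnectionId),
        (Peer::Forwarder, Message::Promise),
        (Peer::Forwarder, Message::Utp(UtpType::Syn)),
        (Peer::Originator, Message::Utp(UtpType::State)),
        (Peer::Originator, Message::Utp(UtpType::Data)),
        (Peer::Originator, Message::Utp(UtpType::Fin)),
    ]
}
```

Note that the Forwarder, not the Originator, opens the uTP connection with ST_SYN, even though the Originator is the side that sends the data.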
Associated commit: nulltea/discv5-overlay@230208e
Associated PR: ethereum/trin#481
Benchmarks (256 keys, 800 nodes, no redundancy):
- with Discv5 `TALKREQ`: time.busy:895µs time.idle:2.61ms
- communication overhead = 544 messages
- storage overhead = 256 keys