Random sampling

Question


nulltea opened this issue 2 years ago

timofey commented 2 years ago

Setup

Starts from where #7 finishes (i.e., samples have already been dispersed bucket-wise with certain redundancy settings).
Use $T$ existing or new peers, each of which randomly chooses $k$ keys from the samples' ID space.
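The setup step can be sketched as a small simulation helper. The sketch makes several assumptions:

- Peers and sample keys are plain integer IDs.
- "Randomly choose" means uniform sampling without replacement.
- Only existing peers are drawn. New peers would simply be extra IDs in `peer_ids`.

`plan_sampling` is a hypothetical name and is not part of the simulator. The example uses the node and key counts from the benchmark further down.

```python
import random

def plan_sampling(peer_ids, sample_keys, t, k, seed=None):
    """Pick t sampling peers; each draws k distinct sample keys uniformly at random."""
    rng = random.Random(seed)                  # seedable for reproducible simulation runs
    samplers = rng.sample(list(peer_ids), t)   # T distinct peers
    keys = list(sample_keys)
    return {peer: rng.sample(keys, k) for peer in samplers}

# Example: 800 peers, 256 sample keys, T = N/2, k = 75
plan = plan_sampling(range(800), range(256), t=400, k=75, seed=7)
```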

Settings

$T$ (default = $N/2$, where $N$ is the number of peers in the network)
$k$ (default = 75)
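The two settings and their defaults can be captured in a small config object. `SamplingSettings` and its field names are hypothetical. Rounding $N/2$ down for odd $N$ is an assumption, since the issue does not specify it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SamplingSettings:
    num_peers: int                 # N, the number of peers in the network
    t: Optional[int] = None        # T, the number of sampling peers
    k: int = 75                    # keys chosen by each sampling peer

    def __post_init__(self):
        if self.t is None:
            self.t = self.num_peers // 2      # default T = N/2 (rounded down here)
        if not 0 < self.t <= self.num_peers:
            raise ValueError("T must be between 1 and N")
```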

Measurements

Number of successful sample queries (must be more than 75% of the ID space)
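One reading of this criterion is coverage: the distinct keys retrieved at least once, as a share of the whole sample ID space. The sketch below computes that share and applies the strict "more than 75%" test. An alternative reading, the fraction of individual queries that succeed, would divide by the number of queries instead. The function names are hypothetical.

```python
def coverage(results, id_space_size):
    """Fraction of the sample ID space retrieved at least once.

    results: iterable of (key, succeeded) pairs collected from all sampling peers.
    """
    retrieved = {key for key, ok in results if ok}
    return len(retrieved) / id_space_size

def meets_target(results, id_space_size, threshold=0.75):
    # "must be more than 75%": strict inequality
    return coverage(results, id_space_size) > threshold

runs = [(0, True), (1, True), (1, False), (2, True), (3, False)]
print(coverage(runs, 4), meets_target(runs, 4))   # 0.75 False
```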

Notes

Here we can introduce KZG and Reed-Solomon math (as libraries), but this shouldn't take too much time, i.e., it isn't the main priority, only a way to get more practical insights. Generally, we should be okay with simple mock values (as long as the number of successful sample queries is more than 75% of the ID space).
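A minimal way to produce such mock values is to derive placeholder bytes deterministically from each key with SHA-256 (`hashlib`). A sampler can then check that a returned value belongs to the requested key. This swaps in plain hashing for KZG commitments and Reed-Solomon coding, so it offers none of their guarantees. The 512-byte size and the function names are hypothetical.

```python
import hashlib

SAMPLE_SIZE = 512  # hypothetical sample size in bytes

def mock_sample(key: int, size: int = SAMPLE_SIZE) -> bytes:
    """Deterministic stand-in for a sample's data: SHA-256 in counter mode over the key."""
    seed = key.to_bytes(32, "big")     # keys assumed to fit in a 256-bit ID space
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:size])

def verify_mock(key: int, value: bytes) -> bool:
    """Accept a returned value only if it equals the full mock sample for the key."""
    return value == mock_sample(key)
```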

timofey · Answer 1 · Wed Nov 09 2022 17:38:04 GMT+0800 (China Standard Time)

Problem: discv5's send-to-socket event loop is sequential (i.e., the send_to_socket call is blocking), which causes requests to pile up and, subsequently, time out.
Solution: add a --parallelism option that limits the number of simultaneous requests; requests above the limit wait in a queue.
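The effect of a parallelism cap can be illustrated with `asyncio.Semaphore`. Only `parallelism` queries are in flight at any time, and the rest wait their turn instead of all hitting the socket at once. This is a Python sketch of the idea only, not how the --parallelism flag is implemented in the simulator. `query_all` and `query_fn` are hypothetical names, and the default of 16 is an arbitrary placeholder.

```python
import asyncio

async def query_all(keys, query_fn, parallelism=16):
    """Issue query_fn(key) for every key, keeping at most `parallelism` requests in flight."""
    limiter = asyncio.Semaphore(parallelism)

    async def bounded(key):
        async with limiter:            # waits while `parallelism` requests are outstanding
            return await query_fn(key)

    # gather returns results in the same order as the input keys
    return await asyncio.gather(*(bounded(key) for key in keys))
```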

timofey · Answer 2 · Wed Nov 09 2022 17:46:09 GMT+0800 (China Standard Time)

Benchmarks (256 keys, 800 nodes, validators=1, samples=75):

- routing=bucket-wise, redundancy=F1/R1: time.busy 10.9 ms, time.idle 32.9 s, communication overhead 1644 messages
- routing=bucket-wise, redundancy=F1/RS(1): time.busy 11.6 ms, time.idle 25.3 s, communication overhead 1290 messages
- routing=distance-wise, redundancy=F1/R1: time.busy 11.8 ms, time.idle 31.3 s, communication overhead 1706 messages
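To compare the runs, the relative changes against the bucket-wise F1/R1 baseline can be computed directly from the reported figures. The numbers are copied from the benchmark; the helper names are hypothetical.

```python
# Figures copied from the benchmark (256 keys, 800 nodes, validators=1, samples=75).
RUNS = {
    "bucket-wise, F1/R1":    {"idle_s": 32.9, "messages": 1644},
    "bucket-wise, F1/RS(1)": {"idle_s": 25.3, "messages": 1290},
    "distance-wise, F1/R1":  {"idle_s": 31.3, "messages": 1706},
}
BASELINE = "bucket-wise, F1/R1"

def relative_change(metric):
    """Change of `metric` for each run relative to the baseline, as a fraction."""
    base = RUNS[BASELINE][metric]
    return {name: (run[metric] - base) / base for name, run in RUNS.items()}

for metric in ("messages", "idle_s"):
    for name, delta in relative_change(metric).items():
        print(f"{metric:8} {name:22} {delta:+.1%}")
```

By this arithmetic, RS(1) sends about 21.5% fewer messages than R1 and has about 23% less idle time. Distance-wise routing sends about 3.8% more messages than bucket-wise routing.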

timofey · Answer 3 · Wed Nov 09 2022 18:41:58 GMT+0800 (China Standard Time)

Problem: Discv5 has no support for the FIND_VALUE RPC. It's possible to iterate over FIND_NODE results and send a TALK_REQ to each ENR found, but this doubles the number of requests and isn't parallelizable.
Solution: modify Discv5 to support FIND_VALUE such that, on every RPC query, a remote node answers either with a VALUE response if it has a value for the requested key, or with a NODES response if it doesn't. See the commit that implements this.
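A toy in-memory simulation shows why a combined FIND_VALUE saves requests: each hop costs a single request, which returns either the value or closer nodes. The FIND_NODE + TALK_REQ approach would need a second request per contacted node. The simulation rests on these assumptions:

- Node IDs are integers.
- Distance is Kademlia-style XOR distance.
- Each node has a flat list of known peers.

`SimNode`, `find_value` and `lookup` are hypothetical names. This is neither the discv5 wire protocol nor the referenced commit.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class SimNode:
    """Toy node: a local key/value store plus a flat list of known peers (routing-table stand-in)."""
    node_id: int
    store: dict = field(default_factory=dict)
    peers: list = field(default_factory=list, repr=False)

    def find_value(self, key, count=3):
        # Answer VALUE if the key is held locally, otherwise NODES with the closest known peers.
        if key in self.store:
            return "VALUE", self.store[key]
        closest = sorted(self.peers, key=lambda n: n.node_id ^ key)[:count]
        return "NODES", closest

def lookup(start, key, alpha=3):
    """Iterative XOR-distance lookup; returns (value or None, number of requests sent)."""
    known = {start.node_id: start}
    queried = set()
    requests = 0
    while True:
        pending = sorted((n for nid, n in known.items() if nid not in queried),
                         key=lambda n: n.node_id ^ key)[:alpha]
        if not pending:
            return None, requests          # every reachable candidate was asked
        for node in pending:               # a real client would send these alpha requests concurrently
            queried.add(node.node_id)
            requests += 1
            kind, payload = node.find_value(key)
            if kind == "VALUE":
                return payload, requests
            for peer in payload:
                known.setdefault(peer.node_id, peer)

nodes = [SimNode(i) for i in range(16)]
for n in nodes:
    n.peers = [m for m in nodes if m is not n]   # fully connected toy network
nodes[13].store[13] = b"sample-13"
print(lookup(nodes[0], 13))   # (b'sample-13', 2)
```

For brevity, the loop stops only when the value is found or no unqueried candidates remain. It does not use Kademlia's usual "no closer nodes found" termination.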