Dissemination based on secondary routing table
nulltea opened this issue
Summary
- Implement sample dissemination using a secondary routing table (SRT)
- Preferably this should be a part of the custom discv5 overlay (see #14)
- It isn't yet clear which optimizations the SRT makes possible, but intuitively it could save requests by caching the locations of DAS nodes and sample keys based on XOR distance
- The data structure could be a `KBucketsTable`, Proto's DAS-tree, or something else.
- Additional task 1: consider using uTP transport to send packets larger than plain UDP supports. It would be nice to compare that to the current wire protocols (discv5, libp2p).
- Additional task 2: consider using neighborhood gossip to disperse sample bundles in close proximity to the forwarder's node_id, i.e. instead of continuing to split and send samples, just flood the closest nodes with the whole bundle. It would be nice to see its impact on bucket-wise dissemination, but implementing it isn't strictly necessary.
- Additional task 3: check the consistency of the sample keys used in dissemination and sampling. Sampling with modified discv5 takes too long; maybe the problem is that keys aren't mapped uniformly, which makes lookups take longer. Note, lookups have so far always finished with full success, so there might not be any such problem.
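To make the XOR-distance intuition concrete, here is a minimal sketch of ranking nodes by XOR distance to a sample key — the metric an SRT would cache DAS node locations by. Ids are shortened to 4 bytes for illustration, and all names (`Id`, `xor_distance`, `k_closest`) are hypothetical, not the codebase API.

```rust
// Ids shortened to 4 bytes for illustration (real node ids are 32 bytes).
type Id = [u8; 4];

/// XOR distance between two ids, interpreted as a big-endian integer.
fn xor_distance(a: &Id, b: &Id) -> u32 {
    u32::from_be_bytes([a[0] ^ b[0], a[1] ^ b[1], a[2] ^ b[2], a[3] ^ b[3]])
}

/// Return the `k` node ids closest to `key` by XOR distance —
/// the nodes an SRT would map a sample key to.
fn k_closest(nodes: &[Id], key: &Id, k: usize) -> Vec<Id> {
    let mut sorted = nodes.to_vec();
    sorted.sort_by_key(|n| xor_distance(n, key));
    sorted.truncate(k);
    sorted
}
```

Note that XOR distance is symmetric and unidirectional (for a given key and distance there is exactly one id at that distance), which is what lets every node compute the same key-to-node mapping independently.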
Plan: @ec2 is doing dissemination based on Proto's DAS-tree, @timoth-y investigates ways to extend `KBucketsTable` with optimisations for dissemination.
A few thoughts about dissemination:
- our current dissemination strategies are recursive (the originator sends samples to the closest peers it knows; forwarders repeat the same thing until individual samples reach the closest known peers). Such strategies use the routing table only to get a list of nodes to send `talk_request`s with samples to, so there is no need for tight integration with the overlay protocol.
- an alternative strategy is iterative dissemination: the originator asks the closest peers it knows whether they know closer ones; if they don't, the originator sends them samples to store, otherwise it keeps asking the closer peers it learns about until it reaches the closest known ones. This strategy might require tighter integration with the overlay protocol, as it resembles `FIND_NODE` behaviour.
- another thing is that the originator (which is expected to always be the builder node) could have a much larger routing table, possibly containing close to all known nodes. This would allow it to calculate which node should store which sample and thus come up with the most efficient path. This modification can optimize both recursive and iterative dissemination strategies, and is independent of the overlay protocol.
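The per-hop step of the recursive strategy can be sketched as follows: each forwarder assigns every sample key in its bundle to whichever known peer is XOR-closest to that key, producing the partitions it would then `talk_request` onward. Ids are toy integers here, and `assign_bundle` is an illustrative name, not the actual implementation.

```rust
use std::collections::HashMap;

// Toy stand-ins for the real 256-bit identifiers.
type NodeId = u32;
type SampleKey = u32;

/// One hop of recursive dissemination: split the bundle among known peers,
/// sending each key to the peer XOR-closest to it.
fn assign_bundle(peers: &[NodeId], bundle: &[SampleKey]) -> HashMap<NodeId, Vec<SampleKey>> {
    let mut out: HashMap<NodeId, Vec<SampleKey>> = HashMap::new();
    for &key in bundle {
        // Pick the peer minimizing XOR distance to this key.
        let target = *peers
            .iter()
            .min_by_key(|&&p| p ^ key)
            .expect("forwarder must know at least one peer");
        out.entry(target).or_default().push(key);
    }
    out
}
```

Each forwarder repeats this on the partition it receives, so the bundle shrinks at every hop until single samples land on the closest known peers.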
Tree routing from protolambda/dv5das is based on binary prefix-tree search.
Here is a brief guide on how to implement it in our codebase:
- Add a new batching strategy to the enum: `BatchingStrategy::PrefixWise`, which splits keys by the first `P` bytes, see dv5das#sampling-1
- Ensure that `DasTree` is a prefix tree and init it in the `setTopology` method, similarly to how it's updated on the `Discv5Event::SessionEstablished` event
- Add a `search_by_prefix` method to `DasTree`, similar to this tries-prefix-trees#prefix-search
- On receiving the samples bundle, apply `search_by_prefix` for each key and use `k` results (node_ids) to propagate samples further; as a result we should get a `HashMap<NodeId, Vec<SampleKey>>`, similar to how it's done here
- Add a `PeerSorting` enum with values (`Random`, `ByScore`) and use it to sort `search_by_prefix` results before selecting `k` of them
- Add prefix-based lookup for sampling, see dv5das#retrieving-samples, TBD
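For the `search_by_prefix` step, here is a minimal sketch of a binary prefix tree over node ids (8-bit ids instead of 256-bit `NodeId`s) that returns every id sharing the first `prefix_len` bits with a key. The names mirror the guide above but are assumptions, not the actual codebase API.

```rust
const BITS: u32 = 8; // toy id width; the real tree would use 256-bit node ids

/// Binary prefix tree: each level branches on one bit of the id,
/// most-significant bit first; ids are stored at the leaves.
#[derive(Default)]
struct DasTree {
    children: [Option<Box<DasTree>>; 2],
    ids: Vec<u8>,
}

impl DasTree {
    fn insert(&mut self, id: u8) {
        let mut node = self;
        for i in (0..BITS).rev() {
            let bit = ((id >> i) & 1) as usize;
            node = node.children[bit].get_or_insert_with(Default::default);
        }
        node.ids.push(id);
    }

    /// Collect every id whose first `prefix_len` bits match `key`.
    fn search_by_prefix(&self, key: u8, prefix_len: u32) -> Vec<u8> {
        let mut node = self;
        // Descend along the key's high bits; an absent branch means no match.
        for i in (0..BITS).rev().take(prefix_len as usize) {
            let bit = ((key >> i) & 1) as usize;
            match &node.children[bit] {
                Some(child) => node = child,
                None => return Vec::new(),
            }
        }
        // Everything under this subtree shares the prefix.
        let mut out = Vec::new();
        node.collect(&mut out);
        out
    }

    fn collect(&self, out: &mut Vec<u8>) {
        out.extend(&self.ids);
        for child in self.children.iter().flatten() {
            child.collect(out);
        }
    }
}
```

On receiving a bundle, a forwarder would call `search_by_prefix` for each sample key and group the results into the `HashMap<NodeId, Vec<SampleKey>>` described above.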