Dissemination based on secondary routing table
nulltea opened this issue
Summary
- Implement sample dissemination using a secondary routing table (SRT)
- Preferably this should be a part of the custom discv5 overlay (see #14)
- It isn't yet clear which optimizations the SRT makes possible, but intuitively it could save requests by caching the locations of DAS nodes and sample keys based on XOR distance
- The data structure could be a `KBucketsTable`, Proto's DAS-tree, or something else.
- Additional task 1: consider using uTP transport to send packets larger than plain UDP supports. It would be nice to compare that to the current wire protocols (discv5, libp2p).
- Additional task 2: consider using neighborhood gossip to disperse sample bundles in close proximity to the forwarder's node_id, i.e. instead of continuing to split and send samples, just flood the closest nodes with the whole bundle. It would be nice to see its impact on bucket-wise dissemination, but implementing it isn't strictly necessary.
- Additional task 3: check the consistency of the sample keys used in dissemination and sampling. Sampling with modified discv5 takes too long; maybe the problem is that keys aren't mapped uniformly, which makes lookups take longer. Note, lookups have so far always finished with full success, so there might not be any such problem.
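To make the XOR-distance intuition concrete, here is a minimal sketch of ranking nodes by XOR distance to a sample key — the metric an SRT would cache DAS node locations by. Ids are shortened to 4 bytes for illustration, and all names (`Id`, `xor_distance`, `k_closest`) are hypothetical, not the codebase API.

```rust
// Ids shortened to 4 bytes for illustration (real node ids are 32 bytes).
type Id = [u8; 4];

/// XOR distance between two ids, interpreted as a big-endian integer.
fn xor_distance(a: &Id, b: &Id) -> u32 {
    u32::from_be_bytes([a[0] ^ b[0], a[1] ^ b[1], a[2] ^ b[2], a[3] ^ b[3]])
}

/// Return the `k` node ids closest to `key` by XOR distance —
/// the nodes an SRT would map a sample key to.
fn k_closest(nodes: &[Id], key: &Id, k: usize) -> Vec<Id> {
    let mut sorted = nodes.to_vec();
    sorted.sort_by_key(|n| xor_distance(n, key));
    sorted.truncate(k);
    sorted
}
```

Note that XOR distance is symmetric and unidirectional (for a given key and distance there is exactly one id at that distance), which is what lets every node compute the same key-to-node mapping independently.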
Plan: @ec2 is doing dissemination based on Proto's DAS-tree, @timoth-y investigates ways to extend `KBucketsTable` with optimisations for dissemination.
A few thoughts about dissemination:
- our current dissemination strategies are recursive (the originator sends samples to the closest peers it knows; forwarders repeat the same thing until individual samples reach the closest known peers). Such strategies use the routing table only to get a list of nodes to send `talk_request`s with samples to, so there is no need for tight integration with the overlay protocol.
- an alternative strategy is iterative dissemination: the originator asks the closest peers it knows whether they know closer ones; if they don't, the originator sends them samples to store, otherwise it keeps asking the closer peers it learns about until it reaches the closest known ones. This strategy might require tighter integration with the overlay protocol, as it resembles `FIND_NODE` behaviour.
- another thing is that the originator (which is expected to always be the builder node) could have a much larger routing table, possibly containing close to all known nodes. This would allow it to calculate which node should store which sample and thus come up with the most efficient path. This modification can optimize both recursive and iterative dissemination strategies, and is independent of the overlay protocol.
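The per-hop step of the recursive strategy can be sketched as follows: each forwarder assigns every sample key in its bundle to whichever known peer is XOR-closest to that key, producing the partitions it would then `talk_request` onward. Ids are toy integers here, and `assign_bundle` is an illustrative name, not the actual implementation.

```rust
use std::collections::HashMap;

// Toy stand-ins for the real 256-bit identifiers.
type NodeId = u32;
type SampleKey = u32;

/// One hop of recursive dissemination: split the bundle among known peers,
/// sending each key to the peer XOR-closest to it.
fn assign_bundle(peers: &[NodeId], bundle: &[SampleKey]) -> HashMap<NodeId, Vec<SampleKey>> {
    let mut out: HashMap<NodeId, Vec<SampleKey>> = HashMap::new();
    for &key in bundle {
        // Pick the peer minimizing XOR distance to this key.
        let target = *peers
            .iter()
            .min_by_key(|&&p| p ^ key)
            .expect("forwarder must know at least one peer");
        out.entry(target).or_default().push(key);
    }
    out
}
```

Each forwarder repeats this on the partition it receives, so the bundle shrinks at every hop until single samples land on the closest known peers.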
Tree routing from protolambda/dv5das is based on binary prefix-tree search.
Here is a brief guide on how to implement it in our codebase:
- Add a new batching strategy to the enum: `BatchingStrategy::PrefixWise`, which splits keys by the first `P` bytes, see dv5das#sampling-1
- Ensure that `DasTree` is a prefix tree and init it in the `setTopology` method, similarly to how it's updated on the `Discv5Event::SessionEstablished` event
- Add a `search_by_prefix` method to `DasTree`, similar to this tries-prefix-trees#prefix-search
- On receiving the samples bundle, apply `search_by_prefix` for each key and use `k` results (node_ids) to propagate samples further; as a result we should get a `HashMap<NodeId, Vec<SampleKey>>`, similar to how it's done here
- Add a `PeerSorting` enum with values (`Random`, `ByScore`) and use it to sort `search_by_prefix` results before selecting `k` of them
- Add prefix-based lookup for sampling, see dv5das#retrieving-samples, TBD
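For the `search_by_prefix` step, here is a minimal sketch of a binary prefix tree over node ids (8-bit ids instead of 256-bit `NodeId`s) that returns every id sharing the first `prefix_len` bits with a key. The names mirror the guide above but are assumptions, not the actual codebase API.

```rust
const BITS: u32 = 8; // toy id width; the real tree would use 256-bit node ids

/// Binary prefix tree: each level branches on one bit of the id,
/// most-significant bit first; ids are stored at the leaves.
#[derive(Default)]
struct DasTree {
    children: [Option<Box<DasTree>>; 2],
    ids: Vec<u8>,
}

impl DasTree {
    fn insert(&mut self, id: u8) {
        let mut node = self;
        for i in (0..BITS).rev() {
            let bit = ((id >> i) & 1) as usize;
            node = node.children[bit].get_or_insert_with(Default::default);
        }
        node.ids.push(id);
    }

    /// Collect every id whose first `prefix_len` bits match `key`.
    fn search_by_prefix(&self, key: u8, prefix_len: u32) -> Vec<u8> {
        let mut node = self;
        // Descend along the key's high bits; an absent branch means no match.
        for i in (0..BITS).rev().take(prefix_len as usize) {
            let bit = ((key >> i) & 1) as usize;
            match &node.children[bit] {
                Some(child) => node = child,
                None => return Vec::new(),
            }
        }
        // Everything under this subtree shares the prefix.
        let mut out = Vec::new();
        node.collect(&mut out);
        out
    }

    fn collect(&self, out: &mut Vec<u8>) {
        out.extend(&self.ids);
        for child in self.children.iter().flatten() {
            child.collect(out);
        }
    }
}
```

On receiving a bundle, a forwarder would call `search_by_prefix` for each sample key and group the results into the `HashMap<NodeId, Vec<SampleKey>>` described above.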