w3f / messaging

Messaging for Web3

Clarification regarding network topology

oskarth opened this issue · comments

According to Impact of Network Topology on Anonymity and Overhead in Low-Latency Anonymity Networks (https://www.esat.kuleuven.be/cosic/publications/article-1230.pdf) it seems like a stratified topology is desirable. This is also what Loopix has chosen, from what I can tell.

What implications does this have for the underlying network topology? Assuming it is a p2p overlay, this isn't clear to me.

The design document should provide a brief rationale for why a stratified topology was chosen.

Assuming a stratified network is desirable:

1. Are all participants equal in capabilities?

E.g. is a user also a mix node by default?
In Loopix overview it appears there are three separate actors: sender/receiver, service provider, and mixnet nodes. IMO this is a requirement ('8. No Specialized Services (pure p2p)'), even if some nodes choose to operate in a light-node fashion. This does not preclude things like (ad hoc, decentralized) bootstrap/tracker lists of useful nodes.

(If this is absolutely impossible, and a set of different actors is absolutely necessary, we should provide motivation and clear roles and incentives for these, similar to Ethereum's validators, etc. This ensures it doesn't turn the network into a set of middleman (providers, ISPs, etc).)

2. How are mixnodes placed in their respective layers, and how is this guaranteed?

E.g. one could imagine basic pubkey hashing into buckets (coordination-less), but a set of malicious nodes could perhaps 'overload' a layer?

3. Is one mixnode always in that layer, or can it act in multiple layers at once?

If a sender is also a node,

4. How are the number of layers set and enforced?

One could imagine some actors wanting more layers, but it isn't clear what implications this permissionless action would have for other nodes. Would it just be badly behaved, and would layerN=3 be a requirement to participate in the network?

Related: Karaoke https://pdos.csail.mit.edu/papers/karaoke.pdf which uses more layers, and also comments somewhat negatively on Loopix (unclear to me how true its points are).

5. Assuming there's a stratified network, what implications, if any, does this have for the underlying P2P overlay?

In terms of peer discovery, resource consumption, efficiency etc.

Yes, stratified is normally considered preferable. If I recall, the actual anonymity improvement looks small, but having nodes talk with fewer other nodes brings other practical benefits.

I'm okay with starting with a free route design, but we should attempt to keep our options open to go with a stratified design.

  1. Are all participants equal in capabilities?
    E.g. is a user also a mix node by default?

I think not, because so many users have so little bandwidth or hardware, and so many router nodes have so much. Tor abandoned this early because slow nodes impact performance so much; Tor even discourages users from using low-quality nodes now. I2P still claims this, but they never really analyze their network. It's possible our added latency could make this more acceptable than for Tor or I2P, but I'm not sure our PKI could handle it, or whether it'd worsen security issues connected with the PKI. Assume no for now.

I think one trickier question is: Can routing nodes act like users? At some level obviously yes, but if we have a stratified topology then doing so hurts their anonymity.

IMO this is a requirement ('8. No Specialized Services (pure p2p)')

Yes, avoiding providers is a priority. We can have nodes playing special roles, like you say, but these push up developer time. We might want entry nodes to be more trusted, if that can even be assessed.

  2. How are mixnodes placed in their respective layers, and how is this guaranteed?
    E.g. one could imagine basic pubkey hashing into buckets (coordination-less), but a set of malicious nodes could perhaps 'overload' a layer?

If all nodes are identical, and you have enough nodes, then you could allocate them by H(node_identity_key ++ r) % layers where r is an unbiased collaborative random number determined after all node_identity_keys were committed. All users can compute this. Also, nodes cannot choose their strata except by influencing the collaborative random number.
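As a concrete sketch of this coordination-less allocation (function name and key encoding are hypothetical; any collision-resistant hash would do in place of SHA-256):

```python
import hashlib

def assign_layer(node_identity_key: bytes, r: bytes, layers: int) -> int:
    """Coordination-less layer assignment: H(node_identity_key ++ r) % layers.

    `r` stands for the unbiased collaborative random number, fixed only
    after all node identity keys were committed, so nodes cannot grind
    keys to choose their stratum.
    """
    digest = hashlib.sha256(node_identity_key + r).digest()
    return int.from_bytes(digest, "big") % layers

# Any participant can recompute the full allocation locally.
r = bytes.fromhex("aa" * 32)  # placeholder beacon output
keys = [bytes([i]) * 32 for i in range(6)]
allocation = {k.hex()[:8]: assign_layer(k, r, 3) for k in keys}
```

Since everyone hashes the same committed inputs, every honest participant derives the same layer map without any coordinator.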

In fact, we cannot assume all nodes have identical capacity, but assessing that capacity requires some effort. We can maybe estimate capacity using the "secret shopper" scheme, provided nodes are paid in a currency that expires or decays, but maybe transitions into another spendable currency on expiration.

If we have a Tor-like global network consensus, like say accounts containing currency balances in this expiring currency, then we want to sample a uniformly random solution to the k-partition problem weighted by capacity: We first obtain an unbiased collaborative random number from our proof-of-stake blockchain, with which we shall seed all other randomness. :) We next obtain an initial, inexact solution by first randomly permuting the nodes and then treating them as occupying their capacity many adjacent slots, which yields nodes straddling multiple strata. We next allocate all nodes that wholly cover one stratum to that stratum, and randomly assign nodes partially in two strata to one or the other, according to their weighting. We finally approximate a fix to the partition by randomly swapping nodes into lower-capacity strata.
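The sampling procedure above can be sketched roughly as follows. This is a hypothetical helper, not a normative algorithm: `nodes` maps node id to capacity, `seed` stands in for the collaborative random number, and the fix-up step is a simplified swap loop:

```python
import random

def partition_by_capacity(nodes: dict, k: int, seed: int) -> dict:
    """Sample an approximate capacity-balanced k-partition (sketch)."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)                      # random permutation of nodes
    total = sum(nodes.values())
    stratum_size = total / k                # ideal capacity per stratum

    # Lay nodes out over `total` adjacent slots; a node straddling a
    # stratum boundary goes to one side with probability proportional
    # to its overlap with that stratum.
    assignment, pos = {}, 0.0
    for nid in order:
        start, end = pos, pos + nodes[nid]
        pos = end
        lo = min(int(start // stratum_size), k - 1)
        hi = min(int((end - 1e-9) // stratum_size), k - 1)
        if lo == hi:
            assignment[nid] = lo            # wholly inside one stratum
        else:
            boundary = (lo + 1) * stratum_size
            p_lo = (boundary - start) / (end - start)
            assignment[nid] = lo if rng.random() < p_lo else hi

    # Fix-up: randomly swap nodes out of over-full strata into the
    # currently lightest stratum.
    def loads():
        out = [0.0] * k
        for nid, s in assignment.items():
            out[s] += nodes[nid]
        return out
    for _ in range(10 * len(nodes)):
        load = loads()
        heavy, light = load.index(max(load)), load.index(min(load))
        if load[heavy] - load[light] <= max(nodes.values()):
            break
        movable = [n for n, s in assignment.items() if s == heavy]
        assignment[rng.choice(movable)] = light
    return assignment
```

Because all randomness derives from the shared seed, every participant computes the same partition; the sketch only handles nodes straddling two strata, as in the text.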

In fact, there is usually a desire to force "related" nodes into the same strata, but whether this actually improves anonymity remains debatable, because malicious nodes declare no relationships. It might benefit node operators though, by making them less valuable targets. In Polkadot, we have another unsolved problem in nominated proof-of-stake (NPoS) of allocating nominators to validators so as to maximize the minimum total stake nominating each validator, which sounds dangerously similar to maximizing the minimum total node capacity allocated to each layer, with family constraints being analogous to validator nomination preferences.

Another trickier question around stratification is: How many nodes want to profit from storage? We may not want all aggregation points in the same layer.

  3. Is one mixnode always in that layer, or can it act in multiple layers at once?

I'd ask Claudia Diaz or @FatemeShirazi if this causes any real problems, but obviously individual packets take different nodes in each layer.

If a sender is also a node,
4. How are the number of layers set and enforced?

You need free routes if users act as nodes. I2P assigns this locally, but maybe that makes things worse.

Related: Karaoke https://pdos.csail.mit.edu/papers/karaoke.pdf which uses more layers, and also comments somewhat negatively on Loopix (unclear to me how true its points are).

I'll read this later today, thanks. As a rule MIT CSAIL worries primarily about producing analyzable designs and not so much about producing practical or flexible ones.

  5. Assuming there's a stratified network, what implications, if any, does this have for the underlying P2P overlay?

It likely improves performance, but knowing or managing the stratification may require more resources.

There are interesting ideas in the Karaoke paper, including their optimistic indistinguishability technique, but especially verifying cover packets by bloom filter (#10 (comment)). As expected, the design makes unrealistic assumptions in their analysis: rounds, all users online all the time, etc.

I'd felt their criticism of Loopix sounds fair: Yes, Loopix has service providers, which breaks strong privacy assurances. Yes, if adversarial nodes are distributed evenly among the strata with proportion f, and we have k hops, then you choose all adversarial hops with rate f^k.

I agree that 3 is very low. In Karaoke they use 14 layers and assume 80% of relays are honest. A longer route is more secure, but it increases latency. So the route length should depend on our assumptions about the adversary, the security guarantees we want, and the latency penalty we are willing to pay for it.
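To make the trade-off concrete: if a fraction f of each layer is adversarial and hops are sampled independently, the chance a route of k hops is entirely adversarial is f^k. Plugging in the numbers above (function name is mine):

```python
def full_compromise_rate(f: float, k: int) -> float:
    """Probability that all k independently sampled hops are adversarial,
    when a fraction f of each layer is adversarial: f**k."""
    return f ** k

# Loopix-style short route vs Karaoke's parameters (20% adversarial relays)
short = full_compromise_rate(0.2, 3)    # ≈ 0.008, i.e. ~1 route in 125
long = full_compromise_rate(0.2, 14)    # ≈ 1.6e-10
```

So going from 3 to 14 hops drops the full-compromise rate by roughly seven orders of magnitude, at the cost of the extra per-hop latency.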

If we are going with a stratified topology, one mix (or mixes belonging to the same organization) should not be active in multiple layers. Otherwise this would increase their chances to control the whole route.
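A route builder could enforce this constraint with a simple check (the `(node_id, org_id)` route shape is hypothetical; in practice org identity would come from declared node families or the PKI):

```python
def route_is_org_disjoint(route: list) -> bool:
    """Reject routes where one organization appears in multiple layers.

    `route` is a list of (node_id, org_id) pairs, one entry per layer.
    """
    orgs = [org for _node, org in route]
    return len(orgs) == len(set(orgs))

# A route reusing orgA in two layers would be rejected and resampled.
ok = route_is_org_disjoint([("n1", "orgA"), ("n2", "orgB"), ("n3", "orgC")])
```

Of course this only helps against *declared* relationships; Sybil operators who hide their affiliation defeat it, which is the same caveat as with node families above.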

In the aggregation point design, there are three sequences with possibly different lengths: sender -> contact point -> aggregation point -> receiver. We'd expect the first two to happen in rapid sequence, unless the receiver stays offline for an extended period. I doubt the second requires so many hops, because it only protects the aggregation point's identity from the senders and contact points, while the first and third actually protect users directly.