quicwg / load-balancers

In-progress version of draft-ietf-quic-load-balancers

Home Page: https://quicwg.org/

Shorten nonce length for SCID

Way back in #33, @huitema said that his three-pass algorithm meant that

...it should also be possible to make the nonce a bit shorter than the original spec, because we are no longer relying on the randomness of the nonce value. Each server could set the nonce to some kind of sequence number, incremented at each CID allocation. The size of the number should be enough to cover all allocations during a CID encryption key epoch, instead of twice that to cover the birthday paradox if using random allocations.

But we never really did that: the nonce is in the range 8..16 octets, which is inherited from the original single-pass algorithm.

@ianswett asked if we could shorten the minimum nonce to 32 bits to reduce overall CID length; if we did, an 8-byte CID could support 2^24 servers (after the unencrypted first octet, that leaves a 3-octet server ID and a 4-octet nonce)!

This is compelling and we should definitely shorten the nonce to some value. But 32 bits means you need to roll over the keys every 4 billion (2^32) CIDs a server issues; given that some QUIC implementations are quite profligate in issuing CIDs, is that enough?

Even one more byte (a 40-bit nonce) would get us to about a trillion CIDs while still supporting up to 2^16 servers in 8 octets. On the other hand, the config agent can always pick a longer nonce length if the key lifetime is too short. The question is how much we want to allow people to do something potentially dumb, so it's a good spot for discussion.
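To make the octet budget concrete, here is a rough sketch of that arithmetic. It assumes, as in the 2^24 figure above, an 8-octet CID whose first octet is unencrypted (config rotation and length bits), with the remaining 7 octets split between server ID and nonce; `layout` is a hypothetical helper, not anything defined by the draft.

```python
def layout(cid_len: int, nonce_len: int) -> tuple[int, int]:
    """Return (server IDs, nonces per key epoch) for lengths given in octets.

    Assumes one unencrypted first octet (config rotation/length bits);
    everything else is split between server ID and nonce.
    """
    sid_len = cid_len - 1 - nonce_len
    if sid_len < 1:
        raise ValueError("no room left for a server ID")
    return 2 ** (8 * sid_len), 2 ** (8 * nonce_len)


for nonce_len in (4, 5, 6):
    servers, nonces = layout(8, nonce_len)
    print(f"{nonce_len}-octet nonce: {servers:,} servers, {nonces:,} CIDs per key")
```

Each extra nonce octet multiplies the CIDs available per key by 256 and divides the number of addressable servers by 256, which is exactly the trade-off under discussion.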

I think some flexibility and discussion of the pros and cons here would be very useful. I think 32 bits is a reasonable min for our use cases, but maybe there's a reason that would cause issues I'm not aware of?

We are not profligate issuers of connection IDs and I think we just issue one extra one at any given point in time.

@mdukef5 wrote: "This is compelling and we should definitely shorten the nonce to some value. But 32 bits means you need to roll over the keys for every 4 billion CIDs a server issues; given that some QUIC implementations are quite profligate issuing CIDs, is that enough?"

Short answer: yes. Assume a server handles 2,048 (2^11) connections per second, and allocates 16 (2^4) CIDs per connection. That leaves 17 bits of the 32-bit nonce: 2^17 seconds, about 128K seconds or roughly 36 hours.
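The same back-of-the-envelope calculation as a tiny sketch, assuming the nonce is a sequential per-server counter (so the whole 2^32 space is usable, with no birthday margin); `rollover_seconds` is just an illustrative name.

```python
def rollover_seconds(nonce_bits: int, conns_per_sec: int, cids_per_conn: int) -> float:
    """Seconds until a sequential per-server nonce counter exhausts nonce_bits."""
    return 2 ** nonce_bits / (conns_per_sec * cids_per_conn)


secs = rollover_seconds(32, 2048, 16)
print(f"{secs:,.0f} s = {secs / 3600:.1f} h")  # 131,072 s = 36.4 h
```

Halving the CIDs issued per connection doubles the interval, so the realistic per-connection issuance rate matters as much as the connection rate.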

Long answer: to make that really work, you need a good rollover mechanism. Something like, if the load balancer observes that one server has rolled over, it triggers rollover for all other servers in the pool.
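One way a pool-wide rollover could look, as a minimal sketch: every name here (`Server`, `Pool`, `ROLLOVER_FRACTION`) is hypothetical, and it ignores the real-world detail that the load balancer has to keep accepting the old config while CIDs issued under it are still in use.

```python
from dataclasses import dataclass

ROLLOVER_FRACTION = 0.9  # hypothetical threshold: roll over at 90% of the nonce space


@dataclass
class Server:
    name: str
    nonce_bits: int
    counter: int = 0    # next sequential nonce under the current key
    config_id: int = 0  # config/key epoch currently in use

    def allocate_nonce(self) -> int:
        if self.counter >= 2 ** self.nonce_bits:
            raise RuntimeError("nonce space exhausted; refusing to reuse a nonce")
        nonce = self.counter
        self.counter += 1
        return nonce

    def near_exhaustion(self) -> bool:
        return self.counter >= ROLLOVER_FRACTION * 2 ** self.nonce_bits


@dataclass
class Pool:
    servers: list[Server]
    config_id: int = 0

    def check_rollover(self) -> bool:
        """If any one server is close to exhausting its nonces, move the whole
        pool to a new config (and key), so every counter can restart at zero."""
        if not any(s.near_exhaustion() for s in self.servers):
            return False
        self.config_id += 1
        for s in self.servers:
            s.config_id = self.config_id
            s.counter = 0
        return True


pool = Pool([Server("a", nonce_bits=4), Server("b", nonce_bits=4)])
for _ in range(15):
    pool.servers[0].allocate_nonce()
print(pool.check_rollover())  # True: server "a" used 15 of its 16 nonces
```

Triggering on the busiest server means the quietest servers roll over early too, but it keeps every server in the pool on the same config at all times.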

Most connections need 1 or 2 CIDs, so I think the practical bounds are higher.

Also, 2048 connections isn't that much, but 2048 per second is quite a bit based on my experience.

So I wonder if there's an attack that continually changes the DCID it uses, which will cause some servers to issue way more CIDs than usual, thus forcing a rollover inside the normal key management cycle.

Two countermeasures: servers SHOULD cap CID issuance at a sensible limit, and (as @huitema says) implement a good rollover mechanism.
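A minimal sketch of the first countermeasure, assuming the server tracks connections by some integer handle and draws nonces from one server-wide counter. `CidIssuer` and the cap of 8 are hypothetical; a real QUIC stack would also have to reconcile this with CID retirement and the peer's `active_connection_id_limit`.

```python
from typing import Dict, Optional

MAX_CIDS_PER_CONNECTION = 8  # hypothetical "sensible limit"


class CidIssuer:
    """Sketch: one server-wide sequential nonce counter, plus a cap on how many
    CIDs any single connection can ever obtain from this server."""

    def __init__(self, nonce_bits: int, max_per_conn: int = MAX_CIDS_PER_CONNECTION):
        self.nonce_bits = nonce_bits
        self.max_per_conn = max_per_conn
        self.next_nonce = 0
        self.issued: Dict[int, int] = {}  # connection handle -> CIDs issued so far

    def issue(self, conn: int) -> Optional[int]:
        """Return a fresh nonce for `conn`, or None once it has hit its cap."""
        if self.issued.get(conn, 0) >= self.max_per_conn:
            return None  # peer keeps burning CIDs; stop feeding it new ones
        if self.next_nonce >= 2 ** self.nonce_bits:
            raise RuntimeError("nonce space exhausted; key rollover required")
        nonce = self.next_nonce
        self.next_nonce += 1
        self.issued[conn] = self.issued.get(conn, 0) + 1
        return nonce


issuer = CidIssuer(nonce_bits=32, max_per_conn=2)
print([issuer.issue(1) for _ in range(3)], issuer.issue(2))  # [0, 1, None] 2
```

The cap bounds how fast any single peer can drain the shared nonce space, which is what would otherwise let a DCID-churning attacker force a rollover inside the normal key management cycle.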

@huitema on a separate note, are there security implications for (nonce_len < sid_len)? I can't see any, but you would know better than I.

Perhaps I'm overthinking this, because I'm not sure nonce reuse actually matters here. Assuming that a single server isn't using multiple server IDs, we are essentially encrypting the same plaintext over and over again. So even if the nonce is reused, that only amounts to the exact same CID being issued again (modulo random bits in the server-use field or where the length would go). While this isn't great, as long as it's not done in the same connection, it doesn't give an observer any useful information for correlating clients.

If the observer is positioned to see all CIDs in use for the server pool, then over long time scales it may observe that two CIDs match, which for QUIC-LB means they were issued by the same server. Thanks to the config rotation bits, it can likely infer that the keys haven't rotated. But that has very little to do with correlating clients, which is the whole point of the document.

A server that rolls over its connection IDs is going to have to generate new Stateless Reset keys, but this is a totally local operation.

Am I not thinking about this correctly?

Encryption performs a bijection from the space of tuples <sid, nonce> to the space of numbers of size nb_sid*nb_nonce.
I don't see a particular risk in having nonce_len smaller or larger than sid_len.
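To illustrate the bijection point, any keyed permutation will do. The sketch below uses a toy 4-round Feistel network with HMAC-SHA256 as the round function, which is not the QUIC-LB cipher, over a deliberately tiny 6-bit sid and 6-bit nonce so the whole space can be enumerated. Every <sid, nonce> lands on a distinct value, however the bits are split between the two fields.

```python
import hashlib
import hmac


def feistel_encrypt(key: bytes, value: int, half_bits: int, rounds: int = 4) -> int:
    """Toy balanced Feistel permutation on 2*half_bits-bit integers.

    A Feistel network is invertible whatever the round function is,
    so this is a bijection by construction (illustration only).
    """
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    for r in range(rounds):
        digest = hmac.new(key, bytes([r]) + right.to_bytes(8, "big"), hashlib.sha256).digest()
        left, right = right, left ^ (int.from_bytes(digest, "big") & mask)
    return (left << half_bits) | right


SID_BITS, NONCE_BITS = 6, 6  # tiny, so the whole space can be enumerated
TOTAL = SID_BITS + NONCE_BITS
key = b"example key"
images = {
    feistel_encrypt(key, (sid << NONCE_BITS) | nonce, TOTAL // 2)
    for sid in range(2 ** SID_BITS)
    for nonce in range(2 ** NONCE_BITS)
}
print(len(images) == 2 ** TOTAL)  # True: no two <sid, nonce> pairs collide
```

Decryption is just the rounds run in reverse order, and nothing in the construction cares where the sid/nonce boundary falls, which matches the observation that nonce_len relative to sid_len is not by itself a risk.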

There is a risk in densely using the space of encrypted numbers. Look at the decryption side: if an adversary picks a number at random, that number will decrypt to <sid=s, nonce=n>. The shorter the ID, the denser your utilization of the space, and the larger the likelihood that the randomly picked "s" will correspond to a valid server, and thus that the adversary's message hits a server instead of being dropped at the LB.

Same goes for the nonce, the adversary has more chances to get lucky if the space of nonces is densely used. But then, decryption will fail and the message will be ignored.
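Putting a number on the density risk, as a rough sketch: if the cipher behaves like a random permutation, a uniformly random CID decrypts to a uniformly random server ID, so the chance that it names a real server is simply servers in use over the size of the server ID space. `p_valid_server` is a hypothetical name and the pool size of 1,000 is made up for illustration.

```python
def p_valid_server(num_servers: int, sid_bits: int) -> float:
    """Probability that a random encrypted CID decodes to an assigned server ID,
    assuming the cipher acts as a random permutation of <sid, nonce>."""
    if num_servers > 2 ** sid_bits:
        raise ValueError("more servers than server IDs")
    return num_servers / 2 ** sid_bits


for sid_bits in (16, 24, 32):
    # Hypothetical pool of 1,000 servers.
    print(f"{sid_bits}-bit server ID: {p_valid_server(1000, sid_bits):.2e}")
```

Shrinking the server ID field, or filling it with more servers, raises this fraction: more forged packets get past the LB to a server, where they are then ignored, but only after consuming some server resources.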

So the main issue I see is that, with a densely used SID space, randomly generated CIDs might yield slightly more effective resource-consumption attacks.

The attacker will not be able to directly correlate CID values and servers. Changing one bit in the nonce statistically changes half the bits in the encrypted sid+nonce.
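The avalanche claim can be checked empirically with a toy cipher: a 4-round Feistel network with an HMAC-SHA256 round function (an illustrative stand-in, not the QUIC-LB algorithm) over a 24-bit sid and a 32-bit nonce, matching the 8-octet layout mentioned earlier in the thread. Flipping one random nonce bit and counting the changed output bits should average about half of the 56 encrypted bits. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def feistel_encrypt(key: bytes, value: int, half_bits: int, rounds: int = 4) -> int:
    """Toy balanced Feistel permutation (HMAC-SHA256 round function)."""
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    for r in range(rounds):
        digest = hmac.new(key, bytes([r]) + right.to_bytes(8, "big"), hashlib.sha256).digest()
        left, right = right, left ^ (int.from_bytes(digest, "big") & mask)
    return (left << half_bits) | right


def avg_bits_changed(trials: int = 2000, sid_bits: int = 24, nonce_bits: int = 32) -> float:
    """Average Hamming distance between encryptions of nonces differing in one bit."""
    key = secrets.token_bytes(16)
    half = (sid_bits + nonce_bits) // 2  # balanced Feistel: total width must be even
    changed = 0
    for _ in range(trials):
        sid = secrets.randbelow(2 ** sid_bits)
        nonce = secrets.randbelow(2 ** nonce_bits)
        flipped = nonce ^ (1 << secrets.randbelow(nonce_bits))
        a = feistel_encrypt(key, (sid << nonce_bits) | nonce, half)
        b = feistel_encrypt(key, (sid << nonce_bits) | flipped, half)
        changed += bin(a ^ b).count("1")
    return changed / trials


print(f"{avg_bits_changed():.1f} of 56 bits changed on average")  # typically about 28
```

Because consecutive counter values produce unrelated-looking outputs, an observer cannot tell from the encrypted bits alone that two CIDs came from the same server.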

Thanks, can we change to a 32 bit nonce and add some more text about the risks and trade-offs of shorter vs longer nonces? I feel someone else might be better at the risks/trade-offs bit, but I'll write a PR to change the value.

On the issue of nonce reuse: I toyed with the idea of being permissive about this, but for the record here's why there's a problem:

At the extreme, say a server uses an independent nonce counter for each of its connections, so the encrypted bits follow a predictable pattern for CIDs 1..n issued over the life of the connection.

First, if the plaintext bits (first octet and server-use) are the same, we can't demultiplex connections, and everyone will have everyone else's stateless reset tokens. So at a minimum, the server has to keep the overall CID unique.

But even if it's unique, if the encrypted bits follow this pattern, then a client connected to that server can identify other clients linked to that server.
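To make the per-connection-counter problem concrete, here is a minimal sketch. The layout is hypothetical and simplified: one plaintext server-use byte followed by 7 encrypted bytes, with HMAC-SHA256 standing in for the real encryption (any deterministic keyed function shows the same effect). Two connections on the same server get unique CIDs, yet one client can recognise the other's CIDs because the encrypted bytes repeat; drawing nonces from a single server-wide counter removes the pattern.

```python
import hashlib
import hmac
import itertools
from typing import Iterator

KEY = b"hypothetical server key"


def encrypted_bits(sid: int, nonce: int) -> bytes:
    """Stand-in for encrypting <sid, nonce>: deterministic, keyed, 7 bytes."""
    msg = sid.to_bytes(3, "big") + nonce.to_bytes(4, "big")
    return hmac.new(KEY, msg, hashlib.sha256).digest()[:7]


def cids_per_conn_counter(sid: int, server_use: int, count: int) -> list[bytes]:
    """BAD: each connection restarts its nonce counter at 0."""
    return [bytes([server_use]) + encrypted_bits(sid, n) for n in range(count)]


def cids_shared_counter(sid: int, server_use: int, count: int,
                        counter: Iterator[int]) -> list[bytes]:
    """Better: every connection draws from one server-wide counter."""
    return [bytes([server_use]) + encrypted_bits(sid, next(counter)) for _ in range(count)]


alice = cids_per_conn_counter(sid=7, server_use=0x01, count=3)
bob = cids_per_conn_counter(sid=7, server_use=0x02, count=3)
alice_tails = {cid[1:] for cid in alice}
print(set(alice).isdisjoint(bob))                  # True: full CIDs are unique
print(all(cid[1:] in alice_tails for cid in bob))  # True: but Bob is linkable

shared = itertools.count()
alice2 = cids_shared_counter(7, 0x01, 3, shared)
bob2 = cids_shared_counter(7, 0x02, 3, shared)
print(any(cid[1:] in {c[1:] for c in alice2} for cid in bob2))
```

The server-use byte keeps the CIDs distinct, which solves demultiplexing, but only the shared counter keeps the encrypted portion from leaking which clients sit on the same server.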