Support for ChaCha20 testvectors?

woodruffw opened this issue a year ago

Hi there!

I'm opening this issue to see if there's appetite for supporting ChaCha20 test vectors (i.e. just the stream cipher, not a composed AEAD).

As justification:

ChaCha20 is specified in RFC 7539 with a 96/32 nonce/counter split;
ChaCha20 was also originally specified here, in a form that doesn't directly specify a nonce/counter split; a common split is 64/64 nonce/counter.

The variance between the two has produced some discrepancies between major implementations of ChaCha20. From a survey:

OpenSSL takes a 32-bit counter as input, but rolls over to the next 32 bits (in the nonce space) on counter overflow. Source.
LibreSSL takes a 64-bit counter as input, and increments the high 32 bits on overflow of the lower bits. Source
BoringSSL takes a 32-bit counter as input, and only uses those 32 bits. On overflow the counter wraps around. Source

Of the three, BoringSSL's implementation is the only one that strictly follows the RFC.

Naively, this should only cause issues at the 2^32 block boundary, assuming that the counter starts at 0 or 1. However, starting at 0 or 1 is not standardized, and some ChaCha20 APIs assume that the entire 16-byte nonce+counter input space is randomly initialized instead. This means that the initial counter value may be significantly closer to the rollover point than 2^32 blocks, causing the different implementations to diverge after fewer enciphered blocks than might otherwise be expected.
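
To make the divergence concrete, here is a minimal Python sketch that models only the counter bookkeeping (no encryption) for the two behaviours described above: a 32-bit counter that wraps, and a counter whose overflow carries into the next 32 bits. The function names are hypothetical, and treating the first bytes of the 16-byte input as the little-endian counter is an assumption made for illustration.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def words_wrap32(iv16: bytes, j: int) -> tuple:
    """State words 12..15 for block j when a 32-bit counter wraps mod 2^32."""
    w = list(struct.unpack("<4I", iv16))
    w[0] = (w[0] + j) & MASK32  # only the 32-bit counter word changes
    return tuple(w)


def words_carry64(iv16: bytes, j: int) -> tuple:
    """State words 12..15 for block j when overflow carries into word 13."""
    w = list(struct.unpack("<4I", iv16))
    ctr = (((w[1] << 32) | w[0]) + j) & MASK64  # 64-bit little-endian counter
    w[0], w[1] = ctr & MASK32, ctr >> 32
    return tuple(w)


def blocks_until_divergence(iv16: bytes) -> int:
    """Number of leading blocks for which both models build the same state."""
    (low,) = struct.unpack_from("<I", iv16)
    return (1 << 32) - low
```

With a randomly chosen 16-byte input, the low counter word is uniform over 32 bits, so the two models can part ways after anywhere between 1 and 2^32 blocks rather than always at the 2^32 block mark.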

CC @davidben, who I've been nagging about this 🙂

David Benjamin · Answer 1 · Thu Jun 01 2023 02:24:17 GMT+0800 (China Standard Time)

I think the right way to think about this is that IETF ChaCha20 and 64/64 ChaCha20 are related but different primitives. It is unfortunate that they have the same name, but ideally the world would settle on the IETF one as that's the point of standards.

OpenSSL takes a 32-bit counter as input, but rolls over to the next 32 bits (in the nonce space) on counter overflow. Source.

Where are you getting the 32-bit input? That doesn't seem right. OpenSSL seems to just believe in the 64/64 split. It takes as input a combined 128-bit nonce+counter (CHACHA_CTR_SIZE is 16 bytes) and, when incrementing, increments the bottom 64 bits.

That's perfectly coherent. It's just not the same primitive as the IETF 96/32 version. Though their documentation references the IETF version, so there's a documentation error there. I'll file a bug about that.
https://www.openssl.org/docs/manmaster/man3/EVP_chacha20_poly1305.html

David Benjamin · Answer 2 · Thu Jun 01 2023 02:35:50 GMT+0800 (China Standard Time)

Also keep in mind that wanting to start the block counter at a high value like you're suggesting doesn't make much sense. So while they are formally different primitives, if your application especially cares about the difference, it's probably doing something wrong, or at least very very obscure[*]. :-)

[*] QUIC packet number protection does pass in an arbitrary high value, but that's because it doesn't want ChaCha20 the stream cipher. It wants ChaCha20 the block function. There's no actual incrementing going on.
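
As an illustration of "the block function" used on its own, here is a pure-Python sketch of the RFC 7539 block function, plus a header-protection style mask in the manner I understand RFC 9001 to describe it: the first 4 sample bytes become the counter, the remaining 12 the nonce, and the first 5 bytes of the resulting block are the mask. `chacha20_block` borrows the name from the RFC pseudocode; `quic_hp_mask` is a hypothetical name. This is for reading, not production use.

```python
import struct


def _rotl32(x: int, n: int) -> int:
    return ((x << n) & 0xFFFFFFFF) | (x >> (32 - n))


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & 0xFFFFFFFF
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & 0xFFFFFFFF
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & 0xFFFFFFFF
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & 0xFFFFFFFF
    s[b] = _rotl32(s[b] ^ s[c], 7)


def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """One 64-byte block: 32-byte key, 32-bit counter, 12-byte nonce."""
    if len(key) != 32 or len(nonce) != 12 or not 0 <= counter < 2**32:
        raise ValueError("bad key, counter, or nonce size")
    init = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]  # "expand 32-byte k"
    init += struct.unpack("<8I", key)
    init.append(counter)
    init += struct.unpack("<3I", nonce)
    s = list(init)
    for _ in range(10):  # 10 double rounds = 20 rounds
        _quarter_round(s, 0, 4, 8, 12)   # column rounds
        _quarter_round(s, 1, 5, 9, 13)
        _quarter_round(s, 2, 6, 10, 14)
        _quarter_round(s, 3, 7, 11, 15)
        _quarter_round(s, 0, 5, 10, 15)  # diagonal rounds
        _quarter_round(s, 1, 6, 11, 12)
        _quarter_round(s, 2, 7, 8, 13)
        _quarter_round(s, 3, 4, 9, 14)
    out = [(x + y) & 0xFFFFFFFF for x, y in zip(s, init)]
    return struct.pack("<16I", *out)


def quic_hp_mask(hp_key: bytes, sample: bytes) -> bytes:
    """5-byte mask from a 16-byte sample; a single block, nothing is incremented."""
    counter = int.from_bytes(sample[:4], "little")
    return chacha20_block(hp_key, counter, sample[4:16])[:5]
```

Because only one block is ever produced per sample, the counter value can be arbitrary here without the wraparound question ever coming up.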

William Woodruff · Answer 3 · Thu Jun 01 2023 02:49:09 GMT+0800 (China Standard Time)

Where are you getting the 32-bit input? That doesn't seem right. OpenSSL seems to just believe in the 64/64 split. It takes as input a combined 128-bit nonce+counter (CHACHA_CTR_SIZE is 16 bytes) and, when incrementing, increments the bottom 64 bits.

Yeah, I misspoke there -- OpenSSL does the same 64/64 split as LibreSSL, but with a unified "iv" input rather than two inputs.

Also keep in mind that wanting to start the block counter at a high value like you're suggesting doesn't make much sense. So while they are formally different primitives, if your application especially cares about the difference, it's probably doing something wrong, or at least very very obscure

Agreed -- I think the only reason the PyCA Cryptography APIs can start at a high value is because they currently encourage the user to pass in 16 bytes of randomness, rather than 12 (with the counter initialized internally). I can't think of any reason why it needs to be that way though.

David Benjamin · Answer 4 · Thu Jun 01 2023 03:08:28 GMT+0800 (China Standard Time)

There can be a reason to specify the counter if you're trying to reserve some counter values for miscellaneous things (as the AEAD does), or resuming a stream across two calls. But those won't get you large values out of thin air.
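
A small sketch of the bookkeeping behind both cases, assuming 64-byte blocks and the 32-bit IETF counter; `resume_position` is a hypothetical helper, not an API of any library discussed here.

```python
BLOCK_BYTES = 64
COUNTER_LIMIT = 1 << 32  # the IETF variant has a 32-bit block counter


def resume_position(first_counter: int, bytes_done: int) -> tuple:
    """Return (block counter, offset inside that block) at which to continue.

    first_counter is the counter of the first keystream block, e.g. 1 when
    block 0 is reserved for something else, as the AEAD does.
    """
    blocks, offset = divmod(bytes_done, BLOCK_BYTES)
    counter = first_counter + blocks
    if counter >= COUNTER_LIMIT:
        raise OverflowError("stream position is past the 32-bit counter space")
    return counter, offset
```

For example, with block 0 reserved and 200 bytes already processed, `resume_position(1, 200)` gives counter 4 and an offset of 8 bytes into that block. The counter only grows as fast as the data does, which is why these uses do not produce large values by themselves.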

David Benjamin · Answer 5 · Fri Jun 02 2023 03:13:38 GMT+0800 (China Standard Time)

Arguably the spec doesn't actually say you're supposed to wrap around. Though it mostly doesn't say anything either way:

ChaCha20 successively calls the ChaCha20 block function, with the
same key and nonce, and with successively increasing block counter
parameters. ChaCha20 then serializes the resulting state by writing
the numbers in little-endian order, creating a keystream block.

https://www.rfc-editor.org/rfc/rfc7539.html#section-2.4

Wrapping isn't "successively increasing". But it also doesn't say what to do if you can't increase. Then we have...

key_stream = chacha20_block(key, counter+j, nonce)

https://www.rfc-editor.org/rfc/rfc7539.html#section-2.4.1

Who knows what the normative semantics of that pseudocode are. :-) Though elsewhere we have:

Note: This section and a few others contain pseudocode for the
algorithm explained in a previous section. Every effort was made for
the pseudocode to accurately reflect the algorithm as described in
the preceding section. If a conflict is still present, the textual
explanation and the test vectors are normative.

So I guess the normative text is just "successively increasing". Yeesh.

William Woodruff · Answer 6 · Fri Jun 02 2023 04:18:56 GMT+0800 (China Standard Time)

Yeah, the only thing I see in the spec that implies wraparound is about the round structure:

Note: "addition" in the above paragraph is done modulo 2^32. In some
machine languages, this is called carryless addition on a 32-bit
word.

https://www.rfc-editor.org/rfc/rfc7539.html#section-2.3

But that doesn't say anything about wraparound in the block counter itself.

I'm not too familiar with the processes here -- do you think this is worthy of an erratum? The RFC is not prescriptive about the counter's initial value (it suggests 0 or 1, but only as a suggestion), so emphasizing that the counter's increment is defined modulo 2^32 would eliminate at least one point of ambiguity.

David Benjamin · Answer 7 · Sat Jun 03 2023 01:09:54 GMT+0800 (China Standard Time)

The RFC is not prescriptive about the counter's initial value (it suggests 0 or 1, but only as a suggestion), so emphasizing that the counter's increment is defined modulo 2^32 would eliminate at least one point of ambiguity.

I think the RFC is actually decently clear about the initial value. It says:

o A 32-bit initial counter. This can be set to any number, but will
usually be zero or one. It makes sense to use one if we use the
zero block for something else, such as generating a one-time
authenticator key as part of an AEAD algorithm.

It could be a bit more prescriptive, but it ultimately can't actually prescribe zero or one. Five would be perfectly reasonable if you need to reserve blocks 0-4 for something. You could even split the counter space in half by the high bit if you really needed to, though I imagine you'd mostly want to use the nonce for that.

Given the mess here, I think it's pretty clear that, whether the primitive is defined to wrap or not, you really should not rely on it. So, one way or another, the spec shouldn't emphasize a modulo increment and instead should emphasize that you should avoid wrapping.
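
One way to "avoid wrapping" in practice is to reject a request up front when it would need more blocks than remain before the counter overflows. The following is a sketch under that assumption; both function names are hypothetical.

```python
BLOCK_BYTES = 64


def max_keystream_bytes(initial_counter: int) -> int:
    """Bytes available before a 32-bit block counter would have to wrap."""
    if not 0 <= initial_counter < 2**32:
        raise ValueError("counter must fit in 32 bits")
    return (2**32 - initial_counter) * BLOCK_BYTES


def check_length(initial_counter: int, length: int) -> None:
    """Refuse to process data that would push the counter past 2^32 - 1."""
    if length > max_keystream_bytes(initial_counter):
        raise OverflowError("message would wrap the ChaCha20 block counter")
```

With an initial counter of 1 the budget is 2^38 - 64 bytes, while an initial counter of 2^32 - 1 leaves a single 64-byte block. A check like this makes the behaviour independent of whether the underlying implementation wraps or carries.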

Really the problem is ChaCha20 is a low-level building block that you use to build actual primitives like AEADs. It's not something you should use if you aren't willing to think about your nonce and counter space and whether you're trying to partition it. We definitely should nail down the semantics, but ultimately it sounds like pyca/cryptography has exposed it wrong.

William Woodruff · Answer 8 · Wed Oct 25 2023 00:08:23 GMT+0800 (China Standard Time)

Revising this: given that the IETF and "original"/SSH variants are both widely implemented, my current thinking is that it probably makes sense to have separate Wycheproof vectors for both.

Does that seem reasonable?

bleichenbacher-daniel · Answer 9 · Thu Dec 21 2023 19:25:42 GMT+0800 (China Standard Time)

I looked into SSH recently. Since the maximal amount of data that can be sent before rekeying is 1 or 4 GB, a counter will never overflow a 32-bit boundary. Hence implementations can use a standard implementation without problems.
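
The arithmetic can be checked with a short sketch. It makes two assumptions of mine: "GB" is read as GiB, and the whole rekey interval is treated as one continuous keystream, which is a pessimistic simplification. The helper names are hypothetical.

```python
BLOCK_BYTES = 64
GIB = 1 << 30


def blocks_needed(nbytes: int) -> int:
    """Number of 64-byte ChaCha20 blocks covering nbytes (rounded up)."""
    return -(-nbytes // BLOCK_BYTES)


def fits_in_32bit_counter(nbytes: int, first_counter: int = 0) -> bool:
    """True if the last block used still has a counter below 2^32."""
    return first_counter + blocks_needed(nbytes) <= 2**32
```

Under these assumptions 1 GiB needs 2^24 blocks and 4 GiB needs 2^26 blocks, both far below the 2^32 blocks (2^38 bytes) at which a 32-bit counter would overflow.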

There may be some value in generating test vectors for SSH itself. I have some initial code for testing certificates. However, I can only work in my free time, hence I cannot make any promises. I'm no longer working at Google and don't have access to documents and generation code anymore. This means that a lot of my free time goes into reimplementing stuff from scratch.