ChaCha20-Poly1305 large test vectors

Question

macaba opened this issue 5 years ago

Performance-optimized implementations of ChaCha20-Poly1305 tend to have conditional code paths that use CPU intrinsics to process larger ciphertext streams
(SSSE3 intrinsics for 256-byte blocks, AVX2 intrinsics for 512-byte blocks).

It would be good to see some large test vectors that target these paths: at least one with length n in the range 256 <= n < 512, and at least one with n >= 512. These would help expose implementation bugs in the optimized paths.
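
A minimal sketch of how such lengths could be chosen, assuming the fast paths switch at 64, 256 and 512 bytes (the 64-byte ChaCha20 block plus the two sizes named above). The helper names `boundary_lengths` and `patterned_plaintext` are hypothetical and not part of any existing test suite.

```python
def boundary_lengths(path_sizes=(64, 256, 512)):
    """Return message lengths on and around each assumed fast-path size."""
    lengths = {0, 1}
    for size in path_sizes:
        for multiple in (1, 2):
            n = size * multiple
            # One byte short, exact, and one byte over each boundary.
            lengths.update({n - 1, n, n + 1})
    # One length that needs every path once, plus a one-byte tail.
    lengths.add(sum(path_sizes) + 1)
    return sorted(lengths)


def patterned_plaintext(n):
    """Deterministic filler in which every 64-byte block differs from the others,
    so swapped or duplicated blocks show up in the output."""
    return bytes((i * 7 + i // 64) % 256 for i in range(n))


if __name__ == "__main__":
    for n in boundary_lengths():
        print(n, patterned_plaintext(n)[:8].hex())
```

The lengths only decide which code paths get exercised; the expected ciphertexts would still have to come from an implementation that is trusted independently.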

Thai Duong · Answer 1 · Tue Nov 26 2019 09:32:33 GMT+0800 (China Standard Time)

You mean test vectors containing plaintext longer than 256 or 512 bytes?

It sounds like a good idea.

@bleichen what do you think?

Daniel Bleichenbacher · Answer 2 · Fri Nov 29 2019 22:30:22 GMT+0800 (China Standard Time)

First, I'm a bit surprised that using 512 byte blocks with AVX2 intrinsics would give optimal performance. I wouldn't expect larger than 256 byte blocks here.

Adding a few longer test vectors can be done, though the question remains whether this
is effective. One likely source of problems is the Poly1305 computation, which can easily suffer
from overflow problems if parallelized carelessly. To generate test vectors that check the Poly1305
computation for overflows, I generated a large number of keys and selected those keys where the
Poly1305 subkeys were extreme. This selection assumes that Horner's method is used, so overflows in a parallel Poly1305 implementation would likely remain undetected by the current vectors.
I'd guess that test vectors for Poly1305 alone, with extreme subkeys, could help.
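
To make the Horner versus parallel distinction concrete, here is a rough Python model of Poly1305 (following RFC 8439) evaluated both ways. Python integers cannot overflow, so this only serves as a reference that a limb-based implementation could be compared against. The choice of all-0xff keys and messages as "extreme" inputs is an assumption for illustration, not the selection procedure described above.

```python
P = (1 << 130) - 5
CLAMP = 0x0ffffffc0ffffffc0ffffffc0fffffff


def _split_key(key):
    # First 16 bytes: clamped multiplier r. Last 16 bytes: final addend s.
    r = int.from_bytes(key[:16], "little") & CLAMP
    s = int.from_bytes(key[16:32], "little")
    return r, s


def _blocks(msg):
    # Each 16-byte chunk (the last may be shorter) gets a 0x01 byte appended.
    return [int.from_bytes(msg[i:i + 16] + b"\x01", "little")
            for i in range(0, len(msg), 16)]


def poly1305_horner(msg, key):
    """Sequential evaluation: acc = (acc + m) * r for every block."""
    r, s = _split_key(key)
    acc = 0
    for m in _blocks(msg):
        acc = (acc + m) * r % P
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")


def poly1305_two_way(msg, key):
    """Two blocks per step using r^2, the shape a parallel version takes."""
    r, s = _split_key(key)
    r2 = r * r % P
    blocks = _blocks(msg)
    acc = 0
    i = 0
    while i + 1 < len(blocks):
        acc = ((acc + blocks[i]) * r2 + blocks[i + 1] * r) % P
        i += 2
    if i < len(blocks):  # odd block left over
        acc = (acc + blocks[i]) * r % P
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")


def extreme_cases():
    """Assumed 'extreme' inputs: maximal r and s, all-0xff message blocks."""
    keys = [b"\xff" * 32, b"\xff" * 16 + bytes(16), bytes(16) + b"\xff" * 16]
    for key in keys:
        for n in (15, 16, 17, 255, 256, 257, 511, 512, 513):
            yield key, b"\xff" * n


if __name__ == "__main__":
    for key, msg in extreme_cases():
        assert poly1305_horner(msg, key) == poly1305_two_way(msg, key)
        print(len(msg), poly1305_horner(msg, key).hex())
```

The two-way form multiplies by r^2, so its intermediate values differ from those of Horner's method. That is why inputs tuned to stress one evaluation order need not stress the other.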

Another source of errors is incremental updates. For example, this paper has some results:
https://eprint.iacr.org/2017/891.pdf
Flaws can occur at just a few specific input sizes. I'm not sure which specific input sizes
are most problematic for parallelized implementations.
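
A small sketch of an incremental-update check: feed the same data in two pieces, split at every possible offset, and compare against the one-shot result. `hashlib.sha256` is used purely as a stand-in because it ships with Python and has an `update()` interface; the function names are hypothetical, and an AEAD or MAC with a streaming API could be plugged in the same way.

```python
import hashlib


def incremental_digest(data, split_points, new_hash=hashlib.sha256):
    """Hash data piecewise, cutting it at each offset in split_points."""
    h = new_hash()
    prev = 0
    for cut in split_points:
        h.update(data[prev:cut])
        prev = cut
    h.update(data[prev:])
    return h.digest()


def failing_two_way_splits(data, new_hash=hashlib.sha256):
    """Return every split offset where the piecewise digest differs from one-shot."""
    expected = new_hash(data).digest()
    return [cut for cut in range(len(data) + 1)
            if incremental_digest(data, [cut], new_hash) != expected]


if __name__ == "__main__":
    data = bytes(range(256)) * 3  # 768 bytes, crosses several block sizes
    print("failing split offsets:", failing_two_way_splits(data))
```

An empty list means no split offset changed the result. Trying every offset is cheap at these sizes and avoids having to guess which ones are problematic.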

macaba · Answer 3 · Fri Nov 29 2019 23:44:56 GMT+0800 (China Standard Time)

I meant to say ChaCha20 rather than ChaCha20-Poly1305, so that might have confused the issue.

See here for an implementation that has an optimized path for >512 bytes:

https://github.com/jedisct1/libsodium/blob/dcc2e06c93067f421ab549550b89fec45993b7a7/src/libsodium/crypto_stream/chacha20/dolbeau/u8.h#L129

I hope that, on that basis alone, the case for larger test vectors is self-explanatory?

I found I had to create larger test vectors in order to catch AVX2 bugs in my own implementation. I have no experience with what makes a good test vector, which is why I opened this issue.

Daniel Bleichenbacher · Answer 4 · Sat Nov 30 2019 02:16:43 GMT+0800 (China Standard Time)

Thanks. This implementation does indeed use 512-byte blocks. I would have expected register spills to be a big enough problem that smaller chunks are preferable. I'll add some longer inputs, though I still think that test vectors are not the best option for detecting problems with large inputs. E.g. comparing against a reference implementation allows covering more ground, including things like plaintext

…

2**32 blocks, for ciphers that allow this.
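
A rough sketch of what comparing against a reference implementation could look like. `chacha20_reference` is a plain Python ChaCha20 following RFC 8439; `chacha20_candidate` is a hypothetical stand-in for an optimized implementation with a 512-byte bulk path, and in practice it would be replaced by a binding to the code under test. It does not exercise block counts near 2**32.

```python
import random
import struct

MASK = 0xFFFFFFFF


def _rotl32(v, n):
    return ((v << n) & MASK) | (v >> (32 - n))


def _quarter_round(s, a, b, c, d):
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl32(s[b] ^ s[c], 7)


def chacha20_block(key, counter, nonce):
    """One 64-byte keystream block (32-byte key, 12-byte nonce)."""
    init = (list(struct.unpack("<4I", b"expand 32-byte k"))
            + list(struct.unpack("<8I", key))
            + [counter & MASK]
            + list(struct.unpack("<3I", nonce)))
    s = init[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        _quarter_round(s, 0, 4, 8, 12); _quarter_round(s, 1, 5, 9, 13)
        _quarter_round(s, 2, 6, 10, 14); _quarter_round(s, 3, 7, 11, 15)
        _quarter_round(s, 0, 5, 10, 15); _quarter_round(s, 1, 6, 11, 12)
        _quarter_round(s, 2, 7, 8, 13); _quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK for x, y in zip(s, init)))


def chacha20_reference(key, counter, nonce, data):
    """Straightforward block-at-a-time encryption."""
    out = bytearray()
    for i in range(0, len(data), 64):
        ks = chacha20_block(key, counter + i // 64, nonce)
        out += bytes(a ^ b for a, b in zip(data[i:i + 64], ks))
    return bytes(out)


def chacha20_candidate(key, counter, nonce, data):
    """Hypothetical optimized shape: 512-byte bulk path, then a 64-byte tail loop."""
    out = bytearray()
    pos = 0
    while len(data) - pos >= 512:
        ks = b"".join(chacha20_block(key, counter + pos // 64 + j, nonce)
                      for j in range(8))
        out += bytes(a ^ b for a, b in zip(data[pos:pos + 512], ks))
        pos += 512
    while pos < len(data):
        ks = chacha20_block(key, counter + pos // 64, nonce)
        out += bytes(a ^ b for a, b in zip(data[pos:pos + 64], ks))
        pos += 64
    return bytes(out)


def first_mismatch(reference, candidate, lengths, seed=0):
    """Return the first length where the two disagree, or None if none does."""
    rng = random.Random(seed)
    for n in lengths:
        key = bytes(rng.getrandbits(8) for _ in range(32))
        nonce = bytes(rng.getrandbits(8) for _ in range(12))
        data = bytes(rng.getrandbits(8) for _ in range(n))
        if reference(key, 1, nonce, data) != candidate(key, 1, nonce, data):
            return n
    return None


if __name__ == "__main__":
    lengths = list(range(0, 70)) + list(range(500, 580))
    print("first mismatch:", first_mismatch(chacha20_reference, chacha20_candidate, lengths))
```

Because the inputs are generated rather than stored, the same harness can sweep far more lengths and keys than a fixed vector file could reasonably hold.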