C2SP / wycheproof

Project Wycheproof tests crypto libraries against known attacks.

ChaCha20-Poly1305 large test vectors

macaba opened this issue

Performance-optimized implementations of ChaCha20-Poly1305 tend to have conditional code paths that use CPU intrinsics to process longer inputs
(SSSE3 intrinsics for 256-byte blocks, AVX2 intrinsics for 512-byte blocks).

It would be good to see some large test vectors that target these paths: at least one with a length n in the range 256 <= n < 512, and at least one with n >= 512. These would expose implementation bugs in the optimized paths.
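As a rough sketch of what such vectors could look like, the snippet below generates ChaCha20 ciphertexts at lengths that straddle the 256- and 512-byte thresholds, using a plain scalar reference written from the RFC 8439 description. The helper names (`boundary_lengths`, `make_vectors`), the fixed key and nonce, and the dictionary layout are all assumptions made for illustration; this is not the Wycheproof vector format.

```python
import struct

MASK32 = 0xFFFFFFFF

def _rotl32(v, n):
    return ((v << n) & MASK32) | (v >> (32 - n))

def _quarter_round(s, a, b, c, d):
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl32(s[b] ^ s[c], 7)

def chacha20_block(key, counter, nonce):
    """One 64-byte keystream block (32-bit counter, 96-bit nonce)."""
    init = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
    init += struct.unpack("<8I", key)
    init.append(counter & MASK32)
    init += struct.unpack("<3I", nonce)
    s = list(init)
    for _ in range(10):  # 10 double rounds = 20 rounds
        _quarter_round(s, 0, 4, 8, 12)
        _quarter_round(s, 1, 5, 9, 13)
        _quarter_round(s, 2, 6, 10, 14)
        _quarter_round(s, 3, 7, 11, 15)
        _quarter_round(s, 0, 5, 10, 15)
        _quarter_round(s, 1, 6, 11, 12)
        _quarter_round(s, 2, 7, 8, 13)
        _quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *[(x + y) & MASK32 for x, y in zip(s, init)])

def chacha20_xor(key, counter, nonce, data):
    """Scalar reference: XOR data with the keystream, one block at a time."""
    out = bytearray()
    for i in range(0, len(data), 64):
        ks = chacha20_block(key, counter + i // 64, nonce)
        out += bytes(x ^ y for x, y in zip(data[i:i + 64], ks))
    return bytes(out)

def boundary_lengths(thresholds=(256, 512)):
    """Hypothetical choice: lengths just around each threshold and one
    64-byte block past it, so a wide path is followed by a scalar tail."""
    lengths = set()
    for t in thresholds:
        lengths.update((t - 1, t, t + 1, t + 63, t + 64, t + 65))
    return sorted(lengths)

def make_vectors():
    key = bytes(range(32))    # illustrative key, not from any test suite
    nonce = bytes(range(12))  # illustrative nonce
    vectors = []
    for n in boundary_lengths():
        msg = bytes(i % 251 for i in range(n))
        vectors.append({
            "len": n,
            "key": key.hex(),
            "nonce": nonce.hex(),
            "msg": msg.hex(),
            "ct": chacha20_xor(key, 1, nonce, msg).hex(),
        })
    return vectors
```

An optimized implementation would then be run on each `msg` and compared against `ct`; any disagreement at a given `len` points at the code path selected for that size.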

You mean test vectors containing plaintext longer than 256 or 512 bytes?

It sounds like a good idea.

@bleichen what do you think?

First, I'm a bit surprised that using 512 byte blocks with AVX2 intrinsics would give optimal performance. I wouldn't expect larger than 256 byte blocks here.

Adding a few longer test vectors can be done, though the question remains whether this is effective. One likely source of problems is the Poly1305 computation, which can easily suffer from overflows if parallelized carelessly. To generate test vectors that check the Poly1305 computation for overflows, I generated a large number of keys and selected those whose Poly1305 subkeys were extreme. That selection assumes Horner's method is used, so overflows in a parallel Poly1305 implementation would likely remain undetected at present.
I'd guess that test vectors for Poly1305 alone, with extreme subkeys, could help.
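To make the Poly1305-only idea concrete, here is a minimal sketch that computes the tag with Horner's method over big integers and emits vectors whose `r` is the largest value the clamp allows, paired with all-0xff message blocks so the accumulator stays large. What counts as "extreme" is not specified in this thread, so the choice of `r`, the two values of `s`, the message lengths, and the name `extreme_poly1305_vectors` are assumptions, and other extremes (for example per-limb maxima in a particular radix) may matter more for a given implementation.

```python
P1305 = (1 << 130) - 5
CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF  # RFC 8439 clamp mask for r

def poly1305_mac(msg, key):
    """Poly1305 via Horner's method; Python integers cannot overflow."""
    r = int.from_bytes(key[:16], "little") & CLAMP
    s = int.from_bytes(key[16:32], "little")
    acc = 0
    for i in range(0, len(msg), 16):
        # Each block gets a 0x01 byte appended before little-endian decoding.
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + n) * r % P1305
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")

def extreme_poly1305_vectors(lengths=(64, 256, 512, 1024)):
    """Assumed notion of 'extreme': maximal clamped r, all-ones blocks."""
    r_max = CLAMP.to_bytes(16, "little")
    candidate_keys = [
        r_max + bytes(16),       # s = 0
        r_max + b"\xff" * 16,    # s = 2^128 - 1, forces a wrap in the final add
    ]
    vectors = []
    for key in candidate_keys:
        for n in lengths:
            msg = b"\xff" * n
            vectors.append({
                "key": key.hex(),
                "msg": msg.hex(),
                "tag": poly1305_mac(msg, key).hex(),
            })
    return vectors
```

The longer messages are there so that an implementation processing several blocks per step, typically with precomputed powers of `r`, actually enters that path before the tag is compared.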

Another source of errors is incremental updates; e.g. this paper has some results:
https://eprint.iacr.org/2017/891.pdf
Flaws can occur at just a few specific input sizes. I'm not sure which input sizes
are most problematic for parallelized implementations.
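A simple way to probe incremental updates is to feed the same input in chunks of many different sizes and check that the result matches the one-shot computation. The sketch below uses `hashlib.sha256` purely as a stand-in for any object with `update()` and `digest()`; the function name `incremental_mismatches` and the list of chunk sizes are assumptions, chosen to sit around the block sizes mentioned in this thread.

```python
import hashlib

def incremental_mismatches(new_ctx, data, chunk_sizes):
    """Return the chunk sizes for which chunked updates disagree with one-shot."""
    reference = new_ctx()
    reference.update(data)
    expected = reference.digest()

    bad = []
    for size in chunk_sizes:
        ctx = new_ctx()
        for i in range(0, len(data), size):
            ctx.update(data[i:i + size])
        if ctx.digest() != expected:
            bad.append(size)
    return bad

data = bytes(i % 251 for i in range(1500))
sizes = [1, 15, 16, 17, 63, 64, 65, 255, 256, 257, 511, 512, 513]
mismatches = incremental_mismatches(hashlib.sha256, data, sizes)
```

For a streaming ChaCha20 or Poly1305 API, the same loop applies with the library's own context object in place of `hashlib.sha256`; a non-empty `mismatches` list names the chunk sizes worth turning into test vectors.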

I meant to say ChaCha20 rather than ChaCha20-Poly1305, so that might have confused the issue.

See here for an implementation that has an optimized path for >512 bytes:

https://github.com/jedisct1/libsodium/blob/dcc2e06c93067f421ab549550b89fec45993b7a7/src/libsodium/crypto_stream/chacha20/dolbeau/u8.h#L129

I hope that, on that basis alone, the case for larger test vectors is self-explanatory?

I found I had to create larger test vectors in order to catch AVX2 bugs in my own implementation. I have no experience in what makes a good test vector, which is why I opened this issue.