Investigate simple performance improvement for Apple M1
Morten Dahl commented
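AES seems to be much slower than ChaCha on Apple M1: about 5x slower than ChaCha8 and about 2x slower than even ChaCha20, both for rng_fill and for rng_next_u64: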
rng_fill/chacha8/2000000
time: [1.6929 ms 1.6940 ms 1.6951 ms]
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
rng_fill/chacha12/2000000
time: [2.4556 ms 2.4581 ms 2.4606 ms]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
rng_fill/chacha20/2000000
time: [3.9821 ms 3.9857 ms 3.9895 ms]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
rng_fill/aes/2000000 time: [8.4624 ms 8.4707 ms 8.4792 ms]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
rng_next_u64/chacha8 time: [8.0137 us 8.0212 us 8.0285 us]
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe
rng_next_u64/chacha12 time: [11.055 us 11.065 us 11.076 us]
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
rng_next_u64/chacha20 time: [17.142 us 17.161 us 17.179 us]
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
5 (5.00%) high mild
1 (1.00%) high severe
rng_next_u64/aes time: [36.918 us 36.950 us 36.983 us]
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
Maybe this can be fixed simply by enabling a flag so that the ARMv8 AES instructions are used.
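Before looking for a flag, it may help to confirm that the CPU actually exposes AES instructions. The sketch below uses the standard library's runtime feature detection. The function name cpu_has_armv8_aes is hypothetical. It only reports what the hardware supports; whether the aes crate actually uses those instructions is a separate, build-time question.

```rust
// Hypothetical helper: reports whether the running CPU advertises the
// ARMv8 AES extension, using std's runtime feature detection.
#[cfg(target_arch = "aarch64")]
fn cpu_has_armv8_aes() -> bool {
    std::arch::is_aarch64_feature_detected!("aes")
}

// The detection macro only exists on aarch64, so other targets
// simply report "not available".
#[cfg(not(target_arch = "aarch64"))]
fn cpu_has_armv8_aes() -> bool {
    false
}

fn main() {
    println!("ARMv8 AES instructions available: {}", cpu_has_armv8_aes());
}
```

On an M1 this is expected to print true, which would point at the build configuration rather than the hardware as the cause of the slow AES numbers.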
Morten Dahl commented
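Note that enabling the aes_armv8 configuration flag (see https://docs.rs/aes/0.8.1/aes/#configuration-flags) with
$ RUSTFLAGS="--cfg aes_armv8" cargo +nightly bench
gives much better results. AES goes from the slowest to the fastest generator: rng_fill/aes drops from 8.4707 ms to 225.76 us, and rng_next_u64/aes drops from 36.950 us to 1.9315 us: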
rng_fill/chacha8/2000000
time: [1.5236 ms 1.5249 ms 1.5263 ms]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
rng_fill/chacha12/2000000
time: [2.2282 ms 2.2300 ms 2.2320 ms]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
rng_fill/chacha20/2000000
time: [3.6580 ms 3.6617 ms 3.6656 ms]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
rng_fill/aes/2000000 time: [225.40 us 225.76 us 226.14 us]
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
rng_next_u64/chacha8 time: [6.2238 us 6.2301 us 6.2366 us]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high mild
rng_next_u64/chacha12 time: [9.0791 us 9.0910 us 9.1064 us]
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
rng_next_u64/chacha20 time: [14.792 us 14.805 us 14.818 us]
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
rng_next_u64/aes time: [1.9284 us 1.9315 us 1.9346 us]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
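To quantify the two runs, here is a small sketch that divides the middle value of each criterion time: [...] interval from the default run by the corresponding value from the aes_armv8 run. The numbers were copied by hand from the output above and converted to microseconds. The Bench struct and the speedup function are illustrative names.

```rust
/// One benchmark, holding the middle value of criterion's
/// `time: [low estimate high]` interval from each run, in microseconds.
struct Bench {
    name: &'static str,
    default_us: f64, // run without extra RUSTFLAGS
    armv8_us: f64,   // run with RUSTFLAGS="--cfg aes_armv8"
}

const BENCHES: [Bench; 8] = [
    Bench { name: "rng_fill/chacha8/2000000", default_us: 1694.0, armv8_us: 1524.9 },
    Bench { name: "rng_fill/chacha12/2000000", default_us: 2458.1, armv8_us: 2230.0 },
    Bench { name: "rng_fill/chacha20/2000000", default_us: 3985.7, armv8_us: 3661.7 },
    Bench { name: "rng_fill/aes/2000000", default_us: 8470.7, armv8_us: 225.76 },
    Bench { name: "rng_next_u64/chacha8", default_us: 8.0212, armv8_us: 6.2301 },
    Bench { name: "rng_next_u64/chacha12", default_us: 11.065, armv8_us: 9.0910 },
    Bench { name: "rng_next_u64/chacha20", default_us: 17.161, armv8_us: 14.805 },
    Bench { name: "rng_next_u64/aes", default_us: 36.950, armv8_us: 1.9315 },
];

/// How many times faster the aes_armv8 run was than the default run.
fn speedup(b: &Bench) -> f64 {
    b.default_us / b.armv8_us
}

fn main() {
    for b in BENCHES.iter() {
        println!("{:<28} {:>6.2}x", b.name, speedup(b));
    }
}
```

This gives roughly 37.5x for rng_fill/aes and 19.1x for rng_next_u64/aes. The ChaCha rows also move, by roughly 9% to 29%. That shift is presumably unrelated to the flag, for example run-to-run variation or a toolchain difference if the first run was not on +nightly. Small differences between the two runs should therefore not be attributed to aes_armv8.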