espadrine / shishua

SHISHUA – The fastest PRNG in the world

AVX512 implementation

Wunkolo opened this issue · comments

commented

Would you be interested in taking PRs for an AVX512 implementation? AVX512 is certainly not very popular at the moment (only in 14nm Skylake/Cannonlake HEDT, more recent Xeon server chips, and 10nm Icelake laptops), though an implementation can be verified with something like intel-sde. After seeing your benchmarks, I am interested in seeing whether your numbers double yet again when you have massive 512-bit vector registers at your disposal to enhance your throughput.

I am willing to benchmark on my available AVX512 machines as well.

I would be very interested in that! My machine sadly doesn’t support it. I believe the GCP machines (using make benchmark-intel) support it.

To make it easy to use, we can probably detect support, so people using shishua.h automatically get the advantages on machines that support it.

Since AVX512 is not yet widespread, would it be possible to separate the benchmark results? Perhaps by having a preprocessor flag we can set to deactivate it even on machines that support it.

Also, to ensure we are fair with ChaCha8, could you also add support in chacha8.h?
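The opt-out flag suggested above could look roughly like the sketch below. It is only an illustration: `SHISHUA_DISABLE_AVX512`, the `SHISHUA_BACKEND_*` values, and `shishua_backend_name()` are hypothetical names, not identifiers from shishua.h. The only real names are `__AVX512F__` and `__AVX2__`, which GCC and Clang define when the corresponding instruction sets are enabled for the build.

```c
/* Hypothetical backend selection; none of these SHISHUA_* names
   are taken from shishua.h. */
enum shishua_backend {
  SHISHUA_BACKEND_SCALAR,
  SHISHUA_BACKEND_AVX2,
  SHISHUA_BACKEND_AVX512
};

/* Use AVX512 only when the compiler targets it AND the user has not
   opted out by defining SHISHUA_DISABLE_AVX512 (assumed flag name). */
#if defined(__AVX512F__) && !defined(SHISHUA_DISABLE_AVX512)
#  define SHISHUA_BACKEND SHISHUA_BACKEND_AVX512
#elif defined(__AVX2__)
#  define SHISHUA_BACKEND SHISHUA_BACKEND_AVX2
#else
#  define SHISHUA_BACKEND SHISHUA_BACKEND_SCALAR
#endif

/* Returns a printable name for the backend chosen at compile time,
   so benchmark output can state which code path was measured. */
const char *shishua_backend_name(void) {
  switch (SHISHUA_BACKEND) {
    case SHISHUA_BACKEND_AVX512: return "avx512";
    case SHISHUA_BACKEND_AVX2:   return "avx2";
    default:                     return "scalar";
  }
}
```

With something like this, a benchmark could be built twice on the same AVX512 machine, once as-is and once with `-DSHISHUA_DISABLE_AVX512`, and the two results reported separately.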

commented

> To make it easy to use, we can probably detect support, so people using shishua.h automatically get the advantages on machines that support it.

> Since AVX512 is not yet widespread, would it be possible to separate the benchmark results? Perhaps by having a preprocessor flag we can set to deactivate it even on machines that support it.

Many compilers already expose preprocessor definitions like __AVX512F__ and __AVX512VBMI__ that indicate whether the compiler is emitting code for a target that supports AVX512, so it already lends itself to such a thing!
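As a rough illustration of how `__AVX512F__` can gate a 512-bit code path, here is a minimal sketch. It is not SHISHUA's algorithm: `add_u64` is a made-up helper that just adds two arrays of 64-bit integers, eight lanes at a time when AVX512F is enabled and one element at a time otherwise. The intrinsics used (`_mm512_loadu_si512`, `_mm512_add_epi64`, `_mm512_storeu_si512`) are part of AVX512F.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __AVX512F__
#include <immintrin.h> /* only needed when the AVX512 path is compiled */
#endif

/* Hypothetical helper (not from shishua.h): dst[i] = a[i] + b[i]. */
void add_u64(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n) {
  size_t i = 0;
#ifdef __AVX512F__
  /* 512-bit path: process 8 x 64-bit lanes per iteration. */
  for (; i + 8 <= n; i += 8) {
    __m512i va = _mm512_loadu_si512((const void *)(a + i));
    __m512i vb = _mm512_loadu_si512((const void *)(b + i));
    _mm512_storeu_si512((void *)(dst + i), _mm512_add_epi64(va, vb));
  }
#endif
  /* Scalar path: handles the tail, or everything without AVX512F. */
  for (; i < n; i++) {
    dst[i] = a[i] + b[i];
  }
}
```

Note that the macro reflects what the compiler was told to target (for example via `-mavx512f` or `-march=native` with GCC or Clang), not what the CPU running the binary supports, so a binary built this way will fault on a machine without AVX512.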

GCP also certainly has instances with AVX512 support.

I have two AVX512 machines at my disposal (Skylake-X and Icelake) to benchmark on. I may not be able to start on such an implementation for a little while, but I want to keep this issue open for anyone else who wants to look into it, and to keep track of and possibly take part in such progress!

@espadrine On Alibaba Cloud, you can rent AVX512 machines (Intel Xeon Platinum 8269CY, Cascade Lake) and pay by the hour.