AVX512 implementation

Question

Wunkolo opened this issue 4 years ago

Would you be interested in taking PRs for an AVX512 implementation? AVX512 is certainly not very popular at the moment (only in 14nm Skylake/Cannonlake HEDT, more recent Xeon server chips, and 10nm Icelake laptops), though an implementation can be verified with something like intel-sde. After seeing your benchmarks, I am interested in seeing your numbers possibly double yet again once you have massive 512-bit vector registers at your disposal to enhance your throughput.

I am willing to benchmark on my available AVX512 machines as well.

Thaddée Tyl · Answer 1 · Sun Apr 19 2020 14:46:18 GMT+0800 (China Standard Time)

I would be very interested in that! My machine sadly doesn’t support it. I believe the GCP machines (using make benchmark-intel) support it.

To make it easy to use, we can probably detect support, so people using shishua.h automatically get the advantages on machines that support it.

Since AVX512 is not yet widespread, would it be possible to separate the benchmark results? Perhaps by having a preprocessor flag we can set to deactivate it even on machines that support it.

Also, to ensure we are fair with ChaCha8, could you also add support in chacha8.h?

Wunk · Answer 2 · Sun Apr 19 2020 14:53:59 GMT+0800 (China Standard Time)

> To make it easy to use, we can probably detect support, so people using shishua.h automatically get the advantages on machines that support it.
>
> Since AVX512 is not yet widespread, would it be possible to separate the benchmark results? Perhaps by having a preprocessor flag we can set to deactivate it even on machines that support it.

Many compilers already expose preprocessor definitions like __AVX512F__ and __AVX512VBMI__, which tell you whether the compiler is emitting code for a target that supports AVX512, so this already lends itself to such a thing!
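As a rough sketch of how the two ideas could combine (compile-time detection plus an opt-out flag), something like the following might work. The names PRNG_NO_AVX512, PRNG_USE_AVX512 and selected_backend are hypothetical and made up for illustration; they are not taken from shishua.h. Only __AVX512F__ and __AVX2__ are real compiler-provided macros.

```c
/* Hypothetical opt-out flag (not part of shishua.h): building with
 * -DPRNG_NO_AVX512 forces the non-AVX512 path even when the compiler
 * targets an AVX512-capable architecture. */
#if defined(__AVX512F__) && !defined(PRNG_NO_AVX512)
#  define PRNG_USE_AVX512 1
#else
#  define PRNG_USE_AVX512 0
#endif

/* Hypothetical helper: reports which code path was selected at compile
 * time. A real header would include the matching implementation here
 * instead of returning a string. */
static const char *selected_backend(void) {
#if PRNG_USE_AVX512
  return "avx512";
#elif defined(__AVX2__)
  return "avx2";
#else
  return "scalar";
#endif
}
```

With GCC or Clang, compiling with -mavx512f (or a -march value that implies it) defines __AVX512F__, so the AVX512 path would be picked unless -DPRNG_NO_AVX512 is also passed. Note that this only reflects what the compiler was told to target, not what the CPU running the binary actually supports.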

GCP also certainly has instances with AVX512 support.

I have two AVX512 machines at my disposal (Skylake-X and Icelake) to benchmark on. I may not be able to start on such an implementation for a little while, but I want to keep this issue open for anyone else who wants to look into it, and to keep track of and possibly take part in the progress.

James Z.M. Gao · Answer 3 · Wed Apr 22 2020 09:40:56 GMT+0800 (China Standard Time)

@espadrine On Alibaba Cloud, you can rent AVX512 machines (Intel Xeon Platinum 8269CY, Cascade Lake) and pay by the hour of usage.